A yr in the past, Simon Willison wrote one of many cleanest definitions of an agent that has caught round:
An LLM agent runs instruments in a loop to attain a purpose.
That definition caught as a result of it describes what each manufacturing agent really does. Kiro, Amazon Q Developer, Fast Brokers, Codex, Claude Code: beneath the hood, all of them run the identical form. The agent loop is the widespread denominator.
However the loop was by no means the arduous half. The arduous half was all the pieces round it.
Choose a framework. Wire up instruments. Provision sandboxed compute. Configure storage, secrets and techniques, networking. Determine the place reminiscence lives. Bolt on observability. Get the best dependencies into the best container. Additionally, native prototyping tends to be the simple half: a single developer can rise up an agent on their laptop computer in a day. Getting it into manufacturing is the place the work explodes, and the second it has to serve a couple of person, an entire new layer of labor reveals up: concurrency, isolation, identification, state, scaling.
Worse, that overhead multiplied with each new use case. Groups that needed to experiment, attempt a special mannequin, swap a software, level the agent at a brand new area, discovered themselves repeating the identical plumbing. The bottleneck wasn’t intelligence. It was orchestration and infrastructure.
After we launched the AgentCore harness in preview in April, we made a guess: the AgentCore primitives (Runtime, Reminiscence, Gateway, Browser, Identification, Observability) already give groups all the pieces they should run brokers in manufacturing; what they shouldn’t must do is wire them up by hand each time. The harness handles that wiring as a managed abstraction, so it turns into one thing you configure somewhat than one thing you construct.
Right this moment, Amazon Bedrock AgentCore harness is mostly obtainable. Two API calls (CreateHarness to outline an agent, InvokeHarness to run it), a fast walkthrough within the AgentCore CLI (as proven within the under gif), or just a few clicks within the console, and you’ve got an agent working in minutes. It runs in its personal remoted surroundings with a filesystem and shell, so it may possibly learn information, run instructions, and write code safely. It remembers customers and conversations throughout classes, picks up abilities you level it at (together with the AWS-curated catalog), browses the net, calls your instruments via gateway or MCP, and switches mannequin suppliers mid-session with out dropping context. Each step streams again to you in actual time and is robotically traced to CloudWatch. There’s no want to jot down orchestration code or construct a container, besides if you wish to.
What the harness provides you
A harness is all the pieces an agent must run in manufacturing, wrapped behind two API calls. You level to the mannequin, instruments, abilities, and directions you need. AgentCore handles the sandboxed surroundings, the reminiscence, the storage, the identification, and the observability that ties all of it collectively. Capabilities new at GA are marked with * within the diagram under.

Any mannequin: Use the best mannequin for the job, swap when it is advisable
Completely different duties want totally different fashions. Prospects informed us they need to plan with one mannequin and execute with one other, swap a supplier for a price-performance take a look at, or transfer off a mannequin that simply shipped a regression, all with out dropping the dialog. Choose a default mannequin on CreateHarness, then override it on any single InvokeHarness name when it is advisable. The default stays in place for each different invocation. Set the matching subject on mannequin for the supplier you need:
bedrockfor any mannequin served on Amazon Bedrock, together with Anthropic Claude, Amazon Nova, Meta Llama, DeepSeek, Qwen, Kimi, MiniMax, Cohere, Mistral and as of not too long ago OpenAI GPT-5.5 and GPT-5.4 on BedrockopenAifor direct entry to OpenAI’s API (api.openai.com)geminifor Google GeminiliteLlmfor any third-party supplier supported by LiteLLM, together with Anthropic direct, Cohere, Mistral, Vertex, Azure OpenAI, and others
And the half that prospects informed us mattered most: swap suppliers at any level, even mid-session, and maintain context. For instance, you should utilize Claude Opus to plan, swap to GPT-5.5 to jot down code, swap to Gemini to summarize. The dialog continues. The harness handles the transition seamlessly.

For those who’re utilizing API keys to entry any of the underlying mannequin suppliers, they’re saved securely in AgentCore Identification’s token vault. The agent by no means sees uncooked credentials.
Instruments as config: Join your agent to the world with out writing glue code
Instruments are how the agent impacts something outdoors its personal reasoning, and wiring them is the half most groups quietly hate. Prospects informed us they don’t need to write per-API adapter code, handle MCP server lifecycles, or construct their very own browser sandbox. They need to declare what the agent can use and let the harness deal with the connection, the auth, and the execution.
instruments on CreateHarness are a listing. Every entry has a kind and a config block, and the harness wires them in:
agentcore_gateway: you may reference an AgentCore Gateway by ARN. Each goal the gateway exposes (OpenAPI, Smithy, Lambda, MCP) reveals up as a software, with IAM/JWT auth, per-tool authorization, and outbound credential brokering dealt with for you.remote_mcp: you may join on to any MCP server by URL. Good when the server is already secured and also you don’t want Gateway’s governance layer in entrance of it.agentcore_browser: a full browser sandbox as a one-line reference. Click on, sort, navigate, screenshot.agentcore_code_interpreter: sandboxed Python and Node execution, identical one-line sample.inline_function: a software schema the harness emits as a tool-use occasion within the stream and waits so that you can reply on. Use it for human-in-the-loop approvals or for instruments that must run in your aspect.
Each session additionally will get built-in shell (run instructions contained in the microVM) and file_operations (learn and write on the agent’s filesystem) with out you itemizing them. They’re what make the stateful filesystem and shell story usable from the mannequin.
You’ve gotten the identical choices on InvokeHarness for per-call edits, the place you may cross new instruments to alter instruments for a single name, or strip the checklist right down to a targeted set for that invocation by way of the allowed_tools parameter. Defaults are set at create time, however you may simply override at invoke time.
Constructed-in reminiscence: Your harness remembers customers and conversations
Prospects need their agent to acknowledge a returning person, decide up the place the final dialog left off, and keep in mind preferences with out anybody replaying message historical past. In preview, you needed to provision an AgentCore Reminiscence useful resource individually and cross its ARN, which labored however was a second API name and a straightforward factor to overlook on the best way to manufacturing.
At GA, omitting reminiscence on CreateHarness provisions a managed reminiscence robotically, with wise defaults: SEMANTIC + SUMMARIZATION methods, 30-day occasion expiry, AWS-owned encryption, and multi-tenant isolation by default via namespace templates that key on actorId. It’s an actual, customer-owned Reminiscence useful resource, provisioned for you. Reminiscence isn’t obligatory. In case your agent is stateless, set reminiscence: { disabled: {} } and the harness skips reminiscence completely. For those who’d somewhat connect an AgentCore Reminiscence useful resource you already personal, cross agentCoreMemoryConfiguration with its ARN. These three paths appear to be the next:
Switching to your individual reminiscence is one UpdateHarness name. Move agentCoreMemoryConfiguration along with your reminiscence ARN and the beforehand managed reminiscence disassociates instantly. It’s nonetheless an everyday AgentCore Reminiscence useful resource in your account, so you may maintain utilizing it wherever, connect it to a different harness, question it instantly, or delete it by yourself phrases. If you delete the harness, the managed reminiscence is cascade-deleted by default (deleteManagedMemory: true). Move deleteManagedMemory: false if you wish to maintain it.
The managed reminiscence is computerized however not opaque. It’s an actual, addressable AWS useful resource you may question, connect to a special agent, audit, or hand to an analytics pipeline.
Abilities: Give your agent the best experience on the best job
Prospects need their agent to know the best way to deal with a particular job earlier than it tries it. For instance, the best way to format an Excel report, the best way to file a JIRA ticket the best way their staff information them, or the best way to observe AWS-recommended procedures for accessing their knowledge on AWS. Abilities are the way you give the agent that information on demand. They’re bundles of information, scripts, and directions. The harness hundreds ability metadata and pulls full content material into context solely when the duty really requires it.
At GA, HarnessSkill is a union with 4 sources, so you may connect abilities declaratively with out baking them right into a container or shelling in:
awsSkills– activate the AWS-curated ability bundle.git– clone a public or personal repo over HTTPS, pinned to a commit or a department.s3– pull a ability bundle from your individual Amazon Easy Storage Service (Amazon S3) bucket.path– reference a path that already exists within the container you introduced in.
The identical form works on InvokeHarness for per-call layering. The harness materializes every ability onto the session filesystem on session begin, or throughout a brand new invocation if the Abilities configuration adjustments.
The massive unlock for AWS builders: the AWS abilities repository ships curated abilities masking the AWS floor space, from core abilities (SDK utilization, infrastructure as code (IaC), AWS Identification and Entry Administration (IAM), Amazon CloudWatch, and Amazon Bedrock) to service-specific deep workflows for analytics, databases, Amazon Elastic Compute Cloud (Amazon EC2), networking, safety, serverless, and storage.
To make this even less complicated, GA introduces a first-class awsSkills toggle: activate the AWS ability bundle with zero plumbing, no URL, no community fetch (the talents are introduced within the harness’s underlying runtime, everytime you want them).
Setting and filesystem: Run your agent within the surroundings it wants
Most brokers run fantastic on the harness’s default surroundings, which incorporates Python and bash. If you want extra (a non-public dependency, a runtime model, a CLI software, or persistence throughout classes), two knobs allow you to form the agent’s runtime to match your stack: the container picture and the filesystem.
Container picture. If Python and bash aren’t sufficient, you may bundle your supply code, dependencies, runtimes, and instruments right into a customized container, push it to Amazon Elastic Container Registry (Amazon ECR), and reference it in CreateHarness. The agent then makes use of that actual surroundings. You may also pair it with InvokeAgentRuntimeCommand, an API that runs a shell command instantly contained in the agent’s microVM session, for session-specific setup that varies per invocation (clone a selected department, seed take a look at knowledge, or pull credentials). It’s deterministic, doesn’t undergo the mannequin, and doesn’t burn tokens.
Filesystem. Brokers typically want information to survive a single response: a shared information base, a working listing throughout classes, or a spot to drop produced paperwork again into your bucket. The harness provides you three filesystem choices, every with totally different attain and persistence traits.
| Kind | Managed | Digital personal cloud (VPC) required | Persistence |
| Managed session storage | Sure | No | Throughout cease/resume cycles of the identical runtimeSessionId. |
| Amazon Elastic File System (Amazon EFS) entry level | BYO | Sure | Throughout all classes, sharable throughout harnesses. |
| Amazon Easy Storage Service (Amazon S3) Recordsdata entry level | BYO | Sure | Throughout all classes and harnesses, with full Amazon S3 sturdiness, versioning, and historical past. |
Attain for managed session storage for working information that have to survive microVM restarts inside a session. Attain for EFS when a number of harnesses or classes have to share reference knowledge, prompts, or ability bundles. Attain for S3 Recordsdata if you need the agent to learn and write via commonplace file operations whereas adjustments are robotically synchronized with the backing S3 bucket (the agent writes a report, the report seems in your S3 bucket because it goes).
Unified observability: See what your agent did, in a single place
When one thing goes incorrect, prospects need to know in a single place what the agent ran, what it known as, the place it slowed down, and the place it failed. A typical harness invocation crosses runtime + reminiscence + gateway + a built-in software or two, and stitching that image collectively used to imply opening 5 tabs.
At GA, each harness web page within the AgentCore console reveals a single observability widget: an combination row that summarizes the harness throughout each primitive it touched, plus per-primitive sections that seem just for the primitives the harness is configured with or has used.

For deeper evaluation, CloudWatch GenAI Observability has a brand new Harnesses tab alongside Runtime and different primitives. Drill from a harness, right into a session, right into a single hint, and see precisely what the agent did, in what order, how lengthy every step took, and the place it failed. Logs from each primitive (reminiscence, gateway, browser, code interpreter) floor inline on the proper span, so that you cease hopping between log teams to piece collectively what occurred.

Consider and optimize: Preserve bettering your agent in manufacturing
As soon as your agent is in manufacturing, the query shifts from “does it work?” to “is it bettering?” Prospects need a approach to rating how their agent is definitely doing on actual site visitors, get strategies on what to alter, and validate these adjustments earlier than rolling them out. GA brings two items that shut that loop:
- AgentCore Evaluations rating harness traces with built-in massive language mannequin (LLM)-as-a-judge evaluators (helpfulness, faithfulness, security), or with customized evaluators you creator. Run them on-line (scoring each session because it occurs), on-demand for a single hint, in batch over historic traces, in opposition to a set take a look at dataset, or as a simulation with artificial customers to stress-test earlier than going stay.
- AgentCore optimization reads these evaluator scores and generates immediate and tool-description suggestions, then validates them by routing stay site visitors between two variants via AgentCore Gateway with on-line analysis scoring per session and statistical significance reporting. Variants might be totally different variations of an non-compulsory configuration bundle on the identical runtime, or totally different model pointing at totally different endpoints, so you may A/B-test immediate and tool-description adjustments with out redeploying code by pointing simply at a special endpoint.
Run your harness, seize traces, get scores, get suggestions, A/B-test the advisable configuration in opposition to the present one, then ship the winner.
Model and roll again: Roll out adjustments safely, roll again immediately
Prospects need to replace prompts, swap a software, or attempt a brand new mannequin on a subset of site visitors with out placing the entire agent in danger. Versioning and endpoints on the harness mirror what AgentCore Runtime already provides: each UpdateHarness creates an immutable model capturing the complete configuration (mannequin, system immediate, instruments, reminiscence config, abilities, surroundings, truncation, execution limits), and rollback is “level the endpoint at an earlier model.”
The DEFAULT endpoint auto-advances on each replace. Named endpoints (PROD, STAGING) keep pinned till you explicitly promote.
Export to code: Graduate when configuration isn’t sufficient
When a use case outgrows configuration (customized orchestration, multi-agent coordination, deep instrumentation), prospects need to take the agent additional with out rebuilding it from scratch. One CLI command exports the harness as Strands-based code that may host on AgentCore Runtime or wherever else:
The exported venture preserves your mannequin, immediate, instruments, reminiscence wiring, abilities, and container surroundings. Identical compute path, identical observability, identical identification primitives. The commencement is a config-to-code translation, not an structure swap.
Strands is the primary export goal; Claude Agent SDK is coming quickly, so prospects preferring that framework can graduate the identical approach.
That is the a part of the harness story we care about most. When configuration stops being sufficient, you graduate to the identical compute and the identical primitives, with code you may learn and modify, as an alternative of beginning over from scratch.
Different notable additions
We additionally added the next:
Step Features integration. A harness invocation is now a first-class state in AWS Step Features. In Workflow Studio, seek for AgentCore InvokeHarness and drag it into your workflow. Use Fast Create Harness to scaffold a brand new harness and execution position from inside Step Features, or level at an current harness and override per name. The identical InvokeHarness semantics apply, with defaults on the harness and overrides on the Activity state.
Internet Search on AgentCore. The brand new Internet Search on AgentCore (additionally launched at NY Summit) is on the market to harness brokers via AgentCore Gateway: expose Internet Search as a Gateway goal, reference the Gateway from the harness, and the agent has search. A primary-party agentcore_web_search software sort is coming quickly, matching the one-line sample of agentcore_browser and agentcore_code_interpreter.
What you are able to do with all of this
There are numerous use circumstances the harness can assist, throughout industries and agent varieties. To offer you a way of the variety, listed below are three concrete examples, every one thing groups informed us they had been piecing collectively by hand earlier than.
A analysis and writing agent. The agent might search the net, browse sources, draft a doc, and hand you again an actual xlsx or pptx file, with reminiscence carrying throughout classes so the following query doesn’t replay all the pieces. The minimal to face it up is one CreateHarness name:
instruments:agentcore_browser, plus a Gateway goal that exposes Internet Search on AgentCore.abilities: agitsupply pointing atanthropics/abilitiesfor the document-skills bundle.
Reminiscence is on by default, so that you don’t configure it explicitly. That’s it.
An AWS knowledge and analytics agent to your staff. The agent might pull knowledge out of your AWS account (Amazon Athena, AWS Glue, Amazon S3, Amazon Redshift, Amazon CloudWatch), run an evaluation, and hand again a abstract, a chart, or a discovering, whereas following AWS-recommended procedures for accessing every service step-by-step as an alternative of improvising. The minimal to face it up is one CreateHarness name:
abilities:[{"awsSkills": {}}]to flip on the curated AWS catalog (analytics, database, Amazon EC2, networking, safety, serverless, and storage).executionRoleArn: an IAM position scoped to no matter AWS APIs you need the agent to learn from.
Add agentcore_code_interpreter if you would like the agent to additionally run Python in a sandbox to slice and visualize the information it pulls.
A coding agent. The agent might learn your code base, plan a change, write it, run the assessments, and open a pull request (PR), with the flexibility to change to a special mannequin mid-session for design and implementation with out dropping context. The minimal to face it up is 2 steps:
- Push a customized container along with your repo and toolchain to Amazon ECR.
- Name
CreateHarnesswithenvironmentArtifactpointing at that picture, plus a Gateway goal wired to GitHub (or your inner GitLab or Bitbucket equal) so the agent can work together with branches, PRs, and evaluations.
For deterministic git operations like clone, commit, push, and open a PR (with out paying the mannequin to suppose via them), name InvokeAgentRuntimeCommand instantly.
These are three totally different brokers, with the very same harness. The API configuration is the one factor that adjustments.
Pay just for what you utilize
There isn’t a further harness payment. You pay for the underlying capabilities primarily based on precise consumption.
- Runtime compute (the place the harness session runs): active-consumption pricing per second, $0.0895 per vCPU-hour, $0.00945 per GB-hour. Agentic workloads spend important time ready on mannequin and gear I/O. Runtime payments solely when CPU is definitely consumed.
- Browser and Code Interpreter: identical active-consumption mannequin.
- Gateway: per-1,000 invocations and per-1,000 search queries.
- Reminiscence: per-1,000 short-term occasions, per-1,000 long-term data per 30 days, per-1,000 retrievals.
- Observability: commonplace Amazon CloudWatch pricing for spans, logs, and metrics.
- Mannequin inference: charged by Amazon Bedrock or the third-party supplier at their commonplace charges.
Every is impartial. Use one, use all. An agent that runs for 60 seconds and calls two instruments prices accordingly. An agent that runs for an hour with heavy compute prices accordingly. You pay proportionally to what your agent really computes.
For full pricing particulars, see the AgentCore pricing web page.
What a few of our prospects are enthusiastic about with harness
Omar Paul, VP of Product at Twilio said that “Twilio’s prospects are constructing AI brokers that work throughout voice, messaging, and digital channels — with real-time intelligence and protracted reminiscence that make each interplay really feel like a dialog. By combining AgentCore harness with Twilio Conversations, builders can go from thought to stay agent with out rewiring infrastructure. The most effective buyer experiences occur when nice AI and nice communications infrastructure are constructed collectively.”
Dr. Lukas Schack, Principal Machine Studying Engineer at TUI GROUP informed us that “Amazon Bedrock AgentCore has develop into a core constructing block at TUI: we use Runtime to host brokers throughout frameworks and Reminiscence to share context between them, in manufacturing and in workshops with over 500 staff, typically with greater than 130 individuals constructing on the identical time. With AgentCore harness what used to take weeks from thought to working product now takes minutes, and customer-facing use circumstances are subsequent.”
Rodrigo Moreira, VP of Engineering, VTEX mentioned “We’re constructing AI brokers that can revolutionize ecommerce. Beforehand, prototyping every new agent required days of orchestration code and infrastructure setup earlier than we might validate an thought. AgentCore harness has modified that: swapping a mannequin, including a software, changing a ability, or refining an agent’s directions is now a configuration change, not a rebuild. We are able to now validate agent concepts in minutes as an alternative of days, and we’re wanting ahead to accelerating agent growth additional with these new capabilities”.
Kazumi Matsuda, Senior Supervisor, AI Promotion Division at FUJISOFT famous that “At Fujisoft, we’re constructing AI brokers to speed up software program growth and operations throughout our groups. Our framework, Character Capsule, packages agent roles, abilities, and execution procedures as reusable capsules that scale to multi-agent orchestration on AgentCore. With AgentCore harness, we deploy new brokers in minutes and model every change. As soon as in manufacturing, evaluations scores how our brokers carry out utilizing execution logs, and AgentCore’s optimization capabilities generate immediate and gear strategies primarily based on these scores. We A/B take a look at these suggestions on stay site visitors earlier than rolling out, so enchancment is steady, not guesswork. Collectively, these capabilities allow us to rise up new brokers rapidly and maintain bettering them with confidence, catching high quality regressions earlier than they attain manufacturing and rolling out solely the adjustments we’ve validated throughout our multi-agent patterns.”
Get began
Amazon Bedrock AgentCore harness is on the market right now in all AWS Areas the place AgentCore is mostly obtainable.
The sooner a staff can get from thought to working agent, the extra concepts they will afford to check. The harness collapses that loop from days to minutes. We’re excited to see what you construct.
Extra sources
For extra info, see the next:
In regards to the authors
