Claude Code is an AI-powered coding assistant from Anthropic that helps developers write, review, and modify code through natural language interactions. Amazon Bedrock is a fully managed service that provides access to foundation models from leading AI companies through a single API. This post shows you how to deploy Claude Code with Amazon Bedrock. You'll learn about authentication methods, infrastructure choices, and monitoring strategies to deploy securely at enterprise scale.
Recommendations for most enterprises
We recommend the Guidance for Claude Code with Amazon Bedrock, which implements proven patterns that can be deployed in hours.
Deploy Claude Code with this proven stack:
This architecture provides secure access with user attribution, capacity management, and visibility into costs and developer productivity.
Authentication methods
Claude Code deployments begin with authenticating to Amazon Bedrock. The authentication decision affects downstream security, monitoring, operations, and developer experience.
Authentication methods comparison
| Feature | API keys | AWS login | SSO with IAM Identity Center | Direct IdP integration |
| --- | --- | --- | --- | --- |
| Session duration | Indefinite | Configurable (up to 12 hours) | Configurable (up to 12 hours) | Configurable (up to 12 hours) |
| Setup time | Minutes | Minutes | Hours | Hours |
| Security risk | High | Low | Low | Low |
| User attribution | None | Basic | Basic | Full |
| MFA support | No | Yes | Yes | Yes |
| OpenTelemetry integration | None | Limited | Limited | Full |
| Cost allocation | None | Limited | Limited | Full |
| Operational overhead | High | Medium | Medium | Low |
| Use case | Short-term testing | Testing and limited deployments | Rapid SSO deployment | Production deployment |
The following sections discuss the trade-offs and implementation considerations laid out in the preceding table.
API keys
Amazon Bedrock supports API keys as the fastest path to a proof of concept. Both short-term (12-hour) and long-term (indefinite) keys can be generated through the AWS Management Console, AWS CLI, or SDKs.
However, API keys create security vulnerabilities through persistent access without MFA, manual distribution requirements, and the risk of being committed to repositories. They provide no user attribution for cost allocation or monitoring. Use them only for short-term testing (under one week, with 12-hour expiration).
AWS login
The aws login command uses your AWS Management Console credentials for Amazon Bedrock access through a browser-based authentication flow. It supports quick setup without API keys and is recommended for testing and small deployments.
Single sign-on (SSO)
AWS IAM Identity Center integrates with existing enterprise identity providers through OpenID Connect (OIDC), an authentication protocol that enables single sign-on by allowing identity providers to verify user identities and share authentication information with applications. This integration allows developers to use corporate credentials to access Amazon Bedrock without distributing API keys.
Developers authenticate with AWS IAM Identity Center using the aws sso login command, which generates temporary credentials with configurable session durations. These credentials refresh automatically, reducing the operational overhead of credential management while improving security through temporary, time-limited access.
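As a sketch, a developer's AWS CLI profile for this flow might look like the following (the start URL, account ID, and role name are placeholders, not values from this post):

```ini
# ~/.aws/config -- hypothetical IAM Identity Center profile for Claude Code
[profile claude-code]
sso_start_url = https://example.awsapps.com/start
sso_region = us-east-1
sso_account_id = 111122223333
sso_role_name = ClaudeCodeDeveloper
region = us-east-1
```

Running aws sso login --profile claude-code then opens the browser for corporate authentication and caches temporary credentials locally.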
Organizations already using IAM Identity Center for AWS access can extend this pattern to Claude Code. However, it limits detailed user-level monitoring because it does not expose OIDC JWT tokens for OpenTelemetry attribute extraction.
This authentication method suits organizations that prioritize rapid SSO deployment over detailed monitoring, or initial rollouts where comprehensive metrics aren't yet required.
Direct IdP integration
Direct OIDC federation with your identity provider (Okta, Azure AD, Auth0, or Amazon Cognito user pools) is recommended for production Claude Code deployments. This approach connects your enterprise identity provider directly to AWS IAM to generate temporary credentials with full user context for monitoring.
The credential process provider orchestrates OAuth2 authentication with PKCE, a security extension that helps prevent authorization code interception. Developers authenticate in their browser, exchanging OIDC tokens for AWS temporary credentials.
A helper script uses the AWS Security Token Service (STS) AssumeRoleWithWebIdentity API to assume a role with permissions to call InvokeModel and InvokeModelWithResponseStream on Amazon Bedrock. Direct IAM federation supports session durations of up to 12 hours, and the JWT token remains accessible throughout the session, enabling monitoring through OpenTelemetry to track user attributes such as email, department, and team.
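The AWS CLI's credential_process interface expects such a helper to print a small JSON document. As an illustrative sketch (the function name and input shape are assumptions, not the guidance solution's actual code), the formatting step might look like this:

```python
import json
from datetime import datetime, timezone


def format_credential_process_output(sts_credentials: dict) -> str:
    """Render STS credentials (for example, the Credentials block returned by
    AssumeRoleWithWebIdentity) in the JSON shape the AWS CLI expects from a
    credential_process helper."""
    return json.dumps({
        "Version": 1,  # required by the credential_process contract
        "AccessKeyId": sts_credentials["AccessKeyId"],
        "SecretAccessKey": sts_credentials["SecretAccessKey"],
        "SessionToken": sts_credentials["SessionToken"],
        "Expiration": sts_credentials["Expiration"]
        .astimezone(timezone.utc)
        .isoformat(),
    })


if __name__ == "__main__":
    # Fake credentials for illustration only.
    fake = {
        "AccessKeyId": "ASIAEXAMPLE",
        "SecretAccessKey": "example-secret",
        "SessionToken": "example-token",
        "Expiration": datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc),
    }
    print(format_credential_process_output(fake))
```

The AWS CLI invokes the configured helper whenever the cached credentials expire, which is what makes the automatic refresh described above transparent to the developer.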
The Guidance for Claude Code with Amazon Bedrock implements both Cognito identity pool and direct IAM federation patterns, but recommends direct IAM for simplicity. The solution provides an interactive setup wizard that configures your OIDC provider integration, deploys the required IAM infrastructure, and builds distribution packages for Windows, macOS, and Linux.
Developers receive installation packages that configure their AWS CLI profile to use the credential process. Authentication occurs through corporate credentials, with the browser opening automatically to refresh credentials. The credential process handles token caching, credential refresh, and error recovery.
For organizations requiring detailed usage monitoring, cost attribution by developer, and comprehensive audit trails, direct IdP integration through IAM federation provides the foundation for the advanced monitoring capabilities discussed later in this post.
Organizational decisions
Beyond authentication, architectural decisions shape how Claude Code integrates with your AWS infrastructure. These decisions affect operational complexity, cost management, and enforcement of usage policies.
Public endpoints
Amazon Bedrock provides managed, public API endpoints in multiple AWS Regions with minimal operational overhead. AWS manages infrastructure, scaling, availability, and security patching. Developers use standard AWS credentials through AWS CLI profiles or environment variables. Combined with OpenTelemetry metrics from direct IdP integration, you can monitor usage through public endpoints by individual developer, department, or cost center, and policies can be enforced at the AWS IAM level. Some controls require additional machinery: for example, implementing per-developer rate limiting requires infrastructure that observes CloudWatch metrics or CloudTrail logs and takes automated action. Organizations requiring immediate, request-level blocking based on custom business logic may need additional components such as an LLM (large language model) gateway pattern. Public Amazon Bedrock endpoints are sufficient for most organizations because they provide a balance of simplicity, AWS-managed reliability, cost alerting, and appropriate control mechanisms.
LLM gateway
An LLM gateway introduces an intermediary application layer between developers and Amazon Bedrock, routing requests through custom infrastructure. The Guidance for Multi-Provider Generative AI Gateway on AWS describes this pattern, deploying a containerized proxy service with load balancing and centralized credential management.
This architecture is best for:
- Multi-provider support: routing between Amazon Bedrock, OpenAI, and Azure OpenAI based on availability, cost, or capability
- Custom middleware: proprietary prompt engineering, content filtering, or prompt injection detection at the request level
- Request-level policy enforcement: immediate blocking of requests that exceed custom business logic beyond IAM capabilities
Gateways provide unified APIs and real-time monitoring but add operational overhead: Amazon Elastic Container Service (Amazon ECS)/Amazon Elastic Kubernetes Service (Amazon EKS) infrastructure, Elastic Load Balancing (ELB) Application Load Balancers, Amazon ElastiCache, Amazon Relational Database Service (Amazon RDS) management, increased latency, and a new failure mode where gateway issues block Claude Code usage. LLM gateways excel for applications making programmatic calls to LLMs, providing centralized monitoring, per-user visibility, and unified control across providers.
For traditional API access scenarios, organizations can deploy gateways to gain monitoring and attribution capabilities. The Claude Code guidance solution already includes monitoring and attribution through direct IdP authentication, OpenTelemetry metrics, IAM policies, and CloudWatch dashboards, so adding an LLM gateway to it duplicates existing functionality. Consider gateways only for multi-provider support, custom middleware, or request-level policy enforcement beyond IAM.
Single account implementation
We recommend consolidating coding assistant inference in a single dedicated account, separate from your development and production workloads. This approach provides five key benefits:
- Simplified operations: manage quotas and monitor usage through unified dashboards instead of tracking across multiple accounts. Request quota increases once rather than per account.
- Clear cost visibility: AWS Cost Explorer and Cost and Usage Reports show Claude Code charges directly without complex tagging. OpenTelemetry metrics enable department- and team-level allocation.
- Centralized security: CloudTrail logs flow to one location for monitoring and compliance. Deploy the monitoring stack once to collect metrics from developers.
- Production protection: account-level isolation helps prevent Claude Code usage from exhausting quotas and throttling production applications. Production traffic spikes don't affect developer productivity.
- Implementation: cross-account IAM configuration lets developers authenticate through identity providers that federate into restricted roles, granting only model invocation permissions with appropriate guardrails.
This strategy integrates with direct IdP authentication and OpenTelemetry monitoring. Identity providers handle authentication, the dedicated account handles inference, and development accounts handle applications.
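A minimal sketch of the restricted role's permissions policy might look like the following; the account ID and resource ARNs are placeholders, not values from this post:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowBedrockInvocationOnly",
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:*::foundation-model/anthropic.*",
        "arn:aws:bedrock:*:111122223333:inference-profile/*"
      ]
    }
  ]
}
```

Scoping the role to model invocation only keeps the dedicated inference account from becoming a path into other workloads.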
Inference profiles
Amazon Bedrock inference profiles provide cost tracking through resource tagging, but they don't scale to per-developer granularity. While you can create application profiles for cost allocation, managing profiles for 1,000+ individual developers becomes operationally burdensome. Inference profiles work best for organizations with 10-50 distinct teams requiring isolated cost tracking, or when using cross-Region inference, where managed routing distributes requests across AWS Regions. They're ideal for scenarios requiring basic cost allocation rather than comprehensive monitoring.
System-defined cross-Region inference profiles automatically route requests across multiple AWS Regions, distributing load for higher throughput and availability. When you invoke a cross-Region profile (for example, us.anthropic.claude-sonnet-4), Amazon Bedrock selects an available Region to process your request.
Application inference profiles are profiles you create explicitly in your account, typically wrapped around a system-defined profile or a specific model in a Region. You can tag application profiles with custom key-value pairs like team:data-science or project:fraud-detection that flow to AWS Cost and Usage Reports for cost allocation analysis. To create an application profile:
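As a sketch using the AWS CLI's create-inference-profile command (the profile name, source ARN, and tag values are illustrative placeholders):

```shell
# Create an application inference profile that wraps a cross-Region
# system-defined profile and tags it for cost allocation.
aws bedrock create-inference-profile \
  --inference-profile-name data-science-claude \
  --model-source copyFrom=arn:aws:bedrock:us-east-1:111122223333:inference-profile/us.anthropic.claude-sonnet-4 \
  --tags key=team,value=data-science
```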
Tags appear in AWS Cost and Usage Reports, so you can answer questions like:
"What did the data-science team spend on Amazon Bedrock last month?"
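With a Cost and Usage Report queryable in Athena, that question might be answered with SQL along these lines; the database and table names are assumptions, and the tag column follows CUR's resource_tags_user_<key> naming convention:

```sql
-- Hypothetical CUR query: Bedrock spend for the data-science team
SELECT SUM(line_item_unblended_cost) AS bedrock_spend_usd
FROM cur_database.cur_table
WHERE line_item_product_code = 'AmazonBedrock'
  AND resource_tags_user_team = 'data-science'
  AND year = '2025' AND month = '11';
```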
Each profile must be referenced explicitly in API calls, meaning developers' credential configurations must specify their unique profile rather than a shared endpoint.
For more on inference profiles, see the Amazon Bedrock inference profiles documentation.
Monitoring
An effective monitoring strategy transforms Claude Code from a productivity tool into a measurable investment by tracking usage, costs, and impact.
Progressive enhancement path
Monitoring layers are complementary. Organizations typically start with basic visibility and add capabilities as ROI requirements justify additional infrastructure.
Let's explore each level and when it makes sense for your deployment.
Note: Infrastructure costs grow progressively; each level keeps the previous layers while adding new components.
CloudWatch
Amazon Bedrock publishes metrics to Amazon CloudWatch automatically, tracking invocation counts, throttling errors, and latency. CloudWatch graphs show aggregate trends such as total requests, average latency, and quota utilization. This baseline monitoring is included in standard CloudWatch pricing and requires minimal deployment effort. You can create CloudWatch alarms that notify you when invocation rates spike, error rates exceed thresholds, or latency degrades.
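For example, a sketch of an alarm on Bedrock throttling in the AWS/Bedrock namespace (the alarm name, threshold, and SNS topic ARN are placeholders):

```shell
# Alarm when Bedrock throttling exceeds 10 events in 5 minutes.
aws cloudwatch put-metric-alarm \
  --alarm-name claude-code-bedrock-throttles \
  --namespace AWS/Bedrock \
  --metric-name InvocationThrottles \
  --statistic Sum \
  --period 300 \
  --evaluation-periods 1 \
  --threshold 10 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:111122223333:bedrock-alerts
```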
Invocation logging
Amazon Bedrock invocation logging captures detailed information about each API call to Amazon S3 or CloudWatch Logs, preserving individual request records including invocation metadata and full request/response data. Process the logs with Amazon Athena, load them into data warehouses, or analyze them with custom tools. The logs reveal usage patterns, invocations by model, peak usage, and an audit trail of Amazon Bedrock access.
OpenTelemetry
Claude Code includes support for OpenTelemetry, an open source observability framework for collecting application telemetry data. When configured with an OpenTelemetry collector endpoint, Claude Code emits detailed metrics about its operations, covering both Amazon Bedrock API calls and higher-level development activities.
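A typical developer-side configuration might look like the following; the variable names follow Claude Code's documented telemetry settings and the standard OpenTelemetry conventions, and the collector endpoint is a placeholder:

```shell
# Enable Claude Code telemetry and point it at your collector
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export OTEL_EXPORTER_OTLP_ENDPOINT=https://collector.example.com
```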
The telemetry captures detailed code-level metrics not included in Amazon Bedrock's default logging, such as lines of code added and deleted, files modified, programming languages used, and developers' acceptance rates of Claude's suggestions. It also tracks key operations, including file edits, code searches, documentation requests, and refactoring tasks.
The guidance solution deploys OpenTelemetry infrastructure on Amazon ECS Fargate. An Application Load Balancer receives telemetry over HTTP(S) and forwards metrics to an OpenTelemetry Collector, which exports the data to Amazon CloudWatch and Amazon S3.
Dashboard
The guidance solution includes a CloudWatch dashboard that displays key metrics continuously, tracking active users by hour, day, or week to reveal adoption and usage trends and enable per-user cost calculation. Token consumption breaks down by input, output, and cached tokens, with high cache hit rates indicating efficient context reuse and per-user views identifying heavy users. Code activity metrics track lines added and deleted, correlating with token usage to show efficiency and usage patterns.
The operations breakdown shows the distribution of file edits, code searches, and documentation requests, while user leaderboards display top users by tokens, lines of code, or session duration.
The dashboard updates in near-real time and integrates with CloudWatch alarms to trigger notifications when metrics exceed thresholds. The guidance solution deploys through CloudFormation with custom Lambda functions for complex aggregations.
Analytics
While dashboards excel at real-time monitoring, long-term trends and complex user behavior analysis require analytical tools. The guidance solution's optional analytics stack streams metrics to Amazon S3 using Amazon Data Firehose. The AWS Glue Data Catalog defines the schema, making the data queryable through Amazon Athena.
The analytics layer supports queries such as monthly token consumption by department, code acceptance rates by programming language, and token efficiency variations across teams. Cost analysis becomes sophisticated: join token metrics with Amazon Bedrock pricing to calculate exact costs by user, then aggregate for department-level chargeback. Time-series analysis reveals how costs scale with team growth for budget forecasting. The SQL interface integrates with business intelligence tools, enabling exports to spreadsheets, machine learning models, or project management systems.
For example, to see the monthly cost analysis by department:
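A hedged sketch of such an Athena query follows; the table name, column names, and per-token rates are hypothetical placeholders, not the solution's actual schema or current Amazon Bedrock pricing:

```sql
-- Hypothetical schema: a metrics table with per-event token counts and a
-- department attribute extracted from the OIDC token.
SELECT department,
       date_trunc('month', event_timestamp) AS month,
       SUM(input_tokens)  * 0.000003 +
       SUM(output_tokens) * 0.000015 AS estimated_cost_usd
FROM claude_code_metrics
GROUP BY department, date_trunc('month', event_timestamp)
ORDER BY month, estimated_cost_usd DESC;
```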
The infrastructure adds moderate cost: Data Firehose charges for ingestion, S3 for retention, and Athena per query based on data scanned.
Enable analytics when you need historical analysis, complex queries, or integration with business intelligence tools. While the dashboard alone may suffice for small deployments or organizations focused primarily on real-time monitoring, enterprises making significant investments in Claude Code should implement the analytics layer. It provides the visibility needed to demonstrate return on investment and optimize usage over time.
Quotas
Quotas allow organizations to control and manage token consumption by setting usage limits for individual developers or teams. Before implementing quotas, we recommend first enabling monitoring to understand natural usage patterns. Usage data typically shows that high token consumption correlates with high productivity, indicating that heavy users deliver proportional value.
The quota system stores limits in DynamoDB with entries like:
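For illustration, an item might look like the following (the attribute names are assumptions, not the guidance solution's actual schema):

```json
{
  "user_id": "developer@example.com",
  "period": "2025-11",
  "monthly_token_limit": 50000000,
  "tokens_used": 12345678,
  "threshold_notified": false
}
```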
A Lambda function triggered by CloudWatch Events aggregates token consumption every 15 minutes, updating DynamoDB and publishing to SNS when thresholds are crossed.
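The threshold check at the heart of that function can be sketched as follows; the function name, event shape, and 80% notification threshold are illustrative assumptions, not the guidance solution's code:

```python
from collections import defaultdict


def users_over_threshold(events, limits, threshold_pct=0.8):
    """Sum token usage per user and flag users at or above the given
    fraction of their configured limit -- the check a scheduled Lambda
    could run every 15 minutes before publishing to SNS."""
    totals = defaultdict(int)
    for event in events:
        totals[event["user_id"]] += event["tokens"]
    return {
        user: used
        for user, used in totals.items()
        if user in limits and used >= threshold_pct * limits[user]
    }
```

Keeping the aggregation pure like this makes the notification logic easy to unit test apart from the DynamoDB and SNS calls.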
Monitoring comparison
The following table summarizes the trade-offs across monitoring approaches:
| Capability | CloudWatch | Invocation logging | OpenTelemetry | Dashboard and analytics |
| --- | --- | --- | --- | --- |
| Setup complexity | None | Low | Medium | Medium |
| User attribution | None | IAM identity | Full | Full |
| Real-time metrics | Yes | No | Yes | Yes |
| Code-level metrics | No | No | Yes | Yes |
| Historical analysis | Limited | Yes | Yes | Yes |
| Cost allocation | Account level | Account level | User, team, department | User, team, department |
| Token tracking | Aggregate | Per-request | Per-user | Per-user with trends |
| Quota enforcement | Manual | Manual | Possible | Possible |
| Operational overhead | Minimal | Low | Medium | Medium |
| Cost | Minimal | Low | Medium | Medium |
| Use case | POC | Basic auditing | Production | Enterprise with ROI |
Putting it together
This section synthesizes authentication methods, organizational architecture, and monitoring strategies into a recommended deployment pattern, with guidance on implementation priorities as your deployment matures. The architecture balances security, operational simplicity, and comprehensive visibility: developers authenticate once per day with corporate credentials, administrators see real-time usage in dashboards, and security teams have CloudTrail audit logs and comprehensive user-attributed metrics through OpenTelemetry.
Implementation path
The guidance solution supports rapid deployment through an interactive setup process, with authentication and monitoring running within hours. Deploy the full stack to a pilot group first, gather real usage data, then expand based on validated patterns.
- Deployment – Clone the Guidance for Claude Code with Amazon Bedrock repository and run the interactive poetry run ccwb init wizard. The wizard configures your identity provider, federation type, AWS Regions, and optional monitoring. Deploy the CloudFormation stacks (typically 15-30 minutes), build distribution packages, and test authentication locally before distributing to users.
- Distribution – Identify a pilot group of 5-20 developers from different teams. This group will validate authentication and monitoring, and provide usage data for full rollout planning. If you enabled monitoring, the CloudWatch dashboard shows activity immediately. You can track token consumption, code acceptance rates, and operation types to estimate capacity requirements, identify training needs, and demonstrate value for a broader rollout.
- Expansion – Once Claude Code is validated, expand adoption by team or department. Add the analytics stack (typically 1-2 hours) for historical trend analysis to see adoption rates, high-performing teams, and cost forecasts.
- Optimization – Use monitoring data for continuous improvement through regular review cycles with development leadership. The monitoring data can demonstrate value, identify training needs, and guide capacity adjustments.
When to deviate from the recommended pattern
While the architecture above suits most enterprise deployments, specific circumstances might justify different approaches.
- Consider an LLM gateway if you need multiple LLM providers beyond Amazon Bedrock, custom middleware for prompt processing or response filtering, or you operate in a regulatory environment requiring request-level policy enforcement beyond AWS IAM capabilities.
- Consider inference profiles if you have fewer than 50 teams requiring separate cost tracking and prefer AWS-native billing allocation over telemetry metrics. Inference profiles work well for project-based cost allocation but don't scale to per-developer tracking.
- Consider starting without monitoring for time-limited pilots with fewer than 10 developers where basic CloudWatch metrics suffice. Plan to add monitoring before scaling, because retrofitting requires redistributing packages to developers.
- Consider API keys only for time-boxed testing (under one week) where the security risks are acceptable.
Conclusion
Deploying Claude Code with Amazon Bedrock at enterprise scale requires thoughtful authentication, architecture, and monitoring decisions. Production-ready deployments follow a clear pattern: direct IdP integration provides secure, user-attributed access, and a dedicated AWS account simplifies capacity management. OpenTelemetry monitoring provides visibility into costs and developer productivity. The Guidance for Claude Code with Amazon Bedrock implements these patterns in a deployable solution. Start with authentication and basic monitoring, then progressively add features as you scale.
As AI-powered development tools become the industry standard, organizations that prioritize security, monitoring, and operational excellence in their deployments will gain lasting advantages. This guide provides a comprehensive framework to help you maximize Claude Code's potential across your enterprise.
To get started, visit the Guidance for Claude Code with Amazon Bedrock repository.
About the authors
Court Schuett is a Principal Specialist Solutions Architect – GenAI who spends his days working with AI coding assistants to help others get the most out of them. Outside of work, Court enjoys traveling, listening to music, and woodworking.
Jawhny Cooke is the Worldwide Tech Lead for Anthropic's Claude Code at AWS, where he specializes in helping enterprises operationalize agentic coding at scale. He partners with customers and partners to solve the complex production challenges of AI-assisted development, from designing autonomous coding workflows and orchestrating multi-agent systems to operational optimization on AWS infrastructure. His work bridges cutting-edge AI capabilities with enterprise-grade reliability to help organizations confidently adopt Claude Code in production environments.
Karan Lakhwani is a Sr. Customer Solutions Manager at Amazon Web Services. He specializes in generative AI technologies and is an AWS Golden Jacket recipient. Outside of work, Karan enjoys finding new restaurants and snowboarding.
Gabe Levy is an Associate Delivery Consultant at AWS based out of New York, primarily focused on application development in the cloud. Gabe has a sub-specialization in artificial intelligence and machine learning. When not working with AWS customers, he enjoys exercising, reading, and spending time with family and friends.
Gabriel Velazquez Lopez is a GenAI Product Leader at AWS, where he leads the strategy, go-to-market, and product launches for Claude on AWS in partnership with Anthropic.
