Wednesday, March 11, 2026

Accelerate custom LLM deployment: Fine-tune with Oumi and deploy to Amazon Bedrock


This post is co-written by David Stewart and Matthew Persons from Oumi.

Fine-tuning open source large language models (LLMs) often stalls between experimentation and production. Training configurations, artifact management, and scalable deployment each require different tools, creating friction when moving from rapid experimentation to secure, enterprise-grade environments.

In this post, we show how you can fine-tune a Llama model using Oumi on Amazon EC2 (with the option to create synthetic data using Oumi), store artifacts in Amazon S3, and deploy to Amazon Bedrock using Custom Model Import for managed inference. While we use EC2 in this walkthrough, fine-tuning can be done on other compute services such as Amazon SageMaker or Amazon Elastic Kubernetes Service, depending on your needs.

Benefits of Oumi and Amazon Bedrock

Oumi is an open source system that streamlines the foundation model lifecycle, from data preparation and training to evaluation. Instead of assembling separate tools for each stage, you define a single configuration and reuse it across runs.

Key benefits for this workflow:

  • Recipe-driven training: Define your configuration once and reuse it across experiments, reducing boilerplate and improving reproducibility
  • Flexible fine-tuning: Choose full fine-tuning or parameter-efficient methods like LoRA, based on your constraints
  • Built-in evaluation: Score checkpoints using benchmarks or LLM-as-a-judge without additional tooling
  • Data synthesis: Generate task-specific datasets when production data is limited

Amazon Bedrock complements this by providing managed, serverless inference. After fine-tuning with Oumi, you import your model through Custom Model Import in three steps: upload to S3, create the import job, and invoke. There is no inference infrastructure to manage. The following architecture diagram shows how these components work together.

Figure 1: Oumi manages data, training, and evaluation on EC2. Amazon Bedrock provides managed inference through Custom Model Import.

Solution overview

This workflow consists of three stages:

  1. Fine-tune with Oumi on EC2: Launch a GPU-optimized instance (for example, g5.12xlarge or p4d.24xlarge), install Oumi, and run training with your configuration. For larger models, Oumi supports distributed training with Fully Sharded Data Parallel (FSDP), DeepSpeed, and Distributed Data Parallel (DDP) strategies across multi-GPU or multi-node setups.
  2. Store artifacts on S3: Upload model weights, checkpoints, and logs for durable storage.
  3. Deploy to Amazon Bedrock: Create a Custom Model Import job pointing to your S3 artifacts. Amazon Bedrock provisions inference infrastructure automatically. Client applications call the imported model using the Amazon Bedrock Runtime APIs.
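The import stage above comes down to a single Bedrock control-plane call. The following Python sketch shows the request shape, assuming boto3 is available; the job name, model name, role ARN, and S3 URI are placeholders, not values from this post's scripts, and the actual API call is left commented out since it requires AWS credentials.

```python
import json


def import_job_request(job_name, model_name, role_arn, s3_uri):
    """Build the CreateModelImportJob request: Bedrock reads the model
    artifacts (weights, config, tokenizer) from the given S3 URI using
    the supplied IAM role."""
    return {
        "jobName": job_name,
        "importedModelName": model_name,
        "roleArn": role_arn,
        "modelDataSource": {"s3DataSource": {"s3Uri": s3_uri}},
    }


req = import_job_request(
    job_name="llama-import-job",                                  # placeholder
    model_name="my-fine-tuned-llama",                             # placeholder
    role_arn="arn:aws:iam::123456789012:role/BedrockImportRole",  # placeholder
    s3_uri="s3://your-bucket-name/your-s3-prefix/",               # placeholder
)
print(json.dumps(req, indent=2))

# With credentials configured, the actual call would look like:
# import boto3
# bedrock = boto3.client("bedrock")
# job = bedrock.create_model_import_job(**req)
# bedrock.get_model_import_job(jobIdentifier=job["jobArn"])  # poll until Completed
```

The walkthrough below wraps this same call in a helper script, so you do not need to issue it by hand.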

This architecture addresses common challenges in moving fine-tuned models to production.

Technical implementation

Let's walk through a hands-on workflow using the meta-llama/Llama-3.2-1B-Instruct model as an example. While we selected this model because it pairs well with fine-tuning on an AWS g6.12xlarge EC2 instance, the same workflow can be replicated across many other open source models (note that larger models may require larger instances or distributed training across instances). For more information, see the Oumi model fine-tuning recipes and Amazon Bedrock custom model architectures.

Prerequisites

To complete this walkthrough, you need:

Set up AWS resources

  1. Clone this repository on your local machine:
git clone https://github.com/aws-samples/sample-oumi-fine-tuning-bedrock-cmi.git
cd sample-oumi-fine-tuning-bedrock-cmi
  2. Run the setup script to create IAM roles and an S3 bucket, and launch a GPU-optimized EC2 instance:
./scripts/setup-aws-env.sh [--dry-run]

The script prompts for your AWS Region, S3 bucket name, EC2 key pair name, and security group ID, then creates all required resources. Defaults: g6.12xlarge instance, Deep Learning Base AMI with Single CUDA (Amazon Linux 2023), and 100 GB gp3 storage. Note: If you do not have permissions to create IAM roles or launch EC2 instances, share this repository with your IT administrator and ask them to complete this section to set up your AWS environment.

  3. Once the instance is running, the script outputs the SSH command and the Amazon Bedrock import role ARN (needed in Step 5). SSH into the instance and proceed with Step 1 below.

See iam/README.md for IAM policy details, scoping guidance, and validation steps.

Step 1: Set up the EC2 environment

Complete the following steps to set up the EC2 environment.

  1. On the EC2 instance (Amazon Linux 2023), update the system and install base dependencies:
sudo yum update -y
sudo yum install python3 python3-pip git -y
  2. Clone the companion repository:
git clone https://github.com/aws-samples/sample-oumi-fine-tuning-bedrock-cmi.git
cd sample-oumi-fine-tuning-bedrock-cmi
  3. Configure environment variables (replace the values with your actual Region and bucket name from the setup script):
export AWS_REGION=us-west-2
export S3_BUCKET=your-bucket-name
export S3_PREFIX=your-s3-prefix
aws configure set default.region "$AWS_REGION"
  4. Run the setup script to create a Python virtual environment, install Oumi, validate GPU availability, and configure Hugging Face authentication. See setup-environment.sh for options.
./scripts/setup-environment.sh
source .venv/bin/activate
  5. Authenticate with Hugging Face to access gated model weights. Generate an access token at huggingface.co/settings/tokens, then run:
hf auth login

Step 2: Configure training

The default dataset is tatsu-lab/alpaca, configured in configs/oumi-config.yaml. Oumi downloads it automatically during training; no manual download is required. To use a different dataset, update the dataset_name parameter in configs/oumi-config.yaml. See the Oumi dataset docs for supported formats.
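For orientation, the dataset_name parameter lives under the config's data section. The sketch below shows the general shape of an Oumi SFT training config; it is illustrative only, and the exact field names should be checked against configs/oumi-config.yaml and the Oumi docs for your installed version.

```yaml
# Illustrative excerpt of an Oumi SFT config (verify against configs/oumi-config.yaml)
model:
  model_name: "meta-llama/Llama-3.2-1B-Instruct"
data:
  train:
    datasets:
      - dataset_name: "tatsu-lab/alpaca"   # swap in your own dataset here
training:
  trainer_type: "TRL_SFT"
  output_dir: "models/final"
```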

[Optional] Generate synthetic training data with Oumi:

To generate synthetic data using Amazon Bedrock as the inference backend, update the model_name placeholder in configs/synthesis-config.yaml with an Amazon Bedrock model ID you have access to (for example, anthropic.claude-sonnet-4-6). See the Oumi data synthesis docs for details. Then run:

oumi synth -c configs/synthesis-config.yaml

Step 3: Fine-tune the model

Fine-tune the model using Oumi's built-in training recipe for Llama-3.2-1B-Instruct:

./scripts/fine-tune.sh --config configs/oumi-config.yaml --output-dir models/final [--dry-run]

To customize hyperparameters, edit oumi-config.yaml.

Note: If you generated synthetic data in Step 2, update the dataset path in the config before training.

Monitor GPU utilization with nvidia-smi or the Amazon CloudWatch agent. For long-running jobs, configure Amazon EC2 Automatic Instance Recovery to handle instance interruptions.

Step 4: Evaluate the model (Optional)

You can evaluate the fine-tuned model using standard benchmarks:

oumi evaluate -c configs/evaluation-config.yaml

The evaluation config specifies the model path and benchmark tasks (e.g., MMLU). To customize, edit evaluation-config.yaml. For LLM-as-a-judge approaches and additional benchmarks, see Oumi's evaluation guide.
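The evaluation config follows the same YAML style as the training config. The sketch below is an assumption about its general shape, not a copy of the repository's file; the backend and task field names in particular should be verified against configs/evaluation-config.yaml and Oumi's evaluation guide.

```yaml
# Illustrative excerpt (verify field names against configs/evaluation-config.yaml)
model:
  model_name: "models/final"          # path to the fine-tuned checkpoint
tasks:
  - evaluation_backend: "lm_harness"  # assumption: LM Evaluation Harness backend
    task_name: "mmlu"
```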

Step 5: Deploy to Amazon Bedrock

Complete the following steps to deploy the model to Amazon Bedrock:

  1. Upload the model artifacts to S3 and import the model to Amazon Bedrock:
./scripts/upload-to-s3.sh --bucket $S3_BUCKET --source models/final --prefix $S3_PREFIX
./scripts/import-to-bedrock.sh --model-name my-fine-tuned-llama --s3-uri s3://$S3_BUCKET/$S3_PREFIX --role-arn $BEDROCK_ROLE_ARN --wait
  2. The import script outputs the model ARN on completion. Set MODEL_ARN to this value (format: arn:aws:bedrock:::imported-model/).
  3. Invoke the model on Amazon Bedrock:
./scripts/invoke-model.sh --model-id $MODEL_ARN --prompt "Translate this text to French: What is the capital of France?"
  4. Amazon Bedrock creates a managed inference environment automatically. For IAM role setup, see bedrock-import-role.json.
  5. Enable S3 versioning on the bucket to support rollback of model revisions. For SSE-KMS encryption and bucket policy hardening, see the security scripts in the companion repository.
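Client applications can make the same invocation programmatically through the Bedrock Runtime API. The Python sketch below assumes boto3; the request-body schema shown follows the native Llama prompt format (imported models use their model's native schema, so adjust it for your model), and the actual network call is left commented out since it requires AWS credentials and a real model ARN.

```python
import json


def llama_request_body(prompt, max_gen_len=512, temperature=0.5):
    """Build a native Llama-style request body for an imported model.
    The field names here are an assumption based on the Llama schema."""
    return json.dumps(
        {"prompt": prompt, "max_gen_len": max_gen_len, "temperature": temperature}
    )


body = llama_request_body(
    "Translate this text to French: What is the capital of France?"
)
print(body)

# With credentials configured and MODEL_ARN set, the actual call would look like:
# import boto3
# runtime = boto3.client("bedrock-runtime")
# resp = runtime.invoke_model(modelId=MODEL_ARN, body=body)
# print(json.loads(resp["body"].read()))
```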

Step 6: Clean up

To avoid ongoing costs, remove the resources created during this walkthrough:

aws ec2 terminate-instances --instance-ids $INSTANCE_ID
aws s3 rm s3://$S3_BUCKET/$S3_PREFIX/ --recursive
aws bedrock delete-imported-model --model-identifier $MODEL_ARN

Conclusion

In this post, you learned how to fine-tune a Llama-3.2-1B-Instruct base model using Oumi on EC2 and deploy it using Amazon Bedrock Custom Model Import. This approach gives you full control over fine-tuning with your own data while using managed inference in Amazon Bedrock.

The companion sample-oumi-fine-tuning-bedrock-cmi repository provides scripts, configurations, and IAM policies to get started. Clone it, swap in your dataset, and deploy a custom model to Amazon Bedrock.

To get started, explore the resources below and begin building your own fine-tuning-to-deployment pipeline on Oumi and AWS. Happy building!

Learn More

Acknowledgement

Special thanks to Pronoy Chopra and Jon Turdiev for their contributions.


About the authors

Bashir Mohammed

Bashir is a Senior Lead GenAI Solutions Architect on the Frontier AI team at AWS, where he partners with startups and enterprises to architect and deploy production-scale GenAI applications. With a PhD in Computer Science, his expertise spans agentic systems, LLM evaluation and benchmarking, fine-tuning, post-training optimization, reinforcement learning from human feedback, and scalable ML infrastructure. Outside of work, he mentors early-career engineers and supports community technical programs.

Bala Krishnamoorthy

Bala is a Senior GenAI Data Scientist on the Amazon Bedrock GTM team, where he helps startups leverage Bedrock to power their products. In his free time, he enjoys spending time with family and friends, staying active, trying new restaurants, traveling, and kickstarting his day with a steaming hot cup of coffee.

Greg Fina

Greg is a Principal Startup Solutions Architect for Generative AI at Amazon Web Services, where he empowers startups to accelerate innovation through cloud adoption. He specializes in application modernization, with a strong focus on serverless architectures, containers, and scalable data storage solutions. He is passionate about using generative AI tools to orchestrate and optimize large-scale Kubernetes deployments, as well as advancing GitOps and DevOps practices for high-velocity teams. Outside of his customer-facing role, Greg actively contributes to open source initiatives, especially those related to Backstage.

David Stewart

David leads Field Engineering at Oumi, where he works with customers to improve their generative AI applications by creating custom language models for their use case. He brings extensive experience working with LLMs, including modern agentic, RAG, and training architectures. David is deeply interested in the practical side of generative AI and how people and organizations can create impactful products and solutions that work at scale.

Matthew Persons

Matthew is a co-founder and engineering leader at Oumi, where he focuses on building and scaling practical, open generative AI systems for real-world use cases. He works closely with engineers, researchers, and customers to design robust architectures across the entire AI development pipeline. Matthew is passionate about open-source AI, applied machine learning, and enabling teams to move quickly from research proofs of concept to impactful products.
