Saturday, March 21, 2026

Run NVIDIA Nemotron 3 Super on Amazon Bedrock


Nemotron 3 Super is now available as a fully managed, serverless model on Amazon Bedrock, joining the Nemotron Nano models that are already available in the Amazon Bedrock environment.

With NVIDIA Nemotron open models on Amazon Bedrock, you can accelerate innovation and deliver tangible business value without managing infrastructure complexities. You can power your generative AI applications with Nemotron through the fully managed inference of Amazon Bedrock, using its extensive features and tooling.

This post explores the technical characteristics of the Nemotron 3 Super model and discusses potential application use cases. It also provides technical guidance to get started using this model for your generative AI applications within the Amazon Bedrock environment.

About Nemotron 3 Super

Nemotron 3 Super is a hybrid Mixture of Experts (MoE) model with leading compute efficiency and accuracy for multi-agent applications and for specialized agentic AI systems. The model is released with open weights, datasets, and recipes so developers can customize, improve, and deploy the model on their own infrastructure for enhanced privacy and security.

Model overview:

  • Architecture:
    • MoE with hybrid Transformer-Mamba architecture.
    • Supports a token budget for improved accuracy with minimal reasoning token generation.
  • Accuracy:
    • Highest throughput efficiency in its size class, up to 5x over the previous Nemotron Super model.
    • Leading accuracy for reasoning and agentic tasks among leading open models, and up to 2x higher accuracy over the previous version.
    • Achieves high accuracy across leading benchmarks, including AIME 2025, Terminal-Bench, SWE-bench Verified and multilingual, and RULER.
    • Multi-environment RL training with NVIDIA NeMo gave the model leading accuracy across 10+ environments.
  • Model size: 120B total parameters with 12B active parameters
  • Context length: up to 256K tokens
  • Model input: Text
  • Model output: Text
  • Languages: English, French, German, Italian, Japanese, Spanish, and Chinese

Latent MoE

Nemotron 3 Super uses latent MoE, where experts operate on a shared latent representation before outputs are projected back to token space. This approach lets the model call on 4x more experts at the same inference cost, enabling better specialization around refined semantic structures, domain abstractions, or multi-hop reasoning patterns.

Multi-token prediction (MTP)

MTP lets the model predict multiple future tokens in a single forward pass, significantly increasing throughput for long reasoning sequences and structured outputs. For planning, trajectory generation, extended chain-of-thought, or code generation, MTP reduces latency and improves agent responsiveness.

To learn more about Nemotron 3 Super's architecture and how it's trained, see Introducing Nemotron 3 Super: an Open Hybrid Mamba Transformer MoE for Agentic Reasoning.

NVIDIA Nemotron 3 Super use cases

Nemotron 3 Super helps power diverse use cases across industries. Some of the use cases include:

  • Software development: Assist with tasks like code summarization.
  • Finance: Accelerate loan processing by extracting data, analyzing income patterns, and detecting fraudulent operations, which can help reduce cycle times and risk.
  • Cybersecurity: Triage issues, perform in-depth malware analysis, and proactively hunt for security threats.
  • Search: Understand user intent to activate the right agents.
  • Retail: Optimize inventory management and enhance in-store service with real-time, personalized product recommendations and assistance.
  • Multi-agent workflows: Orchestrate task-specific agents (planning, tool use, verification, and domain execution) to automate complex, end-to-end business processes.
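As a small illustration of the software development use case, the following sketch builds a Converse API request that asks the model to summarize a code snippet. The helper name and prompt wording are our own; only the model ID and the Converse request shape come from this post. The returned dictionary can be passed directly to `boto3.client("bedrock-runtime").converse(**request)`.

```python
import json

MODEL_ID = "nvidia.nemotron-super-3-120b"

def build_summarize_request(code: str, max_tokens: int = 256) -> dict:
    """Build keyword arguments for a Converse call that summarizes code."""
    prompt = (
        "Summarize what the following code does in two sentences:\n\n"
        f"```\n{code}\n```"
    )
    return {
        "modelId": MODEL_ID,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": 0.2},
    }

# Inspect the request that would be sent to bedrock-runtime's converse()
request = build_summarize_request("def add(a, b):\n    return a + b")
print(json.dumps(request, indent=2))
```

This keeps the prompt construction testable and separate from the network call, which is useful when the same request shape is reused across several of the use cases above.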

Get started with NVIDIA Nemotron 3 Super in Amazon Bedrock

Complete the following steps to test NVIDIA Nemotron 3 Super in Amazon Bedrock:

  1. Navigate to the Amazon Bedrock console and select Chat/Text playground from the left menu (under the Test section).
  2. Choose Select model in the upper-left corner of the playground.
  3. Choose NVIDIA from the category list, then select NVIDIA Nemotron 3 Super.
  4. Choose Apply to load the model.

After completing the previous steps, you can test the model directly. To truly showcase Nemotron 3 Super's capability, we will move beyond simple syntax and task it with a complex engineering challenge. High-reasoning models excel at "system-level" thinking, where they must balance architectural trade-offs, concurrency, and distributed state management.

Let's use the following prompt to design a globally distributed service:

"Design a distributed rate-limiting service in Python that must support 100,000 requests per second across multiple geographic regions.

1. Provide a high-level architectural strategy (e.g., Token Bucket vs. Fixed Window) and justify your choice for a global scale.
2. Write a thread-safe implementation using Redis as the backing store.
3. Address the 'race condition' problem when multiple instances update the same counter.
4. Include a pytest suite that simulates network latency between the app and Redis."

This prompt requires the model to operate as a senior distributed-systems engineer: reasoning about trade-offs, producing thread-safe code, anticipating failure modes, and validating everything with realistic tests, all in a single coherent response.
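For reference, the token-bucket strategy the prompt mentions can be sketched in a few lines. This is a single-process, in-memory version of our own (no Redis, no thread safety) meant only to illustrate the core algorithm the model is asked to scale out globally:

```python
import time

class TokenBucket:
    """Minimal single-process token bucket: holds up to `capacity` tokens,
    refilled continuously at `refill_rate` tokens per second."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        # Spend tokens if available; otherwise reject the request.
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(capacity=5, refill_rate=1.0)
results = [bucket.allow() for _ in range(7)]
print(results)  # → [True, True, True, True, True, False, False]
```

A burst of seven immediate requests drains the five available tokens and rejects the rest. The hard part the prompt asks the model to solve is exactly what this sketch omits: making the refill-and-spend update atomic across many instances sharing one Redis-backed counter.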

Using the AWS CLI and SDKs

You can access the model programmatically using the model ID nvidia.nemotron-super-3-120b. The model supports both the InvokeModel and Converse APIs through the AWS Command Line Interface (AWS CLI) and AWS SDKs, with nvidia.nemotron-super-3-120b as the model ID. It also supports the Amazon Bedrock OpenAI-compatible API.

Run the following command to invoke the model directly from your terminal using the AWS Command Line Interface (AWS CLI) and the InvokeModel API:

aws bedrock-runtime invoke-model \
  --model-id nvidia.nemotron-super-3-120b \
  --region us-west-2 \
  --body '{"messages": [{"role": "user", "content": "Type_Your_Prompt_Here"}], "max_tokens": 512, "temperature": 0.5, "top_p": 0.9}' \
  --cli-binary-format raw-in-base64-out \
  invoke-model-output.txt
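The command writes the raw JSON response to invoke-model-output.txt. Assuming the response body mirrors the OpenAI-style request format used above (a `choices` list containing a `message`), which is an assumption on our part rather than something this post specifies, a small helper can pull out the generated text:

```python
import json

def extract_text(payload: dict) -> str:
    """Extract the generated text, assuming an OpenAI-style response body."""
    return payload["choices"][0]["message"]["content"]

# Mocked response of the assumed shape; in practice you would load the real
# file with: payload = json.load(open("invoke-model-output.txt"))
sample = {"choices": [{"message": {"role": "assistant", "content": "Hello!"}}]}
print(extract_text(sample))  # → Hello!
```

If the actual body differs, print the file once and adjust the key path accordingly.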

If you want to invoke the model through the AWS SDK for Python (Boto3), use the following script to send a prompt to the model, in this case using the Converse API:

import boto3
from botocore.exceptions import ClientError

# Create a Bedrock Runtime client in the AWS Region you want to use.
client = boto3.client("bedrock-runtime", region_name="us-west-2")

# Set the model ID
model_id = "nvidia.nemotron-super-3-120b"

# Start a conversation with the user message.
user_message = "Type_Your_Prompt_Here"
conversation = [
    {
        "role": "user",
        "content": [{"text": user_message}],
    }
]

try:
    # Send the message to the model using a basic inference configuration.
    response = client.converse(
        modelId=model_id,
        messages=conversation,
        inferenceConfig={"maxTokens": 512, "temperature": 0.5, "topP": 0.9},
    )

    # Extract and print the response text.
    response_text = response["output"]["message"]["content"][0]["text"]
    print(response_text)

except (ClientError, Exception) as e:
    print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
    exit(1)

To invoke the model through the Amazon Bedrock OpenAI-compatible Chat Completions endpoint, you can proceed as follows using the OpenAI SDK:

# Import the OpenAI SDK
import os
from openai import OpenAI

# Set environment variables
os.environ["OPENAI_API_KEY"] = ""
os.environ["OPENAI_BASE_URL"] = "https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1"

# Create the client
client = OpenAI()

# Set the model ID
model_id = "nvidia.nemotron-super-3-120b"

# Set prompts
system_prompt = "Type_Your_System_Prompt_Here"
user_message = "Type_Your_User_Prompt_Here"

# Use the Chat Completions API
response = client.chat.completions.create(
    model=model_id,
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user",   "content": user_message}
    ],
    temperature=0,
    max_completion_tokens=1000
)

# Extract and print the response text
print(response.choices[0].message.content)

Conclusion

In this post, we showed you how to get started with NVIDIA Nemotron 3 Super on Amazon Bedrock for building the next generation of agentic AI applications. By combining the model's advanced hybrid Transformer-Mamba architecture and latent MoE with the fully managed, serverless infrastructure of Amazon Bedrock, organizations can now deploy high-reasoning, efficient applications at scale without the heavy lifting of backend management. Ready to see what this model can do for your specific workflow?

  • Try it now: Head over to the Amazon Bedrock console to experiment with NVIDIA Nemotron 3 Super in the model playground.
  • Build: Explore the AWS SDKs to integrate Nemotron 3 Super into your existing generative AI pipelines.

About the authors

Aris Tsakpinis

Aris Tsakpinis is a Senior Specialist Solutions Architect for Generative AI, focusing on open weight models on Amazon Bedrock and the broader generative AI open-source ecosystem. Alongside his professional role, he is pursuing a PhD in Machine Learning Engineering at the University of Regensburg, where his research focuses on applied generative AI in scientific domains.

Abdullahi Olaoye

Abdullahi Olaoye is a Senior AI Solutions Architect at NVIDIA, specializing in integrating NVIDIA AI libraries, frameworks, and products with cloud AI services and open-source tools to optimize AI model deployment, inference, and generative AI workflows. He collaborates with cloud providers to help enhance AI workload performance and drive adoption of NVIDIA-powered AI and generative AI solutions.
