Thursday, February 12, 2026

NVIDIA Nemotron 3 Nano 30B MoE model is now available in Amazon SageMaker JumpStart


Today, we're excited to announce that the NVIDIA Nemotron 3 Nano 30B model with 3B active parameters is now generally available in the Amazon SageMaker JumpStart model catalog. You can accelerate innovation and deliver tangible business value with Nemotron 3 Nano on Amazon Web Services (AWS) without having to manage model deployment complexities. You can power your generative AI applications with Nemotron capabilities using the managed deployment capabilities provided by SageMaker JumpStart.

Nemotron 3 Nano is a hybrid mixture of experts (MoE) small language model with leading compute efficiency and accuracy, built for developers to drive highly skilled agentic tasks at scale. The model is fully open, with open weights, datasets, and recipes, so developers can seamlessly customize, optimize, and deploy it on their own infrastructure to help meet their privacy and security requirements. Nemotron 3 Nano excels in coding and reasoning, and leads on benchmarks such as SWE-Bench Verified, GPQA Diamond, AIME 2025, Arena Hard v2, and IFBench.

About Nemotron 3 Nano 30B

Nemotron 3 Nano is differentiated from other models by its architecture and accuracy, with strong performance across a variety of highly technical skills:

  • Architecture:
    • MoE with hybrid Transformer-Mamba architecture
    • Supports a token budget, providing optimal accuracy with minimal reasoning token generation
  • Accuracy:
    • Leading accuracy on coding, scientific reasoning, math, and instruction following
    • Leads on benchmarks such as LiveCodeBench, GPQA Diamond, AIME 2025, BFCL, and IFBench (compared to other open language models under 30B)
  • Usability:
    • 30B-parameter model with 3 billion active parameters
    • Has a context window of up to 1 million tokens
    • Text-based foundation model, using text for both inputs and outputs

Prerequisites

To get started with Nemotron 3 Nano in Amazon SageMaker JumpStart, you must have a provisioned Amazon SageMaker Studio domain.
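
If you're unsure whether your account already has a Studio domain in the target Region, you can check with a quick Boto3 call. The following is a minimal sketch (the Region name is a placeholder):

import boto3

# List SageMaker Studio domains in the chosen Region (placeholder Region shown)
sagemaker_client = boto3.client("sagemaker", region_name="us-east-1")

domains = sagemaker_client.list_domains().get("Domains", [])
if domains:
    for domain in domains:
        print(f"Found domain {domain['DomainName']} ({domain['Status']})")
else:
    print("No SageMaker Studio domain found; provision one before continuing.")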

Get started with NVIDIA Nemotron 3 Nano 30B in SageMaker JumpStart

To test the Nemotron 3 Nano model in SageMaker JumpStart, open SageMaker Studio and choose Models in the navigation pane. Search for NVIDIA in the search bar and choose NVIDIA Nemotron 3 Nano 30B as the model.

On the model details page, choose Deploy and follow the prompts to deploy the model.
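
If you prefer to script the deployment instead of using the Studio console, the SageMaker Python SDK's JumpStartModel class provides an equivalent path. The sketch below uses a placeholder model ID and default settings; copy the exact JumpStart model ID and recommended instance type from the model details page.

from sagemaker.jumpstart.model import JumpStartModel

# Placeholder model ID; use the JumpStart model ID shown on the model details page
model = JumpStartModel(model_id="nvidia-nemotron-3-nano-30b")

# Deploy a real-time endpoint; accept_eula acknowledges the model's license terms
predictor = model.deploy(accept_eula=True)
print(f"Deployed endpoint: {predictor.endpoint_name}")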

After the model is deployed to a SageMaker AI endpoint, you can test it. You can access the model using the following AWS Command Line Interface (AWS CLI) code example. You can use nvidia/nemotron-3-nano as the model ID.

# stream and enable_thinking are set to false for non-streaming, non-reasoning mode
cat > input.json << EOF
{
  "model": "${MODEL_ID}",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is NVIDIA? Answer in 2-3 sentences."
    }
  ],
  "max_tokens": 512,
  "temperature": 0.2,
  "stream": false,
  "chat_template_kwargs": {"enable_thinking": false}
}
EOF

aws sagemaker-runtime invoke-endpoint \
  --endpoint-name ${ENDPOINT_NAME} \
  --region ${AWS_REGION} \
  --content-type 'application/json' \
  --body fileb://input.json \
  response.json
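
The AWS CLI call writes the model output to response.json. Assuming the endpoint returns an OpenAI-style chat completion (matching the request payload above), a few lines of Python can extract the generated text; adjust the keys if your response schema differs.

import json

# Load the response saved by the AWS CLI invocation above
with open("response.json") as f:
    result = json.load(f)

# OpenAI-style chat completions place the generated text under choices[0].message.content
print(result["choices"][0]["message"]["content"])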

Alternatively, you can access the model using the SageMaker SDK and Boto3. The following Python code example shows how to send a text message to NVIDIA Nemotron 3 Nano 30B using the SageMaker SDK. For additional code examples, refer to the NVIDIA GitHub repo.

import json

import boto3

# Create a SageMaker Runtime client for the target Region
runtime_client = boto3.client('sagemaker-runtime', region_name=region)


def invoke_endpoint(endpoint_name, prompt):
    payload = {
        "messages": [
            {"role": "user", "content": prompt}
        ],
        "max_tokens": 1000
    }

    try:
        response = runtime_client.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="application/json",
            Body=json.dumps(payload)
        )

        response_body = response['Body'].read().decode('utf-8')
        raw_response = json.loads(response_body)

        # Parse the response using our custom parser
        return parse_response(raw_response)

    except Exception as e:
        raise Exception(
            f"Failed to invoke endpoint '{endpoint_name}': {str(e)}. "
            f"Check that the endpoint is InService and you have least-privileged IAM permissions assigned."
        )
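
The example above hands the raw JSON to a custom parse_response helper that isn't shown here (see the NVIDIA GitHub repo referenced above for complete examples). As a rough sketch of what such a parser might do, the following assumes the endpoint returns an OpenAI-style chat completion; the actual schema and parser may differ.

def parse_response(raw_response: dict) -> str:
    # Minimal sketch, assuming an OpenAI-style chat completion payload;
    # adjust the keys if the deployed endpoint returns a different schema.
    choices = raw_response.get("choices", [])
    if not choices:
        raise ValueError(f"Unexpected response format: {raw_response}")
    return choices[0]["message"]["content"]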

Now available

NVIDIA Nemotron 3 Nano is now available fully managed in SageMaker JumpStart. Refer to the model package for AWS Region availability. To learn more, check out the Nemotron Nano model page, the NVIDIA GitHub sample notebook for Nemotron 3 Nano 30B, and the Amazon SageMaker JumpStart pricing page.

Try the Nemotron 3 Nano model in Amazon SageMaker JumpStart today and send feedback to AWS re:Post for SageMaker JumpStart or through your usual AWS Support contacts.


About the authors

Dan Ferguson is a Solutions Architect at AWS, based in New York, USA. As a machine learning services expert, Dan works to help customers on their journey to integrating ML workflows efficiently, effectively, and sustainably.

Pooja Karadgi leads product and strategic partnerships for Amazon SageMaker JumpStart, the machine learning and generative AI hub within SageMaker. She is dedicated to accelerating customer AI adoption by simplifying foundation model discovery and deployment, enabling customers to build production-ready generative AI applications across the entire model lifecycle, from onboarding and customization to deployment.

Benjamin Crabtree is a Senior Software Engineer on the Amazon SageMaker AI team, specializing in delivering the "last mile" experience to customers. He is passionate about democratizing the latest artificial intelligence breakthroughs by offering easy-to-use capabilities. Ben is also highly experienced in building machine learning infrastructure at scale.

Timothy Ma is a Principal Specialist in generative AI at AWS, where he collaborates with customers to design and deploy cutting-edge machine learning solutions. He also leads go-to-market strategies for generative AI services, helping organizations harness the potential of advanced AI technologies.

Abdullahi Olaoye is a Senior AI Solutions Architect at NVIDIA, specializing in integrating NVIDIA AI libraries, frameworks, and products with cloud AI services and open-source tools to optimize AI model deployment, inference, and generative AI workflows. He collaborates with AWS to enhance AI workload performance and drive adoption of NVIDIA-powered AI and generative AI solutions.

Nirmal Kumar Juluru is a product marketing manager at NVIDIA, driving the adoption of AI software, models, and APIs in the NVIDIA NGC Catalog and NVIDIA AI Foundation models and endpoints. He previously worked as a software developer. Nirmal holds an MBA from Carnegie Mellon University and a bachelor's degree in computer science from BITS Pilani.

Vivian Chen is a Deep Learning Solutions Architect at NVIDIA, where she helps teams bridge the gap between complex AI research and real-world performance. Specializing in inference optimization and cloud-integrated AI solutions, Vivian focuses on turning the heavy lifting of machine learning into fast, scalable applications. She is passionate about helping clients navigate NVIDIA's accelerated computing stack to ensure their models don't just work in the lab, but thrive in production.
