AI is moving fast, and for many of our customers, the real opportunity isn't in experimenting with it, it's in running AI in production where it drives meaningful business outcomes. This means building systems that run reliably, perform at scale, and meet your organization's security and compliance requirements.
Today at NVIDIA GTC 2026, AWS and NVIDIA announced an expanded collaboration with new technology integrations to support growing AI compute demand and help you build and run AI solutions that are production-ready. These integrations span accelerated computing, interconnect technologies, and model fine-tuning and inference. They include the following.
Major announcements at NVIDIA GTC 2026
Scaling AI infrastructure with expanded GPU options and optimized interconnect
Accelerating compute capacity in the agentic AI era
Starting in 2026, AWS will add more than 1 million NVIDIA GPUs, including Blackwell and Rubin GPU architectures, across our global cloud Regions. AWS offers the broadest collection of NVIDIA GPU-based instances of any cloud provider to power a diverse set of AI/ML workloads. AWS and NVIDIA are also collaborating on Spectrum networking and other infrastructure areas, adding to over 15 years of joint innovation between our two companies.
AWS's advanced cloud and AI infrastructure provides enterprises, startups, and researchers with the infrastructure needed to build and scale agentic AI systems capable of reasoning, planning, and acting autonomously across complex workflows.
New Amazon EC2 instances with NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs
Today, we announced that Amazon EC2 instances accelerated by NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs are coming soon. AWS is the first major cloud provider to announce support for RTX PRO 4500 Blackwell Server Edition GPUs. These instances are well suited for a wide range of workloads, including data analytics, conversational AI, content generation, recommender systems, video streaming, video rendering, and other graphics workloads.
Amazon EC2 instances accelerated by NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs will be built on the AWS Nitro System, a combination of dedicated hardware and a lightweight hypervisor that delivers practically all of the compute and memory resources of the host hardware to your instances for better overall resource utilization and performance. The Nitro System's specialized hardware, software, and firmware are designed to enforce restrictions so that no one, including anyone at AWS, can access your sensitive AI workloads and data. In addition, the Nitro System supports firmware updates, bug fixes, and optimizations while the system remains operational. These capabilities within the Nitro System enable the improved resource efficiency, security, and stability that AI, analytics, and graphics workloads require in production.
Accelerating interconnect for disaggregated LLM inference with NVIDIA NIXL on AWS EFA and Trainium
As model sizes grow, communication overhead between GPUs or Trainium chips can become a bottleneck. Today, we announced support for the NVIDIA Inference Xfer Library (NIXL) with AWS EFA to accelerate disaggregated large language model (LLM) inference on Amazon EC2, across NVIDIA GPUs and AWS Trainium. Accelerating disaggregated inference is essential for scaling modern AI workloads because it enables efficient overlap of communication and computation while minimizing communication latency and maximizing GPU utilization. This integration enables high-throughput, low-latency KV-cache data movement between the GPU compute nodes performing token generation and the distributed memory resources that store KV-cache state. It also provides the flexibility to build inference clusters using any combination of GPU and Trainium EFA-enabled EC2 instances. NIXL with EFA integrates natively with popular open source frameworks such as NVIDIA Dynamo, vLLM, and SGLang, delivering improved inter-token latency and more efficient KV-cache memory utilization.
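To make this concrete, here is a minimal sketch of how a decode worker might be configured in vLLM so that KV-cache state produced by a prefill tier is transferred in rather than recomputed. It assumes vLLM's experimental KVTransferConfig and NixlConnector interface; the class names, role values, and any EFA-specific settings are assumptions and may differ from the released integration.

```python
# Sketch of a decode-side vLLM worker in a disaggregated setup. KV-cache blocks
# produced by prefill nodes are fetched over a NIXL-backed connector (assumed
# interface; names like KVTransferConfig and "NixlConnector" may differ by version).
from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example open model, not tied to this announcement
    kv_transfer_config=KVTransferConfig(
        kv_connector="NixlConnector",          # NIXL handles the KV-cache data movement
        kv_role="kv_consumer",                 # this node consumes prefilled KV blocks
    ),
)

# Token generation proceeds as usual; the connector pulls KV-cache state from
# the prefill tier instead of recomputing it locally.
outputs = llm.generate(
    ["Summarize the benefits of disaggregated inference."],
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```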
Accelerating data analytics with Amazon EMR and NVIDIA GPUs
Running Apache Spark 3x faster using Amazon EMR on Amazon EKS with G7e instances
Data engineers and data scientists frequently face hours-long data processing pipelines that slow AI/ML model iteration and business intelligence generation. We're seeing significant performance gains for these workloads: AWS and NVIDIA deliver 3x faster performance for Apache Spark workloads with Amazon EMR on EKS on G7e instances. This performance results from a joint AWS-NVIDIA engineering collaboration optimizing GPU-accelerated analytics by combining Amazon EMR on EKS with NVIDIA's RTX PRO 6000 architecture. With Amazon EMR and G7e instances, data engineers and data scientists can accelerate time-to-insight for AI/ML feature engineering, complex ETL transformations, and real-time analytics at scale. Customers running large-scale data processing pipelines can reduce the time needed to run analytics while maintaining full compatibility with existing Spark applications.
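If you already run Spark on Amazon EMR on EKS, GPU acceleration is largely a matter of job configuration. The sketch below submits a job with the RAPIDS Accelerator for Apache Spark enabled; the virtual cluster ID, IAM role, S3 paths, and release label are placeholders, and the release label and GPU resource settings for G7e-backed node groups are assumptions to verify against the EMR documentation.

```python
import boto3

# Sketch: submit a Spark job to EMR on EKS with the RAPIDS Accelerator enabled.
# Placeholders: virtual cluster ID, role ARN, release label, and S3 URIs.
emr = boto3.client("emr-containers", region_name="us-east-1")

spark_params = " ".join([
    "--conf spark.plugins=com.nvidia.spark.SQLPlugin",   # RAPIDS Accelerator for Apache Spark
    "--conf spark.rapids.sql.enabled=true",
    "--conf spark.executor.resource.gpu.amount=1",       # one GPU per executor
    "--conf spark.task.resource.gpu.amount=0.25",        # four concurrent tasks share a GPU
    "--conf spark.executor.instances=4",
])

response = emr.start_job_run(
    name="gpu-etl-example",
    virtualClusterId="<virtual-cluster-id>",
    executionRoleArn="arn:aws:iam::111122223333:role/EMRContainersJobRole",
    releaseLabel="emr-7.5.0-latest",  # assumption: use a GPU-capable EMR release
    jobDriver={
        "sparkSubmitJobDriver": {
            "entryPoint": "s3://my-bucket/jobs/feature_engineering.py",
            "sparkSubmitParameters": spark_params,
        }
    },
)
print("Job run ID:", response["id"])
```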
Expanding NVIDIA Nemotron model support on Amazon Bedrock
Fine-tuning Nemotron models in Amazon Bedrock with Reinforcement Fine-Tuning (Coming soon)
Developers will soon be able to fine-tune NVIDIA Nemotron models directly on Amazon Bedrock using Reinforcement Fine-Tuning (RFT). This is important for teams that need to align model behavior to specific domains, whether that's legal, healthcare, finance, or any other specialized field. Reinforcement fine-tuning lets you shape how a model reasons and responds, not just what it knows. And because this runs natively on Amazon Bedrock, there's zero infrastructure overhead. You define the task, provide the feedback signal, and Bedrock handles the rest. Learn about Reinforcement Fine-Tuning in Amazon Bedrock.
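Bedrock customizations today are kicked off through the create_model_customization_job API, so an RFT job will likely look something like the sketch below once the feature launches. Because RFT on Bedrock hasn't shipped yet, the customizationType value, base model identifier, and hyperparameter names are placeholders, not the final interface.

```python
import boto3

# Sketch only: RFT on Bedrock is still coming soon, so the customization type,
# base model identifier, and hyperparameters below are placeholders.
bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.create_model_customization_job(
    jobName="nemotron-rft-legal-assistant",
    customModelName="nemotron-legal-rft-v1",
    roleArn="arn:aws:iam::111122223333:role/BedrockCustomizationRole",
    baseModelIdentifier="<nemotron-base-model-id>",   # placeholder until launch
    customizationType="REINFORCEMENT_FINE_TUNING",    # placeholder value
    trainingDataConfig={"s3Uri": "s3://my-bucket/rft/prompts.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-bucket/rft/output/"},
    hyperParameters={"epochCount": "3"},               # placeholder task/reward settings
)
print("Customization job ARN:", response["jobArn"])
```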
Nemotron 3 Super on Amazon Bedrock (Coming soon)
NVIDIA Nemotron 3 Super, a hybrid MoE model built for multi-agent workloads and extended reasoning, is coming soon to Amazon Bedrock. Designed to enable AI agents to maintain accuracy across complex, multi-step workflows, it powers use cases across finance, cybersecurity, retail, and software development, delivering fast, cost-efficient inference through a fully managed API.
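Because the model surfaces through Bedrock's fully managed API, invoking it once available should look like any other Converse call. The model ID below is a placeholder; check the Bedrock console for the actual identifier at launch.

```python
import boto3

# Sketch: calling Nemotron 3 Super through the Bedrock Converse API once it is
# available. The model ID is a placeholder, not the final identifier.
runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = runtime.converse(
    modelId="<nvidia-nemotron-3-super-model-id>",  # placeholder
    messages=[{
        "role": "user",
        "content": [{"text": "Outline a three-step plan to triage a suspicious login alert."}],
    }],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)
print(response["output"]["message"]["content"][0]["text"])
```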
Improving energy efficiency and sustainability
As AI workloads scale, performance per watt isn't just a sustainability metric, it's a competitive advantage. In this NVIDIA GTC session, Amazon CSO Kara Hurst will join sustainability leaders from Equinix and PepsiCo to discuss how AI is transforming enterprise energy and infrastructure at scale, from data centers as active grid participants to AI as an enterprise efficiency engine, and how AWS can help you achieve optimal energy efficiency, with AWS infrastructure being 4.1x more energy efficient than on-premises data centers.
Built to run, together
What makes these announcements exciting isn't any single capability, it's what they represent together. Fifteen years of partnership between AWS and NVIDIA has produced a full stack of AI infrastructure optimized end to end, from the GPU to the network to the managed services layer. You don't have to stitch it together yourself. It's ready to run.
If you're at GTC this week, come find us at the AWS booth. Check out live demos, catch our in-booth theater sessions, and pick up customized swag with the AWS Swag Factory.
Visit AWS at NVIDIA GTC 2026 to see everything AWS has going on at the conference.
About the authors
