Nvidia says Vera Rubin opens Agentic AI frontier

Become a member of GB MAX to gain exclusive access to the industry and to the most influential global B2B leadership community in the business of gaming, entertainment, and tech. Join now and also get a VIP ticket to GamesBeat Next (Nov 2-3, SF).

Nvidia announced the Nvidia Vera Rubin platform is opening the next frontier of agentic AI, with seven new chips now in full production to scale the world’s largest AI factories.

The platform brings together the Nvidia Vera CPU, Nvidia Rubin GPU, Nvidia NVLink 6
Switch, Nvidia ConnectX-9 SuperNIC, Nvidia BlueField-4 DPU and Nvidia Spectrum-6
Ethernet switch, as well as the newly integrated Nvidia Groq 3 LPU.

Designed to operate together as one incredible AI supercomputer, the chips power every phase of AI — from massive-scale pretraining, post-training and test-time scaling to real-time agentic inference.

Nvidia announced the news during the GTC keynote by CEO Jensen Huang at the company’s GTC event on Monday in San Jose, California.

“Vera Rubin is a generational leap — seven breakthrough chips, five racks, one giant
supercomputer — built to power every phase of AI,” said Huang, in a statement. “The agentic AI inflection point has arrived with Vera Rubin kicking off the greatest infrastructure buildout in history.”

“Enterprises and developers are using Claude for increasingly complex reasoning, agentic
workflows and mission-critical decisions. That demands infrastructure that can keep
pace,” said Dario Amodei, CEO of Anthropic, in a statement. “Nvidia’s Vera Rubin platform
gives us the compute, networking and system design to keep delivering while advancing
the safety and reliability our customers depend on.”

“Nvidia infrastructure is the foundation that lets us keep pushing the frontier of AI,” said
Sam Altman, CEO of OpenAI, in a statement. “With Nvidia Vera Rubin, we’ll run more powerful models and agents at massive scale and deliver faster, more reliable systems to hundreds of millions of people.”

Shift to POD-scale systems

AI infrastructure is rapidly evolving — from discrete chips and standalone servers to fully
integrated rack-scale systems, pod-scale deployments, AI factories and sovereign AI.

These advances are driving dramatic gains in performance, improving cost efficiency for organizations of all sizes and across industries — from startups and mid-sized businesses
to public-private institutions and enterprises — while helping democratize access to AI and
improving energy efficiency to power the world’s most demanding workloads.

Through deep codesign across compute, networking and storage, supported by an
ecosystem of more than 80 MGX ecosystem partners with a global supply chain, Nvidia
Vera Rubin offers the most extensive Nvidia POD-scale platform — a supercomputer
where multiple racks purpose-built for AI work together as one massive, coherent system.

Nvidia Vera Rubin NVL72 Rack

Integrating 72 Rubin GPUs and 36 Vera CPUs connected by NVLink 6, along with ConnectX-9 SuperNICs and BlueField-4 DPUs, Vera Rubin NVL72 delivers breakthrough efficiency — training large mixture-of-experts models with one-fourth the number of GPUs compared with the NVIDIA Blackwell platform and achieving up to 10 times higher inference throughput per watt at one-tenth the cost per token.

Designed for hyperscale AI factories worldwide, NVL72 scales seamlessly with Nvidia Quantum-X800 InfiniBand and Spectrum-X Ethernet to sustain high utilization across massive GPU clusters while reducing time to train and total cost of ownership.

Nvidia Vera CPU Rack

Reinforcement learning and agentic AI workloads rely on large numbers of CPU-based
environments to test and validate the results generated by models running on GPU
systems.

The Nvidia Vera CPU Rack delivers dense, liquid-cooled infrastructure built on Nvidia MGX, integrating 256 Vera CPUs to provide scalable, energy-efficient capacity with world-class single-threaded performance, unlocking agentic AI at scale.

Integrated with Spectrum-X Ethernet networking, Vera CPU racks keep CPU environments
tightly synchronized across the AI factory. Together with GPU compute racks, they provide
the CPU foundation for large-scale agentic AI and reinforcement learning — with Vera
delivering results twice as efficiently and 50% faster than traditional CPUs.

Nvidia Groq 3 LPX Rack

Nvidia Groq 3 LPX marks a milestone in accelerated computing. Designed for the low-
latency and large-context demands of agentic systems, LPX and Vera Rubin unite the extreme performance of both processors to deliver up to 35x higher inference throughput
per megawatt and up to 10x more revenue opportunity for trillion-parameter models.

At scale, a fleet of LPUs function as a giant single processor for fast, deterministic
inference acceleration. The LPX rack with 256 LPU processors features 128GB of on-chip
SRAM and 640 TB/s of scale-up bandwidth. Deployed with Vera Rubin NVL72, Rubin GPUs
and LPUs boost decode by jointly computing every layer of the AI model for every output
token.

Optimized for trillion-parameter models and million-token context, the codesigned LPX
architecture pairs with Vera Rubin to maximize efficiency across power, memory and
compute.

The additional throughput per watt and token performance unlocks a new tier of ultra-premium, trillion-parameter, million-context inference, expanding revenue opportunity for all AI providers. Fully liquid cooled and built on MGX infrastructure, LPX integrates seamlessly into next-generation Vera Rubin AI factories to be available in the second half of this year.

Nvidia BlueField-4 STX storage rack

The Nvidia BlueField-4 STX rack-scale system is an AI-native storage infrastructure that
extends GPU memory seamlessly across the POD. Powered by BlueField-4 — combining
the Nvidia Vera CPU and Nvidia ConnectX-9 SuperNIC — STX delivers a high-bandwidth
shared layer optimized for storing and retrieving the massive key-value cache data generated by large language models and agentic AI workflows.

Nvidia DOCA Memos — a new DOCA framework that supercharges BlueField-4 storage
— enables dedicated KV cache storage processing to boost inference throughput by up to
five times while significantly improving power efficiency compared with general-purpose storage architectures. The result is POD-wide context that delivers faster multi-turn interactions with AI agents, more scalable AI services and higher overall infrastructure utilization.

“The Nvidia BlueField-4 STX rack-scale context memory storage system will enable a
critical performance boost needed to exponentially scale our agentic AI efforts,” said
Timothée Lacroix, CTO of Mistral AI, in a statement. “By delivering a new storage tier purpose-built for AI agents memory, STX is ideally positioned to ensure that our models can maintain coherence and speed when reasoning across massive datasets.”

Nvidia Spectrum-6 SPX Ethernet rack

Spectrum-6 SPX Ethernet is engineered to accelerate east-west traffic across AI factories.
Configurable with either Spectrum-X Ethernet or Nvidia Quantum-X800 InfiniBand
switches, it delivers low-latency, high-throughput rack-to-rack connectivity at scale.

Spectrum-X Ethernet Photonics with co-packaged optics achieves up to 5x greater optical
power efficiency and 10x higher resiliency compared with traditional pluggable
transceivers.

Improving resiliency and energy efficiency

Nvidia, along with over 200 data center infrastructure partners, announced DSX for the
Vera Rubin platform. The new DSX platform includes DSX Max-Q to enable dynamic power
provisioning across the entire AI factory, resulting in the deployment of 30% more AI
infrastructure within a fixed-power data center. The new DSX Flex software enables AI
factories to be grid-flexible assets, unlocking 100 gigawatts of stranded grid power.

Nvidia also today released the Vera Rubin DSX AI Factory reference design, a blueprint for
codesigned AI infrastructure that maximizes tokens per watt and overall goodput, improving system resiliency and accelerating time to first production.

By tightly integrating compute, networking, storage, power and cooling, the architecture
increases energy efficiency and ensures AI factories can scale reliably under continuous,
high-intensity workloads with maximum uptime.