Nvidia announces full production for Vera CPU shipping later this year


Nvidia announced full production of the Nvidia Vera CPU, which it calls the world’s first processor purpose-built for the age of agentic AI and reinforcement learning. The company says Vera delivers results 50% faster and with twice the efficiency of traditional rack-scale CPUs.

As reasoning and agentic AI advances, scale, performance and cost are increasingly driven by the infrastructure supporting the models that plan tasks, run tools, interact with data, run code and validate results.

The launch shows that Nvidia is sticking to its plan of launching major processors and platforms just about every year. This one is part of the Vera Rubin platform, which comes with seven brand-new chips. Production of Vera has started, and it ships in the second half of this year.

The Nvidia Vera CPU builds on the success of the Nvidia Grace CPU, enabling organizations of all sizes and across industries to build AI factories that unlock agentic AI at scale. With the highest single-thread performance and bandwidth per core, Vera is a new class of CPU that delivers higher AI throughput, responsiveness and efficiency for large-scale AI services such as coding assistants, as well as consumer and enterprise agents.

Leading hyperscalers collaborating with Nvidia to deploy Vera include Alibaba, CoreWeave, Meta and Oracle Cloud Infrastructure, as well as global system makers Dell Technologies, HPE, Lenovo, Supermicro and others. This broad adoption establishes Vera as the new CPU standard for the AI workloads that matter most for developers, startups, public-private institutions and enterprises — helping democratize access to AI and accelerating innovation.

Nvidia announced the news during CEO Jensen Huang’s keynote at the company’s GTC event on Monday in San Jose, California.

“Vera is arriving at a turning point for AI. As intelligence becomes agentic, capable of reasoning and acting, the importance of the systems orchestrating that work is elevated,” said Huang, in a statement. “The CPU is no longer simply supporting the model; it’s driving it. With breakthrough performance and energy efficiency, Vera unlocks AI systems that think faster and scale further.”

Over the past 10 years, Nvidia has achieved 40 million times more compute, Huang said in his keynote speech.

There’s a reason Nvidia is making such a big investment every year.

“In order to bring this next generation’s AI frontier opportunity, we need to deliver tokens to 15 times faster and with 10 times larger models, going from 100 billion parameters to 10 trillion parameters,” said Ian Buck, general manager of Nvidia’s Hyperscale and HPC Computing Business, in a press briefing. “Where AI is just talking AI as fast as we can to deliver the business outcomes, the engineering optimizations, the value that AI is going to bring.”

Configurable for every data center

Nvidia announced a new Vera CPU rack integrating 256 liquid-cooled Vera CPUs to sustain more than 22,500 concurrent CPU environments, each running independently at full performance. AI factories can quickly deploy and scale to tens of thousands of simultaneous instances and agentic tools in a single rack.

The new Vera rack is built using the Nvidia MGX modular reference architecture, supported by 80 ecosystem partners worldwide.

As part of the Nvidia Vera Rubin NVL72 platform, Vera CPUs are paired with Nvidia GPUs through Nvidia NVLink-C2C interconnect technology, with 1.8 TB/s of coherent bandwidth, seven times the bandwidth of PCIe Gen 6, for high-speed data sharing between CPUs and GPUs. Additionally, Nvidia introduced new reference designs that use Vera as the host CPU for Nvidia HGX Rubin NVL8 systems, coordinating data movement and system control for GPU-accelerated workloads.
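The article does not say how the "seven times" figure was derived. One way to arrive at it, sketched below as a back-of-envelope check, is to compare the quoted 1.8 TB/s against a PCIe 6.0 x16 link counted in both directions at its raw 64 GT/s per-lane rate. The x16 width and the bidirectional counting are assumptions made here for illustration, not details from the announcement.

```python
# Back-of-envelope check of the "seven times PCIe Gen 6" comparison.
# Assumption (not from the article): the baseline is a PCIe 6.0 x16 link,
# counted in both directions, at the raw 64 GT/s per-lane signaling rate.

NVLINK_C2C_GB_PER_S = 1800.0  # 1.8 TB/s, as quoted in the article

PCIE6_GT_PER_LANE = 64  # raw transfer rate per lane, in GT/s
PCIE6_LANES = 16        # assumed x16 link
BITS_PER_BYTE = 8


def pcie6_x16_bidirectional_gb_per_s() -> float:
    """Raw PCIe 6.0 x16 bandwidth, summed over both directions, in GB/s."""
    per_direction = PCIE6_GT_PER_LANE * PCIE6_LANES / BITS_PER_BYTE
    return 2 * per_direction


def nvlink_to_pcie_ratio() -> float:
    """How many times larger the quoted NVLink-C2C figure is than the baseline."""
    return NVLINK_C2C_GB_PER_S / pcie6_x16_bidirectional_gb_per_s()


if __name__ == "__main__":
    print(f"PCIe 6.0 x16, both directions: {pcie6_x16_bidirectional_gb_per_s():.0f} GB/s")
    print(f"NVLink-C2C vs. PCIe ratio: {nvlink_to_pcie_ratio():.2f}x")
```

Under those assumptions the ratio comes out at roughly seven, in line with the article's figure. A different baseline, such as one direction only or a narrower link, would give a different multiple.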

Vera system partners are providing both dual- and single-socket CPU server configurations, optimal for workloads such as reinforcement learning, agentic inference, data processing, orchestration, storage management, cloud applications and high-performance computing.

Across all configurations, Vera systems integrate Nvidia ConnectX SuperNIC cards and Nvidia BlueField-4 DPUs for accelerated networking, storage and security, which are critical for agentic AI. This enables customers to optimize for their specific workloads while maintaining a single software stack across the Nvidia platform.

Designed for agentic scaling

By combining high-performance, energy-efficient CPU cores, a high-bandwidth memory subsystem and the second-generation Nvidia Scalable Coherency Fabric, Vera enables faster agentic responses under the extreme utilization conditions common for agentic AI and reinforcement learning.

Vera features 88 custom Nvidia-designed Olympus cores, delivering high performance for compilers, runtime engines, analytics pipelines, agentic tooling and orchestration services. Each core can run two tasks, using Nvidia Spatial Multithreading, to deliver consistent, predictable performance, which is ideal for multi-tenant AI factories running many jobs at once.

To further enhance energy efficiency, Vera introduces the second generation of Nvidia’s low-power memory subsystem, now built on LPDDR5X memory and delivering up to 1.2 TB/s of bandwidth — twice the bandwidth and at half the power compared with general-purpose CPUs.
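The rack and per-chip figures quoted in this article can be cross-checked with simple arithmetic. The sketch below uses only numbers from the article: 256 CPUs per rack, 88 cores per CPU, two tasks per core and up to 1.2 TB/s of memory bandwidth. Reading the "more than 22,500 concurrent CPU environments" as one environment per core is an interpretation made here, not something Nvidia states, and the even per-core split of memory bandwidth is likewise only illustrative.

```python
# Cross-check of the rack-level and per-core figures quoted in the article.
# Assumption (not stated by Nvidia): one "CPU environment" maps to one core,
# and memory bandwidth is divided evenly across cores.

CPUS_PER_RACK = 256           # liquid-cooled Vera CPUs in the new rack
CORES_PER_CPU = 88            # Olympus cores per Vera CPU
TASKS_PER_CORE = 2            # via Spatial Multithreading
MEMORY_BW_GB_PER_S = 1200.0   # up to 1.2 TB/s per CPU


def rack_cores() -> int:
    """Total cores in one rack."""
    return CPUS_PER_RACK * CORES_PER_CPU


def rack_tasks() -> int:
    """Total concurrently runnable tasks in one rack."""
    return rack_cores() * TASKS_PER_CORE


def memory_bw_per_core_gb_per_s() -> float:
    """Peak memory bandwidth per core if split evenly, in GB/s."""
    return MEMORY_BW_GB_PER_S / CORES_PER_CPU


if __name__ == "__main__":
    print(f"Cores per rack: {rack_cores():,}")
    print(f"Tasks per rack: {rack_tasks():,}")
    print(f"Memory bandwidth per core: {memory_bw_per_core_gb_per_s():.1f} GB/s")
```

The core count per rack works out to 22,528, which is consistent with the "more than 22,500" environments figure if each environment occupies one core.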

Widespread ecosystem support

Redpanda, a leading streaming data and AI platform, is using Vera to dramatically boost performance.

“Redpanda recently tested Nvidia Vera running Apache Kafka-compatible workloads and saw dramatically better performance than other systems we’ve benchmarked, delivering up to 5.5 times lower latency,” said Alex Gallego, CEO of Redpanda, in a statement. “Vera represents a new direction in CPU architecture, with more memory and less overhead per core, enabling our customers to scale real-time streaming workloads further than ever and unlock new AI and agentic applications.”

National laboratories planning to deploy Vera CPUs include Leibniz Supercomputing Centre, Los Alamos National Laboratory, Lawrence Berkeley National Laboratory’s National Energy Research Scientific Computing Center and the Texas Advanced Computing Center (TACC).

“At TACC, we recently tested Nvidia’s Vera CPU platform as we prepare for deployment in our upcoming Horizon system — and running six of our scientific applications, we saw impressive early results,” said John Cazes, director of high-performance computing at TACC, in a statement. “Vera’s per-core performance and memory bandwidth represent a giant step forward for scientific computing, and we look forward to bringing Vera-based nodes to our CPU users on Horizon later this year.”

Leading cloud service providers planning to deploy Vera CPUs include Alibaba, ByteDance, Cloudflare, CoreWeave, Crusoe, Lambda, Nebius, Nscale, Oracle Cloud Infrastructure, Together.AI and Vultr.

Leading infrastructure providers adopting Vera CPUs include Aivres, ASRock Rack, ASUS, Compal, Cisco, Dell, Foxconn, GIGABYTE, HPE, Hyve, Inventec, Lenovo, MiTAC, MSI, Pegatron, Quanta Cloud Technology (QCT), Supermicro, Wistron and Wiwynn.

Availability

Nvidia Vera is in full production and will be available from partners in the second half of this year.