Nvidia launches Dynamo open source software for AI factories

Nvidia announced Nvidia Dynamo 1.0, open source software for generative and agentic inference at scale for AI factories.

Together with the Nvidia Blackwell platform, Dynamo 1.0 enables cloud providers, AI innovators and global enterprises to deliver high-performance AI inference with unmatched scale, efficiency and speed. Inference is the stage in AI computing where a trained model is used to make predictions on new data.

As agentic AI systems move into production across industries, scaling inference within a data center has become a complex challenge of resource orchestration, with requests of varying sizes and modalities, as well as performance objectives, arriving in unpredictable bursts.

Just as a computer’s operating system coordinates hardware and applications, Dynamo 1.0 functions as the distributed “operating system” of AI factories, seamlessly orchestrating GPU and memory resources across the cluster to power complex AI workloads. In recent industry benchmarks, Dynamo boosted the inference performance of Nvidia Blackwell GPUs by up to seven times, lowering token cost and increasing revenue opportunity for millions of GPUs with free, open source software.

Nvidia announced the news during CEO Jensen Huang’s keynote at the company’s GTC event on Monday in San Jose, California.

“Inference is the engine of intelligence, powering every query, every agent and every application,” said Huang in a statement. “With Nvidia Dynamo, we’ve created the first-ever ‘operating system’ for AI factories. The rapid adoption across our ecosystem shows this next wave of agentic AI is here, and Nvidia is powering it at global scale.”

Dynamo 1.0 splits inference work across GPUs by adding smarter “traffic control” and the ability to move data between GPUs and lower-cost storage, reducing wasted work and easing memory limits. For agentic AI and long prompts, it can route requests to GPUs that already have the most relevant “short-term memory” from earlier steps, then offload that memory when it is not needed.
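
To make the routing idea concrete, here is a minimal Python sketch of cache-aware request routing. It is an illustration only, not Dynamo's implementation or API: the `Worker` class, the `route` and `overlap` functions and the block size of 4 tokens are all hypothetical. It assumes each GPU worker tracks which prompt prefixes it has already processed, and it sends a new request to the worker with the longest matching prefix, breaking ties in favor of the least busy worker. Offloading to lower-cost storage is left out.

```python
from dataclasses import dataclass, field

BLOCK = 4  # tokens per cache block (illustrative value, not a Dynamo setting)


def blocks(tokens):
    """Return one key per full block; each key covers the whole prefix so far."""
    return [tuple(tokens[:end]) for end in range(BLOCK, len(tokens) + 1, BLOCK)]


@dataclass
class Worker:
    """Hypothetical stand-in for a GPU worker and its cached prefixes."""
    name: str
    cached: set = field(default_factory=set)
    active: int = 0  # number of requests assigned so far


def overlap(worker, tokens):
    """Count how many leading blocks of the prompt the worker already holds."""
    matched = 0
    for key in blocks(tokens):
        if key not in worker.cached:
            break
        matched += 1
    return matched


def route(workers, tokens):
    """Pick the worker with the most reusable cache, then the lowest load."""
    best = max(workers, key=lambda w: (overlap(w, tokens), -w.active))
    best.cached.update(blocks(tokens))  # the chosen worker now holds this prefix
    best.active += 1
    return best


if __name__ == "__main__":
    pool = [Worker("gpu-a"), Worker("gpu-b")]
    first = route(pool, list(range(8)))            # no cache anywhere yet
    second = route(pool, list(range(100, 108)))    # unrelated prompt
    followup = route(pool, list(range(8)) + [20, 21, 22, 23])  # extends the first
    print(first.name, second.name, followup.name)
```

In this toy setup, the follow-up prompt lands on the worker that served the first request because that worker can reuse two cached blocks, while the unrelated prompt goes to the idle worker. A production router would also weigh factors this sketch ignores, such as memory pressure and latency targets.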

Nvidia Inference Platform Gains Momentum

Last year, Nvidia acquired Groq’s inference‑optimized intellectual property and brought its engineering talent on board to augment Nvidia’s existing inference platform.

Nvidia is accelerating the open source ecosystem by integrating Dynamo and Nvidia TensorRT-LLM library optimizations into popular frameworks from providers such as LangChain, llm-d, LMCache, SGLang, vLLM and more.

Core Dynamo building blocks like KVBM for smarter memory management, Nvidia NIXL for fast GPU-to-GPU data movement and Nvidia Grove for simplified scaling are also available as standalone modules. Nvidia also contributes TensorRT-LLM CUDA kernels to the FlashInfer project so they can be natively integrated into open source frameworks.

The Nvidia inference platform is supported across the AI ecosystem, including:
● Cloud Service Providers: Amazon Web Services (AWS), Microsoft Azure, Google Cloud, OCI
● Nvidia Cloud Partners: Alibaba Cloud, CoreWeave, Crusoe, DigitalOcean, Gcore, GMI Cloud, Lightning AI, Nebius, Nscale, Together AI, Vultr
● AI-Native Companies: Cursor, Hebbia, Perplexity
● Inference Endpoint Providers: Baseten, Deep Infra, Fireworks
● Global Enterprises: Amazon, AstraZeneca, BlackRock, ByteDance, Coupang, Instacart, Meituan, PayPal, Pinterest, Shopee, SoftBank Corp.

Chen Goldberg, executive vice president of product and engineering at CoreWeave, said in a statement, “As AI moves from experimental pilots to continuous, large-scale production, the underlying infrastructure must be as dynamic as the models it supports. Supporting Nvidia Dynamo allows us to offer a more seamless, resilient environment for deploying complex AI agents. This foundation provides the durability and high-performance orchestration required to move the industry’s most ambitious agentic workloads into global production.”

Dynamo 1.0 is available today to developers worldwide.