Nvidia today launched Nvidia Cosmos 3, an open world foundation model for physical AI built on a breakthrough mixture-of-transformers architecture that combines vision reasoning, world generation and action prediction in a single system.
Cosmos 3 is the world’s first fully open omnimodel that can natively understand and generate text, images, video, ambient sound and actions with leading physics accuracy, reducing physical AI training and evaluation cycles from months to days, said Jensen Huang, CEO of Nvidia, at Nvidia GTC Taipei and Computex 2026 in Taiwan.
Nvidia also launched the Cosmos Coalition, a global collaboration between world model builders and AI developers, including Agile Robots, Black Forest Labs, Generalist, LTX, Runway and Skild AI, working together to advance next-generation world models.
“The big bang of physical AI is just around the corner thanks to breakthroughs in multimodal reasoning language, vision and world models,” said Huang, in a statement. “The Cosmos 3 family of open, frontier omnimodels gives developers a generational leap in ability to build robots, autonomous vehicles and vision AI that perceive, reason, plan and act in the physical world.”
A New Architecture for Physical AI
Cosmos 3 tackles a fundamental challenge in physical AI: enabling robots, autonomous vehicles (AVs) or vision agents to generalize in the real world with limited training data and fragmented simulation stacks.
The model’s mixture-of-transformers architecture pairs a reasoning transformer with an expert generation transformer, enabling Cosmos 3 to understand object interactions, motion and spatial-temporal relationships before generating video and action trajectories.
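The reason-then-generate flow described above can be pictured with a toy sketch. This is not Nvidia's implementation or any Cosmos API: the names ScenePlan, reason and generate are hypothetical, and simple constant-velocity arithmetic stands in for both transformers. It only illustrates the ordering, where a structured understanding of the scene is produced first and a trajectory is generated from it afterward.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ScenePlan:
    """Hypothetical intermediate output of the reasoning stage."""
    object_name: str
    position: Tuple[float, float]
    velocity: Tuple[float, float]


def reason(observation: dict) -> ScenePlan:
    """Stand-in for the reasoning stage: estimate velocity from the last two positions."""
    (x0, y0), (x1, y1) = observation["positions"][-2:]
    dt = observation["dt"]
    return ScenePlan(observation["name"], (x1, y1), ((x1 - x0) / dt, (y1 - y0) / dt))


def generate(plan: ScenePlan, steps: int, dt: float) -> List[Tuple[float, float]]:
    """Stand-in for the generation stage: roll the plan forward at constant velocity."""
    x, y = plan.position
    vx, vy = plan.velocity
    return [(x + vx * dt * k, y + vy * dt * k) for k in range(1, steps + 1)]


# Assumed toy input: an object seen at two positions, one time unit apart.
observation = {"name": "box", "positions": [(0.0, 0.0), (1.0, 0.5)], "dt": 1.0}
plan = reason(observation)                    # understand the scene first
trajectory = generate(plan, steps=3, dt=1.0)  # then generate future states
print(trajectory)
```

In the actual model both stages are learned transformers. The sketch keeps only the ordering of the two steps.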
Trained on one of the largest multimodal physical AI datasets — including billions of samples across text, image, video, sound and action trajectories — the model gives developers a powerful pretrained foundation for building physical AI systems with less data and lower training costs.
Developers can use Cosmos 3 as:
● A vision language model that understands and reasons across modalities.
● A world model or video foundation model that simulates physical environments and predicts future world states for training and evaluation.
● The backbone for world action models that help train robots to perform specific tasks.
Cosmos 3 delivers leading results on physical AI benchmarks. Among open models, it ranks first across Artificial Analysis, Physics-IQ, PAI-Bench and R-Bench for world generation accuracy, RoboLab and RoboArena for action policy, and the Vantage-Bench and TAR leaderboards for vision understanding.
The Cosmos 3 lineup gives developers options for different stages of physical AI development:
● Cosmos Super for post-training robotics and AV models that need the highest physics accuracy and generation quality.
● Cosmos Nano for high-quality video and action reasoning in fractions of a second.
● Cosmos Edge, coming soon, for real-time inference at the edge.
Cosmos Coalition Accelerates Open World Model Development
The Cosmos Coalition is a global collaboration between world model builders, AI developers and physical AI leaders to advance open world models across industries. Members can contribute models, research and evaluation techniques while using Cosmos 3 technologies, training tools and Nvidia DGX Cloud infrastructure for large-scale training.
Founding coalition members include Agile Robots, Black Forest Labs, Generalist, LTX, Runway and Skild AI. By building in the open and contributing across a shared ecosystem, the coalition aims to enable faster innovation, broader interoperability and more rapid advances in physical AI.
Developers Build on Cosmos
The Cosmos platform powers Nvidia’s physical AI stack to accelerate training and evaluation workflows across industries. The platform now includes new datasets for robotics, physics, human motion, autonomous driving, warehouse safety and spatial reasoning, as well as new physical AI agent skills for neural scene reconstruction, defect-image generation and video augmentation.
Physical AI developers are building on the Cosmos platform across industries: Agile Robots, Doosan Robotics, LG Electronics, Samsung and Skild AI for robotics; Li Auto for autonomous vehicles; and Centific, Fogsphere, Linker Vision, Milestone Systems and Yuan for vision AI agents that power industrial AI and smart spaces applications.
Availability
Cosmos Super and Cosmos Nano are available now, with Cosmos Edge coming soon for real-time inference. Developers can try Cosmos 3 on build.nvidia.com, download open models from Hugging Face, customize models and generate synthetic data with Hugging Face Diffusers and resources on GitHub, and deploy the models as Nvidia NIM microservices.
Model builders and software providers can accelerate access, customization and deployment of Cosmos for key reasoning and synthetic data generation workloads by using physical AI agent skills on GitHub, with support from inference service and cloud infrastructure partners including Baseten, CoreWeave, Microsoft Azure, Nebius, Deep Infra and Classmethod.