Nvidia announced the Nvidia Physical AI Data Factory Blueprint, an open reference architecture that unifies and automates how training data is generated, augmented and evaluated, reducing the costs, time and complexity of training physical AI systems at scale.
The blueprint enables developers to use Nvidia Cosmos open world foundation models and leading coding agents to transform limited training data into large, diverse datasets, including rare edge cases and long-tail scenarios that are expensive, time-consuming and often impractical to capture in the real world.
Nvidia is collaborating with Microsoft Azure and Nebius to integrate the open blueprint with their cloud infrastructure and services, enabling developers to turn accelerated computing power into high-volume training data. Leading physical AI developers FieldAI, Hexagon Robotics, Linker Vision, Milestone Systems, RoboForce, Skild AI, Teradyne Robotics and Uber are using the blueprint to accelerate robotics, vision AI agents and autonomous vehicle development.
Nvidia announced the news during CEO Jensen Huang's keynote at the company's GTC event on Monday in San Jose, California.
“Physical AI is the next frontier of the AI revolution, where success depends on the ability to generate massive amounts of data,” said Rev Lebaredian, vice president of Omniverse and simulation technologies at Nvidia, in a statement. “Together with cloud leaders, we’re providing a new kind of agentic engine that transforms compute into the high-quality data required to bring the next generation of autonomous systems and robots to life. In this new era, compute is data.”
A unified engine for physical AI development
Physical AI follows scaling laws: Performance improves as data, compute and model capacity grow. The Physical AI Data Factory Blueprint serves as a single reference architecture that moves teams from raw data to model-ready training sets through modular, automated workflows:
● Curate and Search: Nvidia Cosmos Curator processes, refines and annotates large-scale real-world and synthetic datasets.
● Augment and Multiply: Cosmos Transfer exponentially expands and diversifies curated data, multiplying real and simulated inputs to better capture rare and long-tail scenarios across environments and lighting conditions.
● Evaluate and Validate: Nvidia Cosmos Evaluator, powered by Cosmos Reason and now available on GitHub, automatically scores, verifies and filters generated data to ensure physical accuracy and training readiness.
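The three stages above can be read as a filter, expand, filter pipeline. The Python sketch below illustrates only that data flow under stated assumptions; it does not call Cosmos Curator, Cosmos Transfer or Cosmos Evaluator, whose APIs the announcement does not describe. The `Clip` type, the `curate`/`augment`/`evaluate` functions, the tag-based search, the lighting variants and the toy scorer are all hypothetical stand-ins.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Clip:
    """Hypothetical training sample: one clip plus minimal metadata."""
    clip_id: str
    source: str             # "real" or "synthetic"
    tags: tuple[str, ...]   # annotations attached during curation
    lighting: str = "day"
    score: float = 0.0


def curate(clips: list[Clip], required_tag: str) -> list[Clip]:
    """Curate and search (hypothetical): keep clips annotated with a tag."""
    return [c for c in clips if required_tag in c.tags]


def augment(clips: list[Clip], lightings: list[str]) -> list[Clip]:
    """Augment and multiply (hypothetical): one synthetic variant per lighting.

    Stand-in for a generative model; here only the metadata changes.
    """
    return [
        replace(c, clip_id=f"{c.clip_id}-{light}", source="synthetic", lighting=light)
        for c in clips
        for light in lightings
    ]


def evaluate(clips: list[Clip], scorer: Callable[[Clip], float],
             threshold: float) -> list[Clip]:
    """Evaluate and validate (hypothetical): score clips, drop low scorers."""
    scored = [replace(c, score=scorer(c)) for c in clips]
    return [c for c in scored if c.score >= threshold]


if __name__ == "__main__":
    raw = [
        Clip("c1", "real", ("pedestrian", "crosswalk")),
        Clip("c2", "real", ("highway",)),
    ]
    curated = curate(raw, "pedestrian")          # scenario of interest
    augmented = augment(curated, ["day", "night", "fog"])
    # Toy scorer standing in for a physics-aware evaluator: it pretends
    # the fog variants are implausible so they fall below the threshold.
    kept = evaluate(augmented, lambda c: 0.2 if c.lighting == "fog" else 0.9,
                    threshold=0.5)
    print([c.clip_id for c in kept])  # ['c1-day', 'c1-night']
```

Here one real clip becomes three candidate variants and the evaluation stage keeps two. In a production pipeline each stage would run on accelerated compute, and the scorer would be a learned model rather than a lambda.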
Nvidia is using the Physical AI Data Factory Blueprint to train and evaluate Nvidia Alpamayo, which the company describes as the world's first family of open, reasoning-based vision-language-action (VLA) models for long-tail autonomous driving. Skild AI is applying the blueprint to advance general-purpose robot foundation models, while Uber is using it to accelerate autonomous vehicle development.
Agent-driven orchestration at scale
Many robotics developers are not equipped to stand up and manage the complex AI infrastructure required to generate data at scale.
Nvidia Osmo, an open source orchestration framework, unifies and manages these workflows across compute environments, reducing manual tasks so developers can focus on building their models.
Osmo now integrates with leading coding agents such as Claude Code, OpenAI Codex and Cursor, enabling AI-native operations where agents proactively manage resources, resolve bottlenecks and accelerate model delivery at scale.
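Workflow orchestrators commonly model a pipeline as a dependency graph and dispatch each stage once its inputs are ready. As a minimal sketch, assuming a hypothetical five-stage data-factory workflow, Python's standard-library `graphlib.TopologicalSorter` can group stages into waves whose members do not depend on each other and could therefore run in parallel on separate compute. This is not Osmo's API or workflow format; the stage names and the wave-based scheduling are illustrative assumptions.

```python
from graphlib import TopologicalSorter


def schedule_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group workflow stages into dependency-respecting waves.

    `deps` maps each stage to the set of stages it must wait for. Stages in
    the same wave do not depend on each other, so an orchestrator could run
    them concurrently. This sketch only computes the order; it runs nothing.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the workflow has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a deterministic order
        waves.append(ready)
        sorter.done(*ready)  # treat the whole wave as finished before the next
    return waves


if __name__ == "__main__":
    # Hypothetical workflow: two curation jobs feed augmentation, whose
    # output is evaluated before a training job consumes it.
    workflow = {
        "curate_real": set(),
        "curate_sim": set(),
        "augment": {"curate_real", "curate_sim"},
        "evaluate": {"augment"},
        "train": {"evaluate"},
    }
    for i, wave in enumerate(schedule_waves(workflow), start=1):
        print(f"wave {i}: {wave}")
```

Run as a script, it prints four waves, with the two curation jobs sharing the first. Marking a whole wave done at once is a deliberate simplification; a real scheduler would call `done()` for each task as it actually finishes, so downstream stages can start sooner.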
Cloud service providers play a critical role in providing the accelerated AI infrastructure, machine learning operations and orchestration services developers need to build and deploy physical AI at scale.
Microsoft Azure is integrating the Physical AI Data Factory Blueprint into an open physical AI toolchain, now available on GitHub. The blueprint offers integration with Azure services (including Azure IoT Operations, Microsoft Fabric, Real-Time Intelligence, Microsoft Foundry and GitHub Copilot) to provide enterprise-grade, agent-driven workflows for training and validating physical AI systems quickly and at scale.
FieldAI, Hexagon Robotics, Linker Vision and Teradyne Robotics are among the first to test the Azure physical AI toolchain for accelerating and scaling data generation, augmentation and evaluation across their perception, mobility and reinforcement learning pipelines.
Nebius has integrated Osmo into its AI Cloud, enabling developers to use the blueprint to deploy production-ready data pipelines tailored to their needs. Nebius’s infrastructure powers the physical AI stack end to end, blending Nvidia RTX Pro 6000 Blackwell Server Edition GPUs with ultrafast object storage, native data management and labeling, serverless execution and built-in managed inference.
Early users Milestone Systems, Voxel51 and RoboForce are harnessing the blueprint on Nebius infrastructure to accelerate model development for video analytics AI agents, autonomous vehicles and industrial humanoid robots.
The Nvidia Physical AI Data Factory Blueprint is expected to be available on GitHub in April.