Inworld AI announced Inworld Runtime, an AI runtime engineered to scale consumer applications.
It lets developers go from prototype to production faster and supports user growth from 10 to 10 million users with minimal code changes.
By automating AI operations, Inworld Runtime frees up engineering resources for new product development and provides the tools to design and deploy no-code experiments. Inworld AI’s current partners, including major media companies, triple-A studios, and AI-native startups, are already using Runtime as the foundation of their AI stacks for the next generation of real-time, multi-million-user AI features and experiences.
It’s interesting to see Inworld AI’s expansion of its product line, as it started with a focus on making AI tools for game developers to make things such as smart non-player characters for games.
“We built Runtime because we needed it ourselves. Existing tools couldn’t deliver at the speed and scale our partners required,” said Kylan Gibbs, CEO of Inworld, in a statement. “When we realized every consumer AI company faces these same barriers, we knew we had to open up what we’d built. We’ve watched the industry reach an inflection point. Thousands of builders are hitting the same scaling wall we did, so we hustled over the past year to add capabilities beyond our internal needs to create the universal backend to accelerate the entire consumer AI ecosystem.”
Built for internal purposes, it now enables all consumer builders
Inworld AI’s founding team, which came from Google and DeepMind, recognized that AI’s momentum was flowing into business automation and professional-facing applications while consumer surfaces lagged behind.
So the company built Inworld to help the benefits of AI reach everyone, everywhere, by enabling the next generation of consumer applications. It began with AI agents for gaming and media partners including Xbox, Disney, NVIDIA, Niantic, and NBCUniversal.
To accelerate this work, Inworld AI built Runtime as internal infrastructure to handle the unique demands of consumer AI: maintaining real-time performance at multi-million concurrent user scale, user-specific quality expectations focused on engagement, and costs well under a cent per user per day.
As companies building health, fitness, learning, and social applications began approaching it, Inworld AI discovered they faced the same challenges Runtime was already solving internally, and it decided to release the technology publicly.
Three factors that determine leaders in consumer AI
Through four years of deploying consumer AI applications, Inworld AI said it discovered three critical factors that determine success or failure. Excellence in all three is mandatory. Weakness in any one will prevent a consumer AI feature or application from achieving market leadership:
1. Time from prototype to production
While creating an AI demo takes hours, reaching production readiness typically requires six or more months of infrastructure and quality work. Teams must handle provider outages, implement fallbacks, manage rate limits, provision compute capacity, optimize costs, and ensure consistent quality. In building with category leaders, Inworld said it saw most consumer AI projects either make the leap or stall out and die in the gap.
2. Resource allocation to new product development
Post-launch, most engineering teams spend over 60% of their time on maintenance tasks: debugging provider changes, managing model updates, handling scale issues, and optimizing costs. This leaves minimal resources for building new features, causing products to stagnate while competitors advance. Inworld experienced this firsthand, as even innovative teams get trapped in maintenance cycles instead of building what users want next.
3. Experimentation velocity
Consumer preferences continuously evolve, but traditional deployment cycles of two to four weeks cannot match this pace. Teams need to test dozens of variations, measure real user impact, and scale winners, all without the friction of code deployments and app store approvals. Working with partners across the industry showed Inworld that the fastest learner wins, but existing infrastructure makes rapid iteration nearly impossible.
“We scaled from prototype to 1 million users in 19 days with over 20× cost reduction,” said Fai, CEO of Status, in a statement.
And Ivan Murat, CEO of Nanobit, said in a statement, “Inworld AI provides two levers for us: personalization and content. We always need to experiment with features. Or else, with attention spans so short, users will find a new app.”
Inworld Runtime’s technical design
Inworld Runtime delivers these capabilities through multiple innovations, including:
1. Adaptive Graphs
A C++-based graph execution system that addresses the scaling and cross-platform limitations of most AI frameworks, with SDKs for Node.js, Python, and other languages. Developers compose applications from pre-optimized nodes as building blocks (with APIs from top providers for LLM, TTS, STT, knowledge, memory, and more) that handle low-level integration work and automatically optimize data streams between components. The same graph scales seamlessly from 10 test users to 10 million concurrent users with minimal code changes and managed endpoints. With vibe-coding-friendly interfaces, this directly enables the leap from prototype to production in days, not months.
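The graph-of-nodes composition the company describes can be illustrated with a minimal generic sketch. The `Graph` and `Node` classes below are hypothetical stand-ins for illustration only, not the actual Inworld Runtime SDK API; the pattern is simply a pipeline where each node transforms the output of the previous one.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Node:
    """One processing step in the pipeline (e.g. STT, LLM, TTS)."""
    name: str
    fn: Callable[[Any], Any]

@dataclass
class Graph:
    """A linear graph that runs each node's output into the next node."""
    nodes: list[Node] = field(default_factory=list)

    def add(self, name: str, fn: Callable[[Any], Any]) -> "Graph":
        self.nodes.append(Node(name, fn))
        return self

    def run(self, payload: Any) -> Any:
        for node in self.nodes:
            payload = node.fn(payload)
        return payload

# Compose a toy STT -> LLM -> TTS pipeline from node building blocks.
graph = (
    Graph()
    .add("stt", lambda audio: f"transcript({audio})")
    .add("llm", lambda text: f"reply({text})")
    .add("tts", lambda text: f"audio({text})")
)

print(graph.run("user_audio"))  # audio(reply(transcript(user_audio)))
```

In a production runtime each node would wrap a provider API and stream data to the next stage; the point here is only that the application is declared as composable nodes rather than hand-wired integration code.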
2. Automated MLOps
Beyond basic operations, Runtime provides self-contained infrastructure automation with integrated telemetry capturing logs, traces, and metrics across every interaction. Actionable insights, such as identified bugs, user patterns, and optimization opportunities, are surfaced through the Portal, Inworld’s observability and experiment management platform. Runtime performs automatic failover between providers, manages capacity across models, and handles rate limiting intelligently. It also supports custom on-premise deployments with optimized model hosting for enterprises. As applications scale, Inworld provides access to all the cloud infrastructure needed to train, tune, and host custom models that break the cost-quality frontier of default models.
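Automatic failover between providers is a well-known pattern that can be sketched generically: try providers in priority order and fall back on failure. Everything here (`ProviderError`, the provider functions, `call_with_failover`) is a hypothetical illustration of the concept, not Inworld’s implementation.

```python
class ProviderError(Exception):
    """Raised when a model provider fails (outage, rate limit, etc.)."""

def call_with_failover(prompt, providers):
    """Try each (name, fn) provider in priority order; fall back on failure."""
    errors = []
    for name, fn in providers:
        try:
            return name, fn(prompt)
        except ProviderError as exc:
            errors.append((name, str(exc)))
    raise RuntimeError(f"all providers failed: {errors}")

# Hypothetical providers: the primary is rate limited, the fallback works.
def flaky_provider(prompt):
    raise ProviderError("rate limited")

def stable_provider(prompt):
    return f"completion for {prompt!r}"

name, result = call_with_failover(
    "hello", [("primary", flaky_provider), ("fallback", stable_provider)]
)
print(name, result)  # fallback completion for 'hello'
```

A production version would add per-provider rate-limit tracking, retries with backoff, and telemetry on each attempt, which is the operational work the article says Runtime automates.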
3. Live Experiments
Experiments deploy or scale with one click. Separating configuration from code enables instant A/B tests without deployment friction. Runtime can run hundreds of experiments simultaneously: developers define variants via the SDK and manage tests through the Portal, testing different models, prompts, graph configurations, and logic flows. Changes deploy in seconds, with automatic impact measurement on user metrics.
Proven results from early adopters of Inworld Runtime
Runtime deployments demonstrate consistent technical achievements:
- Our largest partners (major IP owners, media companies and AAA studios) are already leveraging Runtime as the foundation of their AI stacks
- Wishroll scaled from prototype to 1 million users in 19 days with over 95% cost reduction
- Little Umbrella is able to ship new AI games while using Inworld to reduce update and maintenance effort for existing titles
- Streamlabs built a multimodal real-time streaming assistant with features that were unfeasible even six months ago
- Bible Chat upgraded and scaled their voice features while reducing voice costs by 85%
- Nanobit delivers personalized AI narratives to millions at sustainable unit economics
Availability and Pricing
Developers can get started immediately by downloading the Runtime SDKs, which ship with comprehensive documentation and migration guides. Runtime works natively with code assistants like Cursor, Claude Code, Google CLI, Windsurf, and Zencoder.
Start with your own project or use Inworld’s templates and demo apps as inspiration. Runtime deploys flexibly in client applications, on any cloud provider’s servers, or through custom on-premise installations with Inworld-managed model hosting. Once in production, the Portal can be used for observability and rapid experimentation.
Runtime pricing is entirely usage-based with no upfront costs. Developers can experiment with all models and capabilities and only pay for what scales successfully, ensuring alignment with consumer application economics where costs must remain sustainable as usage grows. With access to state-of-the-art models from Anthropic, Google, Mistral, and OpenAI, developers have maximum choice to easily test and select the optimal model for their use case.
Runtime also provides access to top open-source models like Deepseek, Llama, and Qwen through lightning-fast providers Groq, Tenstorrent and Fireworks AI. Developers with existing Microsoft or Google relationships can utilize their cloud commitments to access Runtime through the Azure Marketplace and Google Cloud Marketplace.
Inworld is building out the AI runtime for consumer applications. The firm was founded in 2021 and has raised over $120 million from investors including Lightspeed Venture Partners, Kleiner Perkins, Section 32, Founders Fund, Stanford University, and Microsoft’s M12.
The company will host the inaugural Consumer AI Summit in Spring 2026 in San Francisco, bringing together technical leaders building and scaling the next generation of consumer AI applications.