Inworld AI is launching Inworld TTS-1, a new text-to-speech voice AI model aimed at making more realistic and expressive speech for gaming avatars, AI companions and virtual assistants.
The aim is to offer state-of-the-art and affordable voice AI for every developer, whether in games or other categories of business. Inworld describes TTS-1 as a new generation of text-to-speech models that deliver cutting-edge quality and latency at the most accessible price on the market.
Inworld’s flagship model TTS-1 offers realistic, context-aware speech synthesis and precise zero-shot voice cloning, outperforming comparable solutions from leading labs.
TTS-1 is available today via API and can be experienced in the TTS Playground, where developers can test pre-built voices or clone their own from a short audio sample.
The company is also releasing TTS-1-Max, a larger, more expressive model, as a research preview.
Powering the next generation of AI apps

For too long, developers have faced a false choice: use high-quality, expressive speech that is slow and expensive, or settle for affordable solutions that lack realism, Inworld said. The company’s goal is to eliminate this trade-off and build the voice layer for the next generation of consumer AI applications. Here’s what makes TTS-1 different.
- Unmatched quality. TTS-1 generates speech that is rich, emotive, and virtually indistinguishable from human speech. It captures subtle nuances in tone and prosody, making interactions feel natural and engaging. This power is now at your fingertips in 11 languages with TTS-1 and TTS-1-Max. Inworld is also releasing a research preview of audio markups, such as happy or whispering, which give users a new level of control over how the model speaks, not just what it says.
- Blazing-fast for real-time interactions. With the first 2-second audio chunk ready in as little as 500 ms, TTS-1 is built for real-time applications. TTS-1 is already available through popular AI voice platforms like LiveKit and Vapi, with additional integrations coming soon, and can power everything from educational companions and fitness trainers to shopping assistants and open-world games. The development and technical achievements of Inworld’s TTS-1 were accelerated by partners like Modular and Lightning AI. Inworld will be sharing more about each of these partnerships and use cases in the coming weeks.
- Radically affordable for every developer. State-of-the-art AI should not be a luxury. The team optimized its entire stack to offer TTS-1 at a disruptive price of $5 per one million characters. On top of that, it has made powerful zero-shot voice cloning free for all users. Now, every developer and team, from indie hacker to enterprise, can integrate production-grade voice AI into their products without breaking the budget.
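To put the pricing and latency figures above in concrete terms, here is a minimal Python sketch. It is not Inworld code: the function names and the sample workload are hypothetical, and the "speedup" is only an illustrative reading of the quoted numbers (2 seconds of audio ready in 500 ms), not a published benchmark.

```python
# Back-of-the-envelope helpers based on the figures quoted in the article.
# All names here are hypothetical; nothing below calls the Inworld API.

PRICE_PER_MILLION_CHARS_USD = 5.0  # $5 per one million characters (from the article)


def estimate_cost_usd(num_chars: int) -> float:
    """Return the synthesis cost in USD for num_chars characters of text."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return num_chars / 1_000_000 * PRICE_PER_MILLION_CHARS_USD


def first_chunk_speedup(chunk_seconds: float = 2.0, latency_seconds: float = 0.5) -> float:
    """Ratio of audio duration in the first chunk to the time taken to deliver it.

    Defaults use the article's best-case figures: a 2-second chunk in 500 ms.
    """
    if latency_seconds <= 0:
        raise ValueError("latency_seconds must be positive")
    return chunk_seconds / latency_seconds


if __name__ == "__main__":
    # Assumed workload for illustration: 10,000 replies a day, 200 characters each.
    daily_chars = 10_000 * 200
    print(f"Daily characters: {daily_chars:,}")
    print(f"Estimated daily cost: ${estimate_cost_usd(daily_chars):.2f}")
    print(f"First-chunk speedup: {first_chunk_speedup():.1f}x real time")
```

Under these assumptions, two million characters a day would cost about $10, and the first chunk would arrive with roughly 1.5 seconds of audio to spare before playback catches up.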
Inworld said it is excited to see how developers across all verticals will leverage its tech to build experiences the team hasn’t even imagined.
A commitment to open innovation
Inworld believes that transparency and community collaboration are the catalysts for true progress. In that spirit, the company is making its research accessible to all. In the coming weeks, Inworld will publish a detailed technical report on TTS-1’s architecture and training methodology.
Furthermore, Inworld will open source its ready-to-use training repository on GitHub under a commercially permissive license. This will provide a step-by-step guide to recreating the company’s work, from SpeechLM pre-training to SFT and RLHF, empowering researchers and developers to build upon the foundation.
This is just the beginning, Inworld said. The company will keep working to improve model quality and affordability. The TTS architecture has proven to be a flexible framework, and the company is already experimenting with new capabilities, such as creating voices from natural language descriptions, which it plans to release later this year.
Trust & Safety
Powerful technology demands profound responsibility. Inworld is committed to ensuring voice generation technology is used safely and ethically.
- All synthesized audio from the TTS platform contains an imperceptible watermark to ensure it can be identified as AI-generated.
- Inworld has implemented robust safeguards to prevent the cloning of voices without explicit consent.
- Inworld prohibits, and will act against, any uses that violate its policies, such as malicious impersonation or fraudulent activity.
Inworld is dedicated to collaborating with the broader research community to advance safety standards for all voice AI.
How to get started
Experience the Inworld TTS difference today:
- Try the TTS Playground to hear the quality for yourself.
- Clone your voice instantly with just a few seconds of audio.
- Read the API Docs and start building now.
For even higher-fidelity, fine-tuned voice clones and customized enterprise plans for high-volume use cases, developers can reach out to the team for more information.
Inworld said it wants feedback as it refines and expands its TTS capabilities.