Tenstorrent, the AI compute company led by chip pioneer Jim Keller, announced the general availability of Tenstorrent Galaxy Blackhole deployed at scale, delivering industry-leading general-purpose AI performance.
Other solutions require bolting together separate accelerators across fragmented infrastructure. Tenstorrent’s Networked AI delivers those capabilities natively, with compute, memory, and networking unified into a single system optimized for real-world AI workloads.
Tenstorrent is the latest Silicon Valley startup from Keller, who led pioneering computing projects at companies such as Tesla, Apple, PA Semi, Intel, and Advanced Micro Devices. Now he’s built Tenstorrent into a company in Santa Clara, California, with 1,000 employees and more than $1 billion in funding.
Amr El-Ashmawi, vice president of marketing, verticals and partnerships at Tenstorrent, spoke with me in an interview.
El-Ashmawi said, “We’re announcing achievements on performance, capabilities, our software and customer adoption.”
El-Ashmawi added, “It’s not just introducing a new piece of hardware or an AI server, which is the Blackhole. In this case, it has the latest and greatest generation chip within the form factor of a 6U system, which we call Galaxy, and it’s really a new way of how we build AI infrastructure.”
General-purpose means leading performance on every workload defining modern AI, not specializing in one. Tenstorrent Galaxy tops video generation, large-context LLM inference in both prefill and decode, and the full range of model architectures shipping today.

You can see it for yourself on Friday, May 1st at 1:30pm Pacific at Tenstorrent’s launch event, TT-Deploy. Watch the livestream: https://tenstorrent.com/deploy
10x Faster Real-time High-quality AI Video Generation
AI video generation on Tenstorrent Galaxy is 10x faster than on leading GPU systems. In collaboration with Prodia, the industry’s fastest video generation now runs 10x faster on a Tenstorrent Galaxy supercluster, generating a 720p, 81-frame video in a brisk 2.4 seconds. Customers can run state-of-the-art video models and generate high-quality videos faster on Tenstorrent Galaxy superclusters.
“We were already leading the Artificial Analysis leaderboard, and working with Tenstorrent allowed us to unlock another 10x improvement in video generation speed. The integration was seamless, and the performance gains were immediate,” said Mikhail Avady and Monty Anderson, co-founders of Prodia Labs.
Blitz Mode: Fastest and Largest-Context LLM Inference

Blitz Mode on Tenstorrent Galaxy, optimized for premium, latency-sensitive AI workloads, enables 350+ t/s/u and sub-4-second time-to-first-token on DeepSeek-R1-0528 671B, beating the leading comparable GPU systems. Tenstorrent Galaxy superclusters run high-margin AI use cases including agentic workflows, real-time systems, and long-context reasoning.
Tenstorrent Galaxy Performance Benchmarks
- Decode: DeepSeek-R1-0528 671B at up to 350+ tokens/second/user, ahead of the fastest inference systems from Groq and Cerebras in both performance and capacity, supporting batch sizes from 8 to 64 and up to 128K context
- Prefill: DeepSeek-R1-0528 671B at sub-4-second time-to-first-token on 100K context, running on the same general-purpose AI Tenstorrent Galaxy superclusters
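To put the two benchmark figures in perspective, here is a back-of-the-envelope sketch in Python. It assumes, purely for illustration, that the 350 tokens/second/user rate holds at every batch size and for the whole response, and that time-to-first-token sits at its 4-second upper bound. The function names are hypothetical and are not part of any Tenstorrent tool.

```python
# Back-of-the-envelope arithmetic on the published benchmark figures.
# Assumptions (not from Tenstorrent): the per-user decode rate is constant,
# and time-to-first-token (TTFT) is taken at its 4-second upper bound.

def aggregate_decode_rate(tokens_per_sec_per_user: float, batch_size: int) -> float:
    """Total tokens/second across all concurrent users in a batch."""
    return tokens_per_sec_per_user * batch_size

def implied_prefill_rate(context_tokens: int, ttft_seconds: float) -> float:
    """Prompt tokens processed per second, treating TTFT as pure prefill time."""
    return context_tokens / ttft_seconds

def response_latency(ttft_seconds: float, output_tokens: int,
                     tokens_per_sec_per_user: float) -> float:
    """Seconds until the last output token arrives for one user."""
    return ttft_seconds + output_tokens / tokens_per_sec_per_user

if __name__ == "__main__":
    for batch in (8, 64):
        print("batch", batch, "->", aggregate_decode_rate(350, batch), "tokens/s")
    print("prefill tokens/s:", implied_prefill_rate(100_000, 4.0))
    print("1,000-token answer, seconds:", round(response_latency(4.0, 1_000, 350), 2))
```

Under those assumptions, a batch of 64 users works out to 22,400 tokens per second in aggregate, a 100K-token prompt implies at least 25,000 prompt tokens per second, and a 1,000-token answer would finish in roughly 6.9 seconds.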
Full-Stack AI, Ready for Production
Tenstorrent provides a complete AI solution, from hardware to software to deployment. Tenstorrent Galaxy integrates with open-source frameworks through TT-Forge™ and TT-Lang, and supports rapid model bring-up, enabling customers to deploy production AI systems without vendor lock-in or proprietary stacks. 90% of models from Hugging Face just work on Tenstorrent hardware.
Getting an edge
“It’s not about giving customers hardware and saying good luck. No, no,” said El-Ashmawi. “It’s about a whole integrated system that we’re doing here. The challenging part of this is that accelerators are very fragmented. There are proprietary stacks around GPUs. And what we’re doing is that we’re providing a scalable open infrastructure. It’s all unified and under the network.”
“AI is not about putting compute after the FLOPs, or floating point operations. It’s also about data movement,” El-Ashmawi said. “Compute doesn’t become first anymore. That’s not the important thing for inference. It’s all about the data movement.”
The questions are about how you derive the data and how you solve the memory bottlenecks that are in place for AI, he said. The most important thing is then reducing the system complexity. GPUs are great at compute, he said, but they struggle in a lot of areas around latency, which is really important.
“Scaling is not their forte. The new AI workloads are requiring bigger models, higher context, agentic AI, video gen, image gen,” he said. “Getting the data through and out is really, really important.”
AI companies have to learn to distinguish between commodity tokens and premium tokens, he said. GPUs can serve commodity tokens in large batches today, said El-Ashmawi. Cloud providers can generate a lot of revenue on commodity tokens, but not a lot of profit, he said.
A premium token is one where agentic AI, video generation, or these large models are required. That calls for low latency and high throughput per user, and the batch size becomes critical, he said.
“That’s solving the premium needs there. So if I have those attributes right, and I run them through low latency, and am able to do the throughput per user, and I can scale the throughput per user seamlessly, that allows me to start making profit on the tokens that are being driven here,” said El-Ashmawi. “It’s not about the speeds and feeds or the bits and bytes. It’s all about the tokenomics and how I am able to achieve what people want. That’s where we win.”
El-Ashmawi said the important metrics include tokens per second per watt.
He said, “You can take the number of tokens. How much I can get it out, and how much power did it cost me to do that?”
The next metric is tokens per second per dollar. He said that’s the ultimate goal. “How much did it cost me to get this out?” he said.
He said, “At the end of the day, it’s about how do I combine the tokens per second per watt and tokens per second per dollar, and get something that makes more sense.”
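The two metrics are simple ratios, and a short Python sketch shows how they might be compared side by side. Everything here is an assumption for illustration: the `SystemProfile` class is hypothetical, and the throughput, power, and price values are made-up round numbers, not measurements of any Tenstorrent or GPU system.

```python
from dataclasses import dataclass

@dataclass
class SystemProfile:
    """Hypothetical description of one inference system."""
    name: str
    tokens_per_second: float  # sustained aggregate throughput
    power_watts: float        # power draw while serving
    price_dollars: float      # purchase price

    def tokens_per_second_per_watt(self) -> float:
        return self.tokens_per_second / self.power_watts

    def tokens_per_second_per_dollar(self) -> float:
        return self.tokens_per_second / self.price_dollars

# Made-up inputs, chosen only to make the arithmetic easy to follow.
system_a = SystemProfile("system-a", tokens_per_second=20_000,
                         power_watts=10_000, price_dollars=400_000)
system_b = SystemProfile("system-b", tokens_per_second=30_000,
                         power_watts=20_000, price_dollars=900_000)

if __name__ == "__main__":
    for s in (system_a, system_b):
        print(s.name,
              round(s.tokens_per_second_per_watt(), 3), "tokens/s/W,",
              round(s.tokens_per_second_per_dollar(), 4), "tokens/s/$")
```

In this invented comparison, system-b has the higher raw throughput, but system-a comes out ahead on both efficiency metrics, which is the kind of distinction El-Ashmawi is describing.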
Tenstorrent’s go-to-market plan centers on solving a total cost of ownership (TCO) problem. That means combining capital spending and operating spending to arrive at the TCO.
“It comes ultimately to the TCO for the customer, right? And so that’s our focus,” he said. “That’s what people are looking for. But as I drive into the premium tokens and I push more tokens through, it becomes cheaper because I’m developing more tokens at a lower cost point.”
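A minimal sketch of that TCO reasoning, in Python, might look like the following. The formula is the generic one (capital cost plus operating cost over the service life, divided by tokens served), not a model Tenstorrent has published, and every input value is a made-up placeholder.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def total_cost_of_ownership(capex_dollars: float,
                            opex_dollars_per_year: float,
                            years: float) -> float:
    """Capital spending plus operating spending over the service life."""
    return capex_dollars + opex_dollars_per_year * years

def cost_per_million_tokens(tco_dollars: float,
                            tokens_per_second: float,
                            years: float,
                            utilization: float = 1.0) -> float:
    """TCO spread over every token served during the service life."""
    tokens_served = tokens_per_second * utilization * years * SECONDS_PER_YEAR
    return tco_dollars / tokens_served * 1_000_000

if __name__ == "__main__":
    # Hypothetical inputs: $500,000 capex, $60,000/year opex, 3-year life.
    tco = total_cost_of_ownership(500_000, 60_000, 3)
    for rate in (10_000, 20_000):  # hypothetical tokens/second
        cost = cost_per_million_tokens(tco, rate, years=3, utilization=0.5)
        print(rate, "tokens/s ->", round(cost, 3), "dollars per million tokens")
```

Because the TCO is fixed in this sketch, doubling the throughput halves the cost per million tokens, which matches the point in the quote that pushing more tokens through makes each one cheaper.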
He said that with GPUs today, it costs about $20 per 1,000 images generated. Tenstorrent is driving toward $4 per 1,000 images generated. Customers want better economics for video and image generation.
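The gap between those two price points is easy to quantify. The Python sketch below uses only the $20 and $4 figures from the interview; the function names and the one-million-image workload are hypothetical.

```python
GPU_DOLLARS_PER_1000_IMAGES = 20.0     # figure cited by El-Ashmawi
TARGET_DOLLARS_PER_1000_IMAGES = 4.0   # Tenstorrent's stated target

def cost_for_images(num_images: int, dollars_per_1000: float) -> float:
    """Total cost of generating num_images at a given price per 1,000 images."""
    return num_images / 1000 * dollars_per_1000

def savings_fraction(baseline: float, target: float) -> float:
    """Share of the baseline cost that is eliminated at the target price."""
    return (baseline - target) / baseline

if __name__ == "__main__":
    images = 1_000_000  # hypothetical workload size
    print("GPU cost:", cost_for_images(images, GPU_DOLLARS_PER_1000_IMAGES))
    print("Target cost:", cost_for_images(images, TARGET_DOLLARS_PER_1000_IMAGES))
    print("Reduction factor:",
          GPU_DOLLARS_PER_1000_IMAGES / TARGET_DOLLARS_PER_1000_IMAGES)
    print("Savings:", savings_fraction(GPU_DOLLARS_PER_1000_IMAGES,
                                       TARGET_DOLLARS_PER_1000_IMAGES))
```

At those prices, a million images would cost $20,000 versus $4,000, a 5x reduction, or 80% lower cost per image.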
Networked AI

These results are enabled by an architecture built around a different constraint. Most AI accelerators treat compute as the primary design problem. Tenstorrent instead solved data placement and data flow first, which enables performance through scaling.
“Every company in the industry is pairing up to build the accelerator accelerator accelerator. CPUs run code. GPUs accelerate CPUs. TPUs accelerate GPUs. LPUs accelerate TPUs. And so on. This leads to complex solutions which are unlikely to be compatible with changes in AI models and uses. At Tenstorrent, we thought something more general and simpler would work,” said Jim Keller, CEO of Tenstorrent.
The result is what Tenstorrent calls Networked AI: a new model for AI infrastructure where compute, memory, and networking are unified into a single system optimized for real-world AI workloads. By combining efficient data placement and data flow, high-bandwidth on-chip memory, and Ethernet-based scale-out, the architecture scales from a single core to thousands of servers under one software model, without proprietary interconnects, without reconfiguration, and without the rigid workload declarations that make competing systems brittle as models evolve.
Deployments
Tenstorrent Galaxy superclusters are one of the new foundations of Equinix’s Distributed AI Hub, a full-stack AI orchestration platform for agentic workloads, launching today with partners BetterBrain and OrionVM.
Equinix’s Distributed AI Hub helps customers and partners cover every layer from infrastructure to application, and plugs into legacy enterprise systems, enabling customers to deploy and operate sovereign agentic AI systems.
- Equinix: A global digital infrastructure company that provides colocation and interconnection services, enabling enterprises and partners to deploy and scale AI – along with other mission-critical workloads – securely, efficiently, and in close proximity to users, clouds, and data.
- OrionVM: Next-gen heterogeneous cloud platform partner powering the orchestration and infrastructure layer for Tenstorrent-based AI services.
- BetterBrain: A full-stack AI platform and deployment partner delivering secure, customizable, production-ready AI applications and agentic workflows on Tenstorrent infrastructure.
“Tenstorrent brings immense value to our Distributed AI Hub by fundamentally rethinking how AI workloads are executed—from optimizing data flow on-chip across prefill and decode, to orchestrating the full AI stack. This level of architectural intelligence allows enterprises to stay focused on building differentiated products, not managing infrastructure complexity,” said Justen Aguillon, Director of Technology Partner Ecosystems.
“We’re enabling a new class of AI factories—high-performance, cost-efficient environments with the flexibility to run both frontier and open-source models, and the embedded telemetry and governance required to scale agentic systems globally.”
Additional deployments announced today include:
- Virtu Financial, a tier-1 market maker working with Tenstorrent to enable real-world AI systems: on-premises agentic AI solutions for trading and operational automation
- Turiyam, a next-generation semiconductor and AI infrastructure company building datacenter-scale inference chips, software, and systems from India for the world
- Cirrascale, a top tier neocloud with cloud services for agentic applications and generative AI, available in the US and multiple international regions
- ai&, Japan’s vertically integrated AI platform: the largest installation of Tenstorrent hardware to power AI infrastructure, models, and applications across Japan and around the world.
“We evaluate a lot of hardware. Most of it is incremental. Tenstorrent Galaxy Blackhole is not. Tenstorrent has taken a clean-sheet approach to AI infrastructure, and the results speak for themselves. Putting this in the hands of our customers is exactly the kind of move Cirrascale exists to make,” said Dave Driggers, CEO of Cirrascale Cloud Services, in a statement.
Run anything – Fast, Simple, Affordable – with Tenstorrent Galaxy Blackhole.
Tenstorrent Galaxy Blackhole is Tenstorrent’s 6U air-cooled AI compute server, built with Tenstorrent’s next-generation Blackhole chips and a fully open-source software stack. Starting at $110,000, it delivers 23 PFLOPS of Block FP8 AI compute from 32 Blackhole chips, 6.2 GB of on-chip SRAM with 2.9 PB/s of bandwidth, 1 TB of DRAM with 16 TB/s of bandwidth, and up to 56 × 800G Ethernet ports for 11.2 TB/s of scale-out bandwidth.
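For a rough sense of what those system-level figures mean per chip, the Python sketch below divides them evenly across the 32 chips and adds up the raw line rate of the Ethernet ports. The even split and the decimal units (1 TB = 1,000 GB) are assumptions made for the arithmetic, and the helper names are hypothetical.

```python
# Per-chip arithmetic on the system-level figures quoted above.
# Assumption: resources are divided evenly across the 32 chips.

CHIPS_PER_SYSTEM = 32

def per_chip(system_total: float, chips: int = CHIPS_PER_SYSTEM) -> float:
    """Even share of a system-level resource for one chip."""
    return system_total / chips

def ethernet_terabytes_per_second(ports: int, gigabits_per_port: float) -> float:
    """Aggregate one-direction line rate of all ports, in terabytes/second."""
    gigabits = ports * gigabits_per_port
    gigabytes = gigabits / 8   # 8 bits per byte
    return gigabytes / 1000    # decimal units: 1 TB = 1,000 GB

if __name__ == "__main__":
    print("PFLOPS per chip:", per_chip(23))
    print("SRAM MB per chip:", per_chip(6200))
    print("DRAM bandwidth GB/s per chip:", per_chip(16_000))
    print("Ethernet TB/s, one direction:", ethernet_terabytes_per_second(56, 800))
```

On these assumptions, each chip accounts for about 0.72 PFLOPS, roughly 194 MB of SRAM, and 500 GB/s of DRAM bandwidth, while 56 ports at 800 Gb/s add up to 44.8 Tb/s, or 5.6 TB/s in each direction.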
Tenstorrent Galaxy Blackhole systems scale seamlessly from a single server to multi-rack deployments using standard Ethernet networking. Customers deploy configurations ranging from 4 to 36 or more Tenstorrent Galaxy systems, optimized for workloads including AI video generation, large-scale LLM inference, and private AI infrastructure. The base Tenstorrent Galaxy Blackhole supercluster of four Tenstorrent Galaxy systems starts at $440,000.
The company has deployed systems with multiple customers, such as Equinix, and it is shipping production systems to a handful of customers. Developers are working with the technology and offering feedback. One advantage of Tenstorrent’s technology is that companies can run it on premises rather than only in data centers. That improves security and sovereignty, said El-Ashmawi. And the software is open source.
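The supercluster figures follow from the single-system numbers by simple multiplication, as the Python sketch below shows. It assumes that price and compute scale linearly with the number of systems and ignores networking gear, racks, and any volume pricing; the `cluster_summary` helper is hypothetical.

```python
SYSTEM_PRICE_DOLLARS = 110_000  # starting price of one Galaxy Blackhole
CHIPS_PER_SYSTEM = 32
PFLOPS_PER_SYSTEM = 23          # Block FP8

def cluster_summary(num_systems: int) -> dict:
    """Linear scale-up of the single-system figures (an assumption)."""
    return {
        "systems": num_systems,
        "chips": num_systems * CHIPS_PER_SYSTEM,
        "pflops": num_systems * PFLOPS_PER_SYSTEM,
        "starting_price_dollars": num_systems * SYSTEM_PRICE_DOLLARS,
    }

if __name__ == "__main__":
    for n in (1, 4, 36):
        print(cluster_summary(n))
```

Four systems come to 128 chips, 92 PFLOPS, and $440,000, which matches the quoted starting price for the base supercluster. A 36-system deployment would, on the same linear assumption, reach 1,152 chips and 828 PFLOPS at a starting price of $3.96 million.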
“Over this past year, we’ve been able to improve on the software, get the orchestration running, get the systems out into production. And now we started shipping our first production systems back in March,” said El-Ashmawi.
While high-bandwidth memory is in short supply, El-Ashmawi said Tenstorrent chose to go with standard GDDR6 and GDDR7 memory instead. That results in lower costs for the system, he said.
Dream team?
Keller was the chip architect of the Apple A4/A5, AMD Zen, and Tesla’s Full Self-Driving chip. The company builds RISC-V-based AI processors and systems for developers, enterprises, and sovereign infrastructure worldwide. In addition to servers and workstations, Tenstorrent licenses its Ascalon RISC-V CPU and Tensix AI cores to chip designers including Samsung and LG.
Backed by Bezos Expeditions, Samsung, LG Electronics, Hyundai Motor Group, Fidelity, and others, Tenstorrent has raised over $1 billion and operates from Santa Clara, Austin, Toronto, Belgrade, Tokyo, and Bangalore.
As for the “dream team” aspect of Keller’s leadership, El-Ashmawi said that what is interesting about Tenstorrent is that, despite its size, it follows Keller’s philosophy of breaking down big teams into small teams that can innovate more quickly.
And he said, “Jim knows what works, what doesn’t. He’s taking those experiences. He always reflects” on what worked and what didn’t with his prior companies.