Delos AI improves inference with 1,000X higher scale

Become a member of GB MAX to gain exclusive access to the industry and to the most influential global B2B leadership community in the business of gaming, entertainment, and tech. Join now and also get a VIP ticket to GamesBeat Next (Nov 2-3, SF).

Delos Data announced a new cluster architecture and server design that accelerates AI inference through cabled scale-up.

Unlike existing solutions, it addresses inference performance at a unified system level by reducing GPU idle time caused by data center interconnect bottlenecks — and is designed to scale to thousands of GPUs and accelerators.

“The GPU is no longer the bottleneck — the network is. The next leap in AI performance will come not just from more compute but from how the compute is connected,” said Ed Doe, Co-founder and CEO of Delos Data, in a statement. “With the interconnect market set to hit $100 billion by 2030, the industry is waking up to this. What Delos Data has built is exactly what the moment demands — a unified system architecture that eliminates fragmentation between layers and lets AI inference run the way it was meant to: fast, at scale, and without compromise.”

Highlights of the Delos Data Nonstop AI approach
 Large ecosystem: Built within today’s GPU and accelerator ecosystems
 Higher scale: Delos Server brings scale-up IO to faceplace enabling larger scale-
up domains
 Better Visibility: Delos Mosaic TM software allows for fine-grained visibility into
scale-up domain

“The industry has reached an inference inflection point where traditional architectures
can no longer keep pace with the networking demands of massive GPU clusters,”
said Dan Daly, CTO of Delos Data, in a statement. “We are introducing a fundamentally new interconnect architecture that brings scale-up connectivity directly to the network. Our architecture allows for flexible scaling of GPU clusters to ensure the number of GPUs in the cluster can be right sized to the AI models and the inference performance requirements.”

Benefits for hyperscalers, AI infrastructure builders, and enterprises scaling inference:
 Reduced cost per token
 Improved tokens per watt
 Increased tokens per second, driving greater revenue per second
Experience Delos at Computex 2026

Delos Data will demonstrate a Nonstop AI live at Computex 2026 in Booth J1328, showcasing real-time inference bottleneck detection using Delos Mosaic TM and AI workload tuning in a production-scale environment. Attendees will see firsthand how Mosaic delivers actionable cluster intelligence to accelerate and stabilize inference workloads.

Availability
Early access software deployments with select customers available now; broader
availability is planned for Q4 2026.