Delos Data announced a new cluster architecture and server design that accelerates AI inference through cabled scale-up.
Unlike existing solutions, it addresses inference performance at a unified system level by reducing GPU idle time caused by data center interconnect bottlenecks — and is designed to scale to thousands of GPUs and accelerators.
“The GPU is no longer the bottleneck — the network is. The next leap in AI performance will come not just from more compute but from how the compute is connected,” said Ed Doe, Co-founder and CEO of Delos Data, in a statement. “With the interconnect market set to hit $100 billion by 2030, the industry is waking up to this. What Delos Data has built is exactly what the moment demands — a unified system architecture that eliminates fragmentation between layers and lets AI inference run the way it was meant to: fast, at scale, and without compromise.”
Highlights of the Delos Data Nonstop AI approach
Large ecosystem: Built within today’s GPU and accelerator ecosystems
Higher scale: Delos Server brings scale-up IO to faceplace enabling larger scale-
up domains
Better Visibility: Delos Mosaic TM software allows for fine-grained visibility into
scale-up domain
“The industry has reached an inference inflection point where traditional architectures
can no longer keep pace with the networking demands of massive GPU clusters,”
said Dan Daly, CTO of Delos Data, in a statement. “We are introducing a fundamentally new interconnect architecture that brings scale-up connectivity directly to the network. Our architecture allows for flexible scaling of GPU clusters to ensure the number of GPUs in the cluster can be right sized to the AI models and the inference performance requirements.”
Benefits for hyperscalers, AI infrastructure builders, and enterprises scaling inference:
Reduced cost per token
Improved tokens per watt
Increased tokens per second, driving greater revenue per second
Experience Delos at Computex 2026
Delos Data will demonstrate a Nonstop AI live at Computex 2026 in Booth J1328, showcasing real-time inference bottleneck detection using Delos Mosaic TM and AI workload tuning in a production-scale environment. Attendees will see firsthand how Mosaic delivers actionable cluster intelligence to accelerate and stabilize inference workloads.
Availability
Early access software deployments with select customers available now; broader
availability is planned for Q4 2026.