Google Cloud Unveils Virgo Network to Power Next-Generation AI Data Centers

Google Cloud has introduced a new AI-era data center networking architecture designed to support the rapidly increasing scale and complexity of modern machine learning workloads. The company says traditional network designs are reaching their limits as foundational AI models continue to grow exponentially in size and computational demand.

According to Google, the next decade of AI requires a fundamental shift in physical cloud infrastructure, particularly networking. To address this, the company has developed the Virgo Network, a megascale AI data center fabric built on a “campus-as-a-computer” philosophy and forming a core part of its AI Hypercomputer infrastructure.

Google explained that legacy network architectures are struggling to keep up with four key constraints of modern AI workloads: massive scale requirements that span multiple data centers, rapidly increasing bandwidth demands driven by model training, synchronized traffic bursts that strain network buffers, and strict low-latency requirements for real-time inference.

The company stated that “even a single ‘straggler’ node can throttle the entire cluster’s performance,” highlighting the importance of deterministic and resilient network behavior in AI training environments.
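A toy calculation makes the straggler point concrete: in synchronous training, a collective operation such as an all-reduce cannot complete until every participant has finished its step, so the slowest node sets the pace for the whole cluster. The numbers below are invented for illustration and are not Google's measurements.

```python
# Toy model (illustrative only): in synchronous training, iteration time
# is gated by the slowest node, because the collective (e.g. all-reduce)
# waits for every participant before it can complete.
node_step_times_ms = [10.0, 10.2, 9.8, 10.1, 31.5]  # last node is a straggler

iteration_time = max(node_step_times_ms)             # cluster pace
ideal_time = sum(node_step_times_ms) / len(node_step_times_ms)

print(f"iteration gated at {iteration_time} ms "
      f"vs ~{ideal_time:.1f} ms average per node")
```

One node running roughly 3x slower than its peers triples the effective iteration time, which is why the article stresses deterministic network behavior.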

To overcome these challenges, Google is transitioning from general-purpose networking to a specialized, multi-layer architecture that separates workloads into distinct domains. These include a scale-up interconnect for tightly coupled accelerator communication, an east-west scale-out fabric for distributed training across pods, and a north-south Jupiter front-end network for storage and compute access across data centers.
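The separation of traffic into distinct domains can be sketched as a simple data model. The structure below is purely illustrative, assembled from the article's description; the names and fields are hypothetical, not Google's API.

```python
# Illustrative sketch only: the three traffic domains described in the
# article, modeled as separate fabrics so each layer can be upgraded
# independently of the others.
fabrics = {
    "scale_up": {
        "scope": "within a pod",
        "traffic": "tightly coupled accelerator-to-accelerator communication",
    },
    "scale_out": {
        "scope": "across pods",
        "traffic": "east-west distributed training",
    },
    "front_end": {
        "scope": "across data centers",
        "traffic": "north-south storage and compute access (Jupiter)",
    },
}

for name, fabric in fabrics.items():
    print(f"{name}: {fabric['scope']} -> {fabric['traffic']}")
```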

This decoupled structure is designed to allow independent upgrades across network layers, reduce bottlenecks, and improve overall system resilience while supporting faster innovation cycles.

At the center of this architecture is the Virgo Network, a flat, two-layer non-blocking fabric that connects up to 134,000 chips with a reported 47 petabits per second of bisection bandwidth. The system is designed to deliver up to four times higher bandwidth per accelerator compared to previous generations while reducing latency by approximately 40%.
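The headline figures can be sanity-checked with simple arithmetic. The division below uses only the numbers quoted in the article and yields a rough average, not a Google-published per-chip metric.

```python
# Back-of-envelope check using the article's figures (illustrative only):
# 47 Pb/s of bisection bandwidth shared across 134,000 chips.
bisection_bps = 47e15      # 47 petabits per second
chips = 134_000

per_chip_gbps = bisection_bps / chips / 1e9
print(f"~{per_chip_gbps:.0f} Gb/s of bisection bandwidth per chip")
```

That works out to roughly 350 Gb/s of bisection bandwidth per chip on average, consistent with the article's claim of a substantial per-accelerator bandwidth increase.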

Google said the design enables more predictable performance for both training and inference workloads, particularly for large-scale distributed AI systems.

The company also emphasized reliability as a core design principle. Given the scale of modern AI clusters, hardware failures are inevitable, making fault isolation and rapid recovery essential. Virgo Network incorporates independent switching planes to prevent localized failures from affecting entire clusters.

In addition, Google highlighted advancements in observability and automation, including sub-millisecond telemetry, congestion detection, and automated identification of performance bottlenecks such as “stragglers” and system “hangs.” These capabilities are designed to improve mean-time-to-recovery and maximize training efficiency.
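One simple way such straggler detection could work is to compare each node's recent step time against the fleet-wide median. This is a hypothetical sketch, not Google's telemetry stack; the function name, threshold, and sample data are all invented for illustration.

```python
# Hypothetical sketch (not Google's implementation): flag stragglers by
# comparing each node's step time against the fleet median.
from statistics import median

def find_stragglers(step_times_ms, slowdown_factor=2.0):
    """Return indices of nodes whose step time exceeds slowdown_factor x median."""
    fleet_median = median(step_times_ms)
    return [i for i, t in enumerate(step_times_ms)
            if t > slowdown_factor * fleet_median]

samples = [10.1, 9.9, 10.0, 10.3, 28.7, 10.2]
print(find_stragglers(samples))  # → [4]
```

A median-based threshold is robust to a small number of outliers, which matters at cluster scale where a few slow or hung nodes are expected at any given time.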

Ultimately, Google described Virgo Network as the foundational layer of its AI Hypercomputer strategy, enabling unified compute across large-scale AI systems. The company said the architecture is intended to deliver the scale, latency control, and resilience required for the emerging agentic AI era.