How OpenAI delivers low-latency voice AI at scale
Summary
OpenAI rearchitected its WebRTC (Web Real-Time Communication, a standard protocol for sending low-latency audio and video between clients and servers) infrastructure to handle voice AI at scale while preserving natural conversation speed. The team addressed three constraints that conflict at scale: one-port-per-session media termination; session stability for stateful ICE (Interactive Connectivity Establishment, the process for establishing connections across firewalls and NATs) and DTLS (Datagram Transport Layer Security, encryption for real-time data); and global routing latency. The result is a split relay-plus-transceiver architecture that preserves standard WebRTC behavior for clients while changing how media packets are routed internally.
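The one-port-per-session constraint is commonly relieved by multiplexing many media sessions onto a single shared UDP port and demultiplexing incoming datagrams by their source address. The post does not publish implementation details, so the sketch below is a hypothetical illustration of that general technique: `SharedPortRelay`, its method names, and the address-keyed lookup are all assumptions, not OpenAI's actual code.

```python
class SharedPortRelay:
    """Illustrative sketch: route datagrams arriving on one shared UDP port
    to per-session handlers, keyed by the sender's (ip, port) pair.

    A production relay would learn these bindings from ICE connectivity
    checks rather than explicit registration; this is a simplification.
    """

    def __init__(self):
        # (source_ip, source_port) -> session identifier
        self.sessions = {}

    def register(self, src_ip, src_port, session_id):
        """Bind a remote peer's address to a media session."""
        self.sessions[(src_ip, src_port)] = session_id

    def route(self, src_ip, src_port, payload):
        """Demultiplex one datagram; returns (session_id, payload),
        or None for packets from unknown peers (a real relay would
        fall back to ICE/STUN processing here)."""
        session_id = self.sessions.get((src_ip, src_port))
        if session_id is None:
            return None
        return (session_id, payload)


relay = SharedPortRelay()
relay.register("203.0.113.5", 50000, "session-a")
relay.register("198.51.100.9", 40000, "session-b")

# Both peers send to the same relay port; routing is by source address.
print(relay.route("203.0.113.5", 50000, b"rtp-1"))   # ('session-a', b'rtp-1')
print(relay.route("198.51.100.9", 40000, b"rtp-2"))  # ('session-b', b'rtp-2')
print(relay.route("192.0.2.1", 1234, b"rtp-3"))      # None (unknown peer)
```

The design point is that session identity moves out of the transport (one port per session) and into a lookup table, so a single listener can terminate many concurrent sessions.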
Original source: https://openai.com/index/delivering-low-latency-voice-ai-at-scale
First tracked: May 4, 2026 at 08:00 PM
Classified by LLM (prompt v3) · confidence: 85%