WeSearch

How OpenAI delivers low-latency voice AI at scale

#webrtc #voice-ai #real-time-communication #artificial-intelligence #cloud-infrastructure #OpenAI #Pion #ChatGPT #Realtime-API
⚡ TL;DR · AI summary

OpenAI has rearchitected its WebRTC infrastructure to support low-latency voice AI at scale, ensuring natural real-time interactions for applications like ChatGPT voice and the Realtime API. The new split relay plus transceiver architecture improves global reach, connection setup speed, and media round-trip time while maintaining standard WebRTC behavior for clients. This advancement allows AI models to process audio streams continuously, enabling more conversational and responsive voice experiences.
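The summary credits the split relay plus transceiver architecture with lower media round-trip time. A toy model (all latency figures below are made-up illustrations, not OpenAI measurements) shows why terminating the client's WebRTC connection at a nearby edge relay and carrying the rest of the path over a managed backbone can beat one long public-internet leg:

```python
# Toy round-trip-time comparison: direct public-internet path vs. a
# split relay path. All numbers are hypothetical, for illustration only.

def rtt_direct(one_way_public_ms: float) -> float:
    """Single long public-internet leg from client to a distant region."""
    return 2 * one_way_public_ms

def rtt_split(client_to_edge_ms: float, backbone_ms: float) -> float:
    """Client terminates media at a nearby edge relay; the relay forwards
    traffic to the inference region over a faster, more stable backbone."""
    return 2 * (client_to_edge_ms + backbone_ms)

direct = rtt_direct(120.0)     # long, jittery public path
split = rtt_split(15.0, 70.0)  # short public hop + backbone leg
print(direct, split)           # 240.0 170.0
```

The same split also helps the other two requirements: a short first hop converges ICE and DTLS faster, and the backbone leg tends to have lower jitter than the open internet.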

Original article: Hacker News: Front Page
Opening excerpt (first ~120 words)

May 4, 2026 · Engineering · By Yi Zhang and William McDonald, Members of Technical Staff

Voice AI only feels natural if conversation moves at the speed of speech. When the network gets in the way, people hear it immediately as awkward pauses, clipped interruptions, or delayed barge-in. That matters for ChatGPT voice, for developers building with the Realtime API, for agents working in interactive workflows, and for models that need to process audio while a user is still talking.

At OpenAI's scale, that translates into three concrete requirements:

- Global reach for more than 900 million weekly active users
- Fast connection setup so a user can start speaking as soon as a session begins
- Low and stable media round-trip time, with low jitter and packet…
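The "low jitter" requirement in the excerpt has a standard definition: WebRTC media rides on RTP, and RFC 3550 specifies a running interarrival-jitter estimate that receivers report back via RTCP. A minimal sketch of that estimator, using hypothetical packet transit times rather than anything from the article:

```python
# Interarrival jitter estimate per RFC 3550 (the RTP spec that WebRTC
# media transport builds on): J = J + (|D| - J) / 16, where D is the
# change in one-way transit time between consecutive packets.
# The transit values below are made-up illustrations.

def update_jitter(jitter: float, transit_prev: float, transit_now: float) -> float:
    """One step of the RFC 3550 running jitter estimate."""
    d = abs(transit_now - transit_prev)
    return jitter + (d - jitter) / 16.0

# Transit time (arrival time minus RTP timestamp) for a few packets, in ms.
transits = [40.0, 42.0, 41.0, 45.0, 40.5]
jitter = 0.0
for prev, now in zip(transits, transits[1:]):
    jitter = update_jitter(jitter, prev, now)
print(round(jitter, 3))  # ≈ 0.674 ms
```

The 1/16 smoothing factor keeps the estimate stable under noise, which is why a single delayed packet nudges the reported jitter rather than spiking it.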


