WeSearch

How OpenAI delivers low-latency voice AI at scale

#webrtc #voice-ai #real-time-communication #artificial-intelligence #cloud-infrastructure #OpenAI #Pion #ChatGPT #Realtime-API
⚡ TL;DR · AI summary

OpenAI has rearchitected its WebRTC infrastructure to support low-latency voice AI at scale, ensuring natural real-time interactions for applications like ChatGPT voice and the Realtime API. The new split relay plus transceiver architecture improves global reach, connection setup speed, and media round-trip time while maintaining standard WebRTC behavior for clients. This advancement allows AI models to process audio streams continuously, enabling more conversational and responsive voice experiences.
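The summary credits the split relay plus transceiver architecture with lower media round-trip time. A toy model (all latency figures below are made-up illustrations, not OpenAI measurements) shows why terminating the client's WebRTC connection at a nearby edge relay and carrying the rest of the path over a managed backbone can beat one long public-internet leg:

```python
# Toy round-trip-time comparison: direct public-internet path vs. a
# split relay path. All numbers are hypothetical, for illustration only.

def rtt_direct(one_way_public_ms: float) -> float:
    """Single long public-internet leg from client to a distant region."""
    return 2 * one_way_public_ms

def rtt_split(client_to_edge_ms: float, backbone_ms: float) -> float:
    """Client terminates media at a nearby edge relay; the relay forwards
    traffic to the inference region over a faster, more stable backbone."""
    return 2 * (client_to_edge_ms + backbone_ms)

direct = rtt_direct(120.0)     # long, jittery public path
split = rtt_split(15.0, 70.0)  # short public hop + backbone leg
print(direct, split)           # 240.0 170.0
```

The same split also helps the other two requirements: a short first hop converges ICE and DTLS faster, and the backbone leg tends to have lower jitter than the open internet.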

Original article: Hacker News: Front Page
Opening excerpt (first ~120 words)

May 4, 2026 · Engineering · By Yi Zhang and William McDonald, Members of Technical Staff

Voice AI only feels natural if conversation moves at the speed of speech. When the network gets in the way, people hear it immediately as awkward pauses, clipped interruptions, or delayed barge-in. That matters for ChatGPT voice, for developers building with the Realtime API, for agents working in interactive workflows, and for models that need to process audio while a user is still talking.

At OpenAI's scale, that translates into three concrete requirements:

- Global reach for more than 900 million weekly active users
- Fast connection setup so a user can start speaking as soon as a session begins
- Low and stable media round-trip time, with low jitter and packet…
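The "low jitter" requirement in the excerpt has a standard definition: WebRTC media rides on RTP, and RFC 3550 specifies a running interarrival-jitter estimate that receivers report back via RTCP. A minimal sketch of that estimator, using hypothetical packet transit times rather than anything from the article:

```python
# Interarrival jitter estimate per RFC 3550 (the RTP spec that WebRTC
# media transport builds on): J = J + (|D| - J) / 16, where D is the
# change in one-way transit time between consecutive packets.
# The transit values below are made-up illustrations.

def update_jitter(jitter: float, transit_prev: float, transit_now: float) -> float:
    """One step of the RFC 3550 running jitter estimate."""
    d = abs(transit_now - transit_prev)
    return jitter + (d - jitter) / 16.0

# Transit time (arrival time minus RTP timestamp) for a few packets, in ms.
transits = [40.0, 42.0, 41.0, 45.0, 40.5]
jitter = 0.0
for prev, now in zip(transits, transits[1:]):
    jitter = update_jitter(jitter, prev, now)
print(round(jitter, 3))  # ≈ 0.674 ms
```

The 1/16 smoothing factor keeps the estimate stable under noise, which is why a single delayed packet nudges the reported jitter rather than spiking it.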


