Build Live Translation Apps with gpt-realtime-translate
gpt-realtime-translate is a specialized model for live speech-to-speech translation, designed to enable multilingual audio experiences in broadcasts, calls, and video conversations. It detects the source language automatically and delivers translated speech and text with low latency by streaming audio in real time. The model is optimized for interpretation, trained on professional interpreter data, and avoids common pitfalls of general-purpose voice models.
- gpt-realtime-translate is trained on thousands of hours of professional interpreter audio to prioritize accurate, context-aware translation.
- It supports over 70 input languages and 13 output languages, with dynamic voice adaptation that mirrors the speaker's tone and style.
- The model enables low-latency, continuous translation without requiring speakers to pause, making it suitable for live broadcasts and conversational use.
- Developers can integrate it into web apps, phone calls via Twilio, and video chats using LiveKit for real-time multilingual communication.
- Unlike general voice models, it avoids responding to prompts and focuses solely on translation, maintaining turn-based fluency across different languages.
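To make the streaming idea concrete, here is a minimal sketch of what a client might prepare before sending audio: a session setting that names only the target language (the source language is detected automatically), followed by small audio chunks sent continuously rather than one complete recording. Everything specific here is an assumption for illustration and is not taken from the article or from the official API: the message types `session.configure` and `audio.append`, the `target_language` field, the 24 kHz 16-bit mono PCM format, and the 100 ms chunk size are all hypothetical. No network call is made.

```python
import base64
import json
from typing import Iterator

# Assumed audio format (hypothetical, not confirmed by the article).
SAMPLE_RATE_HZ = 24_000   # samples per second
BYTES_PER_SAMPLE = 2      # 16-bit mono PCM
CHUNK_MS = 100            # duration of each streamed chunk


def session_config(target_language: str) -> dict:
    """Build a hypothetical session message.

    Only the output language is set; no source language is given because
    the model is described as detecting it automatically.
    """
    return {
        "type": "session.configure",        # hypothetical message type
        "model": "gpt-realtime-translate",
        "target_language": target_language,  # hypothetical field name
    }


def audio_chunks(pcm: bytes, chunk_ms: int = CHUNK_MS) -> Iterator[bytes]:
    """Split raw PCM bytes into fixed-duration chunks for streaming."""
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


def audio_messages(pcm: bytes) -> Iterator[str]:
    """Wrap each chunk in a hypothetical JSON message with base64 audio."""
    for chunk in audio_chunks(pcm):
        yield json.dumps({
            "type": "audio.append",  # hypothetical message type
            "audio": base64.b64encode(chunk).decode("ascii"),
        })


if __name__ == "__main__":
    one_second_of_silence = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
    print(json.dumps(session_config("es")))
    count = sum(1 for _ in audio_messages(one_second_of_silence))
    print(count, "audio messages")
```

Sending short chunks as they are captured, instead of waiting for the speaker to finish, is what lets a translation service begin producing output while the speaker is still talking. A real integration would send these messages over whatever transport and schema the official documentation specifies.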
Opening excerpt (first ~120 words)
gpt-realtime-translate is a live speech-to-speech translation model for building multilingual audio experiences across broadcasts, streams, calls, and video conversations. It accepts spoken input, automatically detects the source language, and returns translated speech plus text transcripts. Developers only need to specify the target output language. This model has two new features that make it uniquely capable: Unlike general-purpose voice models, gpt-realtime-translate is optimized for interpretation. It was trained on thousands of hours of professional interpreter audio, which helps it remain translation-only and wait for enough context before producing speech. This is especially important across languages with different sentence structures.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at OpenAI.