Advancing voice intelligence with new models in the API
meetpateltech
33 points
5 comments
May 07, 2026
Related Discussions
- How OpenAI delivers low-latency voice AI at scale Sean-Der · 359 pts · May 04, 2026 · 68% similar
- Voice-AI-for-Beginners – A curated learning path for developers mahimai · 61 pts · May 02, 2026 · 61% similar
- The next evolution of the Agents SDK meetpateltech · 20 pts · April 15, 2026 · 59% similar
- VibeVoice: Open-source frontier voice AI tosh · 345 pts · April 28, 2026 · 57% similar
- Gemini 3.1 Flash TTS: the next generation of expressive AI speech pretext · 16 pts · April 15, 2026 · 54% similar
Discussion Highlights (4 comments)
wild_egg
> $32 / 1M audio input tokens ($0.40 for cached input tokens)
Anyone know how much audio is 1M tokens? I have no way of knowing if this is fine or prohibitively expensive.
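The question above comes down to one unknown: how many tokens a second of audio consumes. A minimal sketch of the arithmetic follows, using the quoted $32 / 1M input price. The function names and the `tokens_per_second` value are assumptions for illustration only; the 10 tokens per second used here is a placeholder, not a figure from the announcement, so substitute the rate from the official pricing documentation.

```python
# Sketch: convert a per-token audio price into time-based figures.
# ASSUMPTION: tokens_per_second is a placeholder supplied by the caller;
# the thread does not state how many tokens one second of audio uses.

USD_PER_MILLION_INPUT_TOKENS = 32.0  # price quoted in the comment above


def hours_per_million_tokens(tokens_per_second: float) -> float:
    """Hours of audio that fit in 1M tokens at the given token rate."""
    seconds = 1_000_000 / tokens_per_second
    return seconds / 3600


def audio_cost_usd(minutes: float, tokens_per_second: float,
                   usd_per_million: float = USD_PER_MILLION_INPUT_TOKENS) -> float:
    """Cost in USD of `minutes` of input audio at the given token rate."""
    tokens = minutes * 60 * tokens_per_second
    return tokens / 1_000_000 * usd_per_million


if __name__ == "__main__":
    rate = 10.0  # hypothetical tokens per second of audio
    print(f"1M tokens is about {hours_per_million_tokens(rate):.1f} hours of audio")
    print(f"1 minute costs about ${audio_cost_usd(1, rate):.4f}")
```

Under that placeholder rate, 1M tokens would cover roughly 27.8 hours of input audio, or about two cents per minute. The result scales linearly with the real token rate.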
jiehong
Looks like the GPT‑Realtime‑Whisper model isn’t open weight like the old Whisper model. Too bad! However, OpenAI had and still has a true lead in voice model interactions. That’s where Chinese AI companies don’t do as well: DeepSeek doesn’t have anything, and models like Kimi can’t speak any language other than English or Chinese.
tjohnell
I’ve been doing a ragtag version of this with sub-agents, TTS, and STT in Claude Code. Real-time would be pretty awesome, even if it’s just orchestrating other agents. I might have to try this on top of my Claude agents. I don’t think the model doing the talking necessarily needs to do the heavy reasoning - it just needs to have context on your other agents, delegate, and remain present.
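The delegation pattern described in this comment can be sketched roughly as follows. This is not the commenter's setup and not any vendor's API: `VoiceFrontEnd`, `research_agent`, and the `"name: task"` utterance format are all hypothetical names invented for illustration. The idea shown is only that the talking layer acknowledges immediately, hands slow work to a background worker, and stays free to respond.

```python
import asyncio


async def research_agent(task: str) -> str:
    # Hypothetical worker standing in for a slower, heavier-reasoning agent.
    await asyncio.sleep(0.01)
    return f"result for {task!r}"


class VoiceFrontEnd:
    """Hypothetical front agent: replies fast, delegates the slow work."""

    def __init__(self, workers):
        self.workers = workers   # name -> async callable taking a task string
        self.pending = {}        # name -> running asyncio.Task

    def handle(self, utterance: str) -> str:
        # Assumed utterance format for this sketch: "<worker name>: <task>".
        name, _, task = utterance.partition(":")
        name = name.strip()
        worker = self.workers.get(name)
        if worker is None:
            return "I can't delegate that, but I'm still here."
        # Start the worker in the background and answer right away.
        self.pending[name] = asyncio.create_task(worker(task.strip()))
        return f"On it, I've asked {name}."

    async def collect(self) -> dict:
        # Wait for every delegated task and return the results by worker name.
        results = {n: await t for n, t in list(self.pending.items())}
        self.pending.clear()
        return results


async def demo():
    front = VoiceFrontEnd({"research": research_agent})
    ack = front.handle("research: audio token pricing")
    results = await front.collect()
    return ack, results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In a real system the acknowledgement would be spoken by the voice model and the workers would be the existing agents; the sketch only shows the control flow.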
andrewstuart
Fortunately there’s real competition in the voice AI field. Presumably because it’s genuinely useful - I can easily think of applications to make with a powerful voice UI.