VibeVoice: Open-source frontier voice AI
tosh
345 points
167 comments
April 28, 2026
Related Discussions
- Voice-AI-for-Beginners – A curated learning path for developers · mahimai · 61 pts · May 02, 2026 · 60% similar
- Speaking of Voxtral · Palmik · 18 pts · March 26, 2026 · 57% similar
- Advancing voice intelligence with new models in the API · meetpateltech · 33 pts · May 07, 2026 · 57% similar
- How OpenAI delivers low-latency voice AI at scale · Sean-Der · 359 pts · May 04, 2026 · 57% similar
- Mistral Medium 3.5 · meetpateltech · 450 pts · April 29, 2026 · 54% similar
Discussion Highlights (20 comments)
CubsFan1060
Great post last night from Simon: https://simonwillison.net/2026/Apr/27/vibevoice/
podgietaru
So we've really just settled on Vibe as the verb for AI then?
embedding-shape
Isn't this the project Microsoft published but then pulled soon after for security/safety reasons? What has changed since then?
walthamstow
Seems quite heavy for an STT model; Parakeet and Whisper are much smaller and perform great for quick dictation and transcription of longer files. I guess that's due to additional accuracy and speaker diarisation? The TTS example clip in the repo of 'spontaneous singing' is creepy as fuck.
steinvakt2
This is not a new model. Also, it hallucinates a lot. Also, it's very heavy and slow at inference. It's also bad at multilingual input. Edit: I'm talking purely about speech to text (STT). Not sure about the other things this can do.
Anonyneko
You have selected Microsoft Sam as the computer's default voice.
Void_
In the past month or so, I added 2 models to my app Whisper Memos ( https://whispermemos.com ):
- Cohere Transcribe (self hosted)
- Grok Speech To Text (they provide an API, only $0.10/hr!)
They are both excellent. I'm not sure about this one. Would you like to see it in a consumer speech to text app?
maxloh
I think we should stop calling this type of model open source. They are really "open weight." The training code is proprietary and was never released. https://github.com/microsoft/VibeVoice/issues/102
pluc
Interesting story about this repo/product/author by cybersecurity researcher Kevin Beaumont: https://cyberplace.social/@GossiTheDog/116454846703138243
mistic92
For me it's giving very poor results.
JumpCrisscross
What’s the current state of the art for learning my voice, both for training locally and for training in the cloud?
aqme28
Interesting to see "vibe" enshrined by the likes of Microsoft as an AI product word.
BlastBash192
Maybe Microsoft’s real strength was never making the best model, it was knowing you don’t need to, as long as you own the platform everyone builds on.
ryukoposting
Holy moly, a Microsoft AI product that isn't named Copilot!
frangonf
I looked into local options for ASR and diarization some months ago, and I missed that VibeVoice now has this feature. My conclusion back then (which came only from shallow research on the topic and zero real experience, mind you) was that Whisper + Pyannote was the "stable" approach. Have the VibeVoice, Voxtral, Qwen, or NeMo solutions caught up in segmentation and speaker recognition?
khimaros
looks like this offers ASR support in GGUF https://github.com/CrispStrobe/CrispASR -- haven't tested
chaosprint
Microsoft Store App Vibing.exe Accused of Harvesting Screens, Audio, and Clipboard Data: https://cyberpress.org/microsoft-store-app-vibing-exe-accuse...
ChrisArchitect
Previously: Sept 2025 https://news.ycombinator.com/item?id=45114245
starkeeper
Microsoft is famous for choosing terrible names, but how could they be this terrible?
Mobius01
Microsoft has historically made poor choices in product naming, but this has to be a new low.