VibeVoice: Open-source frontier voice AI

tosh 345 points 167 comments April 28, 2026
github.com

Discussion Highlights (20 comments)

CubsFan1060

Great post last night from Simon: https://simonwillison.net/2026/Apr/27/vibevoice/

podgietaru

So we've really just settled on Vibe as the verb for AI then?

embedding-shape

Isn't this project the one Microsoft published but then soon after pulled it for security/safety reasons? What has changed since then?

walthamstow

Seems quite heavy for a STT model, Parakeet and Whisper are much smaller and perform great for quick dictation and transcription of longer files. I guess that's due to additional accuracy and speaker diarisation? The TTS example clip in the repo of 'spontaneous singing' is creepy as fuck

steinvakt2

This is not a new model. Also, it hallucinates a lot. Also, it's very heavy and slow in inference. It's also bad in multilingual. Edit: I'm talking purely about speech to text (STT). Not sure about the other things this can do.

Anonyneko

You have selected Microsoft Sam as the computer's default voice.

Void_

In the past month or so, I added 2 models to my app Whisper Memos ( https://whispermemos.com ):
- Cohere Transcribe (self hosted)
- Grok Speech To Text (they provide an API, only $0.10/hr!)
They are both excellent. I'm not sure about this one. Would you like to see it in a consumer speech to text app?

maxloh

I think we should stop calling this type of model open source. They are really "open weight." The training code is proprietary and has never been released. https://github.com/microsoft/VibeVoice/issues/102

pluc

Interesting story about this repo/product/author by cybersecurity researcher Kevin Beaumont: https://cyberplace.social/@GossiTheDog/116454846703138243

mistic92

For me it's giving very poor results.

JumpCrisscross

What’s the current state of the art, for each of training locally and in the cloud, for learning my voice?

aqme28

Interesting to see "vibe" enshrined by the likes of Microsoft as an AI product word.

BlastBash192

Maybe Microsoft’s real strength was never making the best model, it was knowing you don’t need to, as long as you own the platform everyone builds on.

ryukoposting

Holy moly, a Microsoft AI product that isn't named Copilot!

frangonf

I took a look into local options for ASR and diarization some months ago, and I missed that VibeVoice now has this feature. My conclusion back then (which came only from shallow research on the topic and zero real experience, mind you) was that Whisper + Pyannote was the "stable" approach. Have the VibeVoice, Voxtral, Qwen or Nemo solutions caught up in segmentation and speaker recognition?

khimaros

looks like this offers ASR support in GGUF https://github.com/CrispStrobe/CrispASR -- haven't tested

chaosprint

Microsoft Store App Vibing.exe Accused of Harvesting Screens, Audio, and Clipboard Data: https://cyberpress.org/microsoft-store-app-vibing-exe-accuse...

ChrisArchitect

Previously: Sept 2025 https://news.ycombinator.com/item?id=45114245

starkeeper

Microsoft is famous for choosing terrible names, but how could they be this terrible?

Mobius01

Microsoft has historically made poor choices in product naming, but this has to be a new low.
