My Journey to a reliable and enjoyable locally hosted voice assistant (2025)
Vaslo
353 points
103 comments
March 16, 2026
Related Discussions
- Show HN: I built a sub-500ms latency voice agent from scratch nicktikhonov · 289 pts · March 02, 2026 · 54% similar
- I built an AI receptionist for a mechanic shop mooreds · 245 pts · March 23, 2026 · 47% similar
- Things I Think I Think... Preferring Local OSS LLMs zdw · 43 pts · April 02, 2026 · 46% similar
- Things I've Done with AI shepherdjerred · 80 pts · March 09, 2026 · 45% similar
- Ask HN: What's your experience self-hosting in 2026? rustcore · 20 pts · March 03, 2026 · 43% similar
Discussion Highlights (14 comments)
dewey
Their first version is most likely already 10x better than Siri.
> Understands when it is in a particular area and does not ask “which light?” when there is only one light in the area, but does correctly ask when there are multiple of the device type in the given area.
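The area-aware behavior quoted above can be illustrated with a minimal sketch. It assumes each device is a plain dict with hypothetical `name`, `type`, and `area` keys; this is not Home Assistant's actual API or the article author's implementation, just one way the "ask only when ambiguous" rule might look.

```python
def resolve_target(entities, device_type, area):
    """Decide whether to act, ask for clarification, or report no match.

    `entities` is a list of dicts with hypothetical keys
    'name', 'type', and 'area' (an assumption for illustration).
    """
    matches = [
        e for e in entities
        if e["type"] == device_type and e["area"] == area
    ]
    if not matches:
        return ("not_found", None)
    if len(matches) == 1:
        # Only one candidate in this area: no need to ask "which light?"
        return ("act", matches[0]["name"])
    # Several candidates in this area: ask the user to pick one.
    return ("ask", [e["name"] for e in matches])


# Made-up example devices.
devices = [
    {"name": "desk lamp", "type": "light", "area": "office"},
    {"name": "ceiling light", "type": "light", "area": "kitchen"},
    {"name": "counter light", "type": "light", "area": "kitchen"},
]

office_result = resolve_target(devices, "light", "office")
kitchen_result = resolve_target(devices, "light", "kitchen")
```

Here the office request resolves directly to the single lamp, while the kitchen request returns both candidate names so the assistant can ask which one was meant.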
yanis_t
I'm still waiting for the promise of voice AI that was shown during the OpenAI demo in 2024 to become real somehow. It's not clear to me why there has been zero progress since then.
voidUpdate
Do people like talking to voice assistants? I've used one occasionally (mostly for timers when I'm cooking), but most of the time it would be faster for me to just do it myself, and that feels much less awkward than talking to empty air and asking it to do things for me. It might be because I just really don't like making more noise than I have to. (Yes, I appreciate that some people may be disabled in such a way that it makes sense to use voice assistants, e.g. motor problems.)
gausswho
This is five months old now. Any substantial changes to the recommended setup?
hamdingers
If you're less concerned about privacy, I use Gemini 2.5 Flash for this and it's exceptionally good and fast as a HA assistant while being much cheaper than the electricity that would be needed to keep a 3090 awake. The thing that kills this for me (and they even mentioned it) is wake word detection. I have both the HA voice preview and FPH Satellite1 devices, plus have experimented with a few other options like a Raspberry Pi with a conference mic. Somehow nothing is even 50% as good as my Echo devices at picking up the wake word. The assistant itself is far better, but that doesn't matter if it takes 2-3 tries to get it to listen to you. If someone solves this problem with open hardware I'll be immediately buying several.
daveoc64
I've recently purchased a couple of the Home Assistant Voice Preview Edition devices, and they leave a lot to be desired. The wake word detection isn't great, and the audio quality is abysmal (for voice responses, not music). Amazon has ruined their Alexa and Echo devices with ads and annoying nag messages. I'd really like an open alternative, but the basics are lacking right now.
tkems
One that I have been experimenting with is using analog phones (including rotary ones!) to act as the satellites. I live in an older home and have phone jacks in most of the rooms already so I only had to use a single analog telephone adapter. [0] The downside is I don't have wake word support, but it makes it more private and I don't find myself missing my smart speakers that much. At some point I would like to also support other types of calls on the phones, but for now I need to get an LLM hooked up to it. [0] https://www.home-assistant.io/voice_control/worlds-most-priv...
ljclifford
actually the hardest part of a locally hosted voice assistant isn't the llm. it's making the tts tolerable to actually talk to every day. the core issue is prosody: kokoro and piper are trained on read speech, but conversational responses have shorter breath groups and different stress patterns on function words. that's why numbers, addresses, and hedged phrases sound off even when everything else works. the fix is training data composition. conversational and read speech have different prosody distributions and models don't generalize across them. for self-hosted, coqui xtts-v2 [1] is worth trying if you want more natural english output than kokoro. btw i'm lily, cofounder of rime [2]. we're solving this for business voice agents at scale, not really the personal home assistant use case, but the underlying problem is the same. [1] https://github.com/coqui-ai/TTS [2] https://rime.ai
xrd
I've been having a lot of fun using my old Mycroft AI device. Neon is the new software package. It didn't solve the issues highlighted in this thread, but it is a fun open device to hack on. I wrote a little web app that will speak in the standard voice and say things like "hey kids, I'm AI and know everything, and your dad is really cool." They love to yell at me when I do that.
kbuck
I bought a Home Assistant Voice Preview Edition to try out. It's surprisingly good, but still falls short when compared to Google Home speakers:
- Wake word detection isn't as good as the Google Homes (more false positives, more false negatives - so I can't just tune sensitivity).
- Mic and speakers are both of poor quality in comparison to Google Home devices.
- Flow is awkward. On a Google Home device, you can say "Okay Google, turn on the lights" with no pause. On the Voice PE, you have to say "Hey Mycroft [awkward pause while you wait for the acknowledgement noise] turn on the lights" - it seems like the Google Home devices start buffering immediately after the wake word, but the Voice PE doesn't.
- Voice fingerprints don't exist, so this prevents the device from figuring out that two separate people are talking, or who is talking to it.
- The device has poor identification of background noise, so if you talk to it while there is a TV playing speech in the background, it will continue to listen to the speech from the TV. It will eventually transcribe everything you said + everything from the TV and get confused. (This probably folds into the voice print thing as well.)
On the upside, though:
- Setting it up was really easy.
- All of the entities I want to control with it are already available, without needing to export them or set them up separately in Google Home.
- Despite all of the above complaints, the device is probably 80-90% of what I realistically need to use it day-to-day.
If they throw a better speaker and mic array in, I'd likely be comfortable replacing all of my Google Homes.
quirk
The best fix I've made to any voice-mode AI is giving it a "done" word. So it has to listen for "pineapple" before it's allowed to process what I said. Just like radio comms (over and out).
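A minimal sketch of the "done" word idea, assuming transcribed text arrives in chunks from some speech-to-text step. The `DoneWordGate` class and its `feed` method are hypothetical names for illustration, not part of any real assistant's API, and the commenter's actual setup may work differently.

```python
class DoneWordGate:
    """Buffer transcript chunks and release them only after a done word."""

    def __init__(self, done_word="pineapple"):
        self.done_word = done_word.lower()
        self._parts = []

    def feed(self, chunk):
        """Add a transcribed chunk.

        Returns the full buffered utterance once the done word is heard,
        otherwise None. Anything said after the done word in the same
        chunk is discarded in this simple sketch.
        """
        words = chunk.split()
        for i, word in enumerate(words):
            # Ignore surrounding punctuation and case when matching.
            if word.strip(".,!?\"'").lower() == self.done_word:
                self._parts.extend(words[:i])
                utterance = " ".join(self._parts)
                self._parts = []
                return utterance
        self._parts.extend(words)
        return None


gate = DoneWordGate()
first = gate.feed("turn on the")
second = gate.feed("kitchen lights Pineapple.")
```

The first call returns `None` because the done word has not been spoken yet; the second returns the whole buffered request, which could then be handed to the assistant for processing.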
leeeeeep1012
Nice. I run one, dictatorflow.com, which I open sourced as lee101/voicetype.
jimmcslim
I’m keen to see if Nabu Casa release an update to the Voice Assist hardware sometime soon. Something with the same fidelity and finish of the Amazon and Google options but open would be fantastic.
Animats
Is there a locally hosted voice assistant for Android phones? One available through F-Droid, if possible.