My accent costs me 30 IQ points on Zoom. So we built an ML model to fix it
artavazdsm
33 points
23 comments
March 03, 2026
Related Discussions
Found 5 related stories in 48.9ms across 3,471 title embeddings via pgvector HNSW
- Top AI models underperform in languages other than English Brajeshwar · 19 pts · March 19, 2026 · 45% similar
- Show HN: I built a sub-500ms latency voice agent from scratch nicktikhonov · 289 pts · March 02, 2026 · 44% similar
- Speaking of Voxtral Palmik · 18 pts · March 26, 2026 · 43% similar
- Gemini 3.1 Flash Live: Making audio AI more natural and reliable meetpateltech · 12 pts · March 26, 2026 · 42% similar
- Chrome extension adjusts video speed based on how fast the speaker is talking MrBuddyCasino · 29 pts · March 13, 2026 · 40% similar
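The "% similar" scores above come from embedding similarity (the page mentions pgvector with an HNSW index). A minimal sketch of the underlying ranking idea, cosine similarity over title embeddings, with made-up 4-dimensional vectors (real embeddings have hundreds of dimensions, and names here are illustrative):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical title embeddings for illustration only.
query = [0.1, 0.8, 0.3, 0.2]
titles = {
    "story A": [0.1, 0.7, 0.4, 0.2],
    "story B": [0.9, 0.1, 0.0, 0.3],
}

# Rank stories by similarity to the query title, most similar first.
ranked = sorted(titles, key=lambda t: cosine_similarity(query, titles[t]),
                reverse=True)
```

An HNSW index approximates this ranking without scanning every vector, which is how a few thousand titles can be searched in tens of milliseconds.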
Discussion Highlights (20 comments)
artavazdsm
Co-founder of Krisp here. There are 1.5B non-native English speakers in the workforce, 4x the number of native speakers, yet all comms infra is optimized for native accents. We spent 3 years building listener-side, on-device accent understanding. The hard parts: no parallel training data exists, the accent space is infinite, accent is entangled with voice identity, and it has to run on CPU under 250ms latency. Built in Yerevan, Armenia. Beta is live and free. Happy to go deep on the ML side.
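The 250ms budget mentioned above can be framed as a per-frame deadline: streaming audio arrives in fixed-size chunks, and processing each chunk must finish faster than the chunk's own wall-clock duration or latency grows without bound. A toy sketch of that constraint (the sample rate and frame size are my assumptions, not Krisp's):

```python
SAMPLE_RATE = 16_000   # 16 kHz mono, a common rate for speech models
FRAME_SAMPLES = 320    # 20 ms of audio per frame at 16 kHz

def frame_duration_ms(samples, rate):
    # Wall-clock duration of one audio frame, in milliseconds.
    return samples / rate * 1000

def keeps_up_realtime(processing_ms, samples=FRAME_SAMPLES, rate=SAMPLE_RATE):
    # Real-time constraint: a frame must be processed in less time
    # than it covers, otherwise the pipeline falls steadily behind.
    return processing_ms < frame_duration_ms(samples, rate)
```

Under these assumptions, staying under a 250ms end-to-end budget also requires the model's lookahead (how many future frames it buffers before emitting output) to stay small, which is a separate constraint from per-frame throughput.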
astipili
will it help the barista in Starbucks get my name right finally?
snek26
Curious whether wav2vec-style embeddings played a role in your representation learning.
bebelovejan
I would like to use such a model, but only if it really preserves my voice; otherwise people would realize it's not me, or I'd have to use it all the time.
imuradyan
On-device CPU inference is the real flex here! Optimization probably mattered as much as modeling.
sssnowgirl
This is a game-changer! I remember each and every call I had with an investor and feeling shy asking "can you repeat?"... thanks Krisp, you changed my life!!!
gyumjibashyan
How did you estimate the number of IQ points?
arshakarap
This is built for international, privacy-first teams!
armsuro
This feels adjacent to voice conversion research, but with stricter latency constraints.
amartiro
The parallel data is a problem here — you can’t crowdsource ground truth because no one can record themselves with a different accent.
aharutyunyan
Accent space is effectively infinite. Generalization must rely on invariants rather than enumeration.
nareksardaryann
Great work. Natural + clear is the combo that matters.
rasjonell
Latency can destroy conversational rhythm. What's your p95 inference time? Also, are there any benchmarks we can see?
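For reference, the p95 figure asked about here is just the 95th-percentile latency over a batch of measurements. A small sketch using the standard nearest-rank method, with made-up per-frame timings:

```python
import math

def p95(latencies_ms):
    # Nearest-rank percentile: sort the samples, then take the value
    # at rank ceil(0.95 * n), using 1-based indexing.
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]

# Hypothetical per-frame latencies in milliseconds.
samples = [12, 14, 13, 15, 40, 13, 12, 16, 14, 13]
```

p95 is preferred over the mean for interactive audio because occasional slow frames (like the 40ms outlier above) are exactly what breaks conversational rhythm, and the mean hides them.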
tritont
Nice to finally see this direction of accent conversion (that is, on incoming calls) in the Krisp app. This is a very meaningful feature.
MarAraqelyan
Really cool to see accent adaptation in real time — curious about benchmarks and how well this handles messy, real Zoom calls.
achobanyan
Local CPU inference stands out. Careful optimization likely rivaled the modeling effort.
zkhalapyan
Yeah, this would be helpful for my Singlish friends out there!