TurboQuant: A first-principles walkthrough
kweezar
74 points
11 comments
April 27, 2026
Related Discussions
- Show HN: TurboQuant-WASM – Google's vector quantization in the browser · teamchong · 148 pts · April 04, 2026 · 62% similar
- TurboQuant: Building a Sub-Byte KV Cache Quantizer from Paper to Production · wizzense · 13 pts · March 27, 2026 · 57% similar
- TurboQuant: Redefining AI efficiency with extreme compression · ray__ · 509 pts · March 25, 2026 · 56% similar
- Show HN: Prompt-to-Excalidraw demo with Gemma 4 E2B in the browser (3.1GB) · teamchong · 103 pts · April 19, 2026 · 48% similar
- TorQ: Kdb+ Production Framework · tosh · 30 pts · May 22, 2026 · 47% similar
Discussion Highlights (4 comments)
linuxhansl
I am fascinated by this and similar research (RotorQuant, etc). It seems that by next year we will be able to run this year's largest models on last year's hardware. :) Maybe we won't need as many data centers and as much power as we thought. Maybe we can run more powerful models locally.
amitport
TurboQuant is a restricted version of EDEN quantization (NeurIPS 21, ICML 22). It lacks the optimal scale derivations, which makes the TurboQuant variant considerably less accurate than those works. We show this thoroughly in a new note at https://arxiv.org/abs/2604.18555. We were the first to introduce post-rotation distribution-aware quantization in 2021. This was later implemented in many fields, including federated learning, vector retrieval, databases, inference engines, and KV-cache. It would be appropriate to receive credit for this. Furthermore, it is baffling to see the name "TurboQuant" repeated in this context, considering the many works published from 2021 onwards. The blog post mentioned above essentially guides you through EDEN quantization but ultimately settles on a sub-optimal MSE-minimizing version and an unbiasing trick. This trick often costs a full bit more than DRIVE/EDEN requires to achieve the same results using the unbiasing scale shown in the original 2021 paper.
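For readers who have not seen the scale distinction this comment refers to, here is a rough sketch of 1-bit post-rotation quantization with two scale choices. It is not taken from the comment, the linked note, or the TurboQuant post. The two scale formulas are the ones I understand the 2021 DRIVE paper to use for its 1-bit case (an assumption worth checking against the paper), and the randomized Hadamard rotation is a cheap stand-in for a uniformly random rotation. All function names are hypothetical.

```python
import math
import random

def fwht(v):
    """Orthonormal fast Walsh-Hadamard transform; len(v) must be a power of two."""
    v = list(v)
    n = len(v)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    norm = 1.0 / math.sqrt(n)
    return [x * norm for x in v]

def rotate(x, signs):
    # R x = H D x, where D is a diagonal matrix of random +/-1 signs.
    return fwht([xi * si for xi, si in zip(x, signs)])

def unrotate(y, signs):
    # R^{-1} y = D H y, since H is symmetric orthonormal and D is its own inverse.
    return [hi * si for hi, si in zip(fwht(y), signs)]

def one_bit_quantize(x, signs, scale="unbiased"):
    """Keep only the sign of each rotated coordinate, plus one float scale."""
    r = rotate(x, signs)
    l1 = sum(abs(v) for v in r)
    bits = [1.0 if v >= 0 else -1.0 for v in r]
    if l1 == 0.0:
        return bits, 0.0  # zero vector: nothing to scale
    if scale == "mse":
        s = l1 / len(x)                    # minimizes ||x - x_hat||^2 for these bits
    else:
        s = sum(v * v for v in x) / l1     # assumed DRIVE-style unbiasing scale
    return bits, s

def dequantize(bits, s, signs):
    return unrotate([s * b for b in bits], signs)

def sq_err(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

if __name__ == "__main__":
    random.seed(0)
    d = 16
    x = [random.gauss(0.0, 1.0) for _ in range(d)]
    signs = [random.choice((-1.0, 1.0)) for _ in range(d)]
    for mode in ("mse", "unbiased"):
        bits, s = one_bit_quantize(x, signs, mode)
        x_hat = dequantize(bits, s, signs)
        print(mode, "scale:", s, "squared error:", sq_err(x, x_hat))
```

For a fixed rotation, the MSE scale never has larger squared error than the unbiasing scale, while the unbiasing scale makes the inner product of x with its reconstruction equal to the squared norm of x. Averaged over random rotations, that second property is what removes the bias in the DRIVE setting, as far as I understand it.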
jarbus
This is incredible. Interactive demos like this make mathematics 10x more accessible.
semiinfinitely
"AI vectors"