TurboQuant: A first-principles walkthrough

kweezar 74 points 11 comments April 27, 2026

Discussion Highlights (4 comments)

linuxhansl

I am fascinated by this and similar research (RotorQuant, etc). It seems that by next year we will be able to run this year's largest models on last year's hardware. :) Maybe we won't need as many data centers and as much power as we thought. Maybe we can run more powerful models locally.

amitport

TurboQuant is a restricted version of EDEN quantization (NeurIPS 21, ICML 22). It lacks the optimal scale derivations, which makes the TurboQuant variant considerably less accurate than those works. We show this thoroughly in a new note at https://arxiv.org/abs/2604.18555 . We were the first to introduce post-rotation distribution-aware quantization in 2021. This was later implemented in many fields, including federated learning, vector retrieval, databases, inference engines, and KV-cache. It would be appropriate to receive credit for this. Furthermore, it is baffling to see the name "TurboQuant" repeated in this context, considering the many works published from 2021 onwards. The blog post mentioned above essentially guides you through EDEN quantization but ultimately settles on a sub-optimal MSE-minimizing version and an unbiasing trick. This trick often costs a full bit more than DRIVE/EDEN requires to achieve the same results using the unbiasing scale shown in the original 2021 paper.
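
For readers who want to see what the two scale choices contrasted above look like in practice, here is a minimal sketch of 1-bit quantization after a random rotation. It is an illustration under assumptions, not code from the linked note or from any of the papers: the rotation is a randomized Hadamard transform, the two scale formulas are written as recalled from the 2021 DRIVE paper, and the names `hadamard`, `quantize_1bit`, and `dequantize` are hypothetical.

```python
import math
import random


def hadamard(v):
    """Orthonormal fast Walsh-Hadamard transform; len(v) must be a power of two."""
    out = list(v)
    n = len(out)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    norm = math.sqrt(n)
    return [val / norm for val in out]


def quantize_1bit(x, signs, unbiased):
    """Rotate x, keep one sign bit per coordinate, and return (bits, scale).

    Assumption: the scale formulas follow the DRIVE paper as recalled here.
    x must be non-zero, otherwise the unbiased scale divides by zero.
    """
    # Rotation = random sign flips followed by Hadamard (an orthogonal map).
    r = hadamard([s * xi for s, xi in zip(signs, x)])
    bits = [1.0 if ri >= 0 else -1.0 for ri in r]
    l1 = sum(abs(ri) for ri in r)
    if unbiased:
        scale = sum(xi * xi for xi in x) / l1  # ||x||^2 / ||Rx||_1
    else:
        scale = l1 / len(x)                    # least-squares (MSE-minimizing) scale
    return bits, scale


def dequantize(bits, scale, signs):
    """Undo the rotation: the orthonormal Hadamard matrix is its own inverse."""
    y = hadamard([scale * b for b in bits])
    return [s * yi for s, yi in zip(signs, y)]


rng = random.Random(0)
d = 64
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
signs = [rng.choice((-1.0, 1.0)) for _ in range(d)]
norm2 = sum(xi * xi for xi in x)

results = {}
for unbiased in (False, True):
    bits, scale = quantize_1bit(x, signs, unbiased)
    x_hat = dequantize(bits, scale, signs)
    sq_err = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    dot = sum(a * b for a, b in zip(x, x_hat))
    results[unbiased] = (sq_err, dot)
    print(f"unbiased={unbiased}: squared error={sq_err:.4f}, <x, x_hat>={dot:.4f}")
```

With the second scale, the inner product of `x` and its reconstruction equals the squared norm of `x` by construction, at the cost of a larger squared error for any single vector than the least-squares scale gives. The sketch only shows the 1-bit case and says nothing about how much accuracy either method gains or loses at higher bit widths.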

jarbus

This is incredible. Interactive demos like this make mathematics 10x more accessible

semiinfinitely

"AI vectors"
