KV Sharing, MHC, and Compressed Attention

gmays · 29 points · 2 comments · May 19, 2026
magazine.sebastianraschka.com

Discussion Highlights (2 comments)

nibab

Cool stuff. My comp sci major feels almost completely redundant in this new vibe-coding era, and I feel like the only way to stay relevant as a programmer is to learn all these compute primitives and become an LLM systems guy.

redwood

Has anyone seen a similar deep dive that looks more closely at the infrastructure building blocks that power each of these components? I mean something more physically grounded, like how much compute goes to each portion when serving a frontier model.
