FairyFuse: Multiplication-Free LLM Inference on CPUs via Fused Ternary Kernels
PaulHoule
17 points
1 comment
May 12, 2026
Related Discussions
- Executing programs inside transformers with exponentially faster inference · u1hcw9nx · 17 pts · March 12, 2026 · 53% similar
- Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training · xlayn · 90 pts · March 18, 2026 · 51% similar
- Advanced Quantization Algorithm for LLMs · lastdong · 121 pts · May 01, 2026 · 50% similar
- GPU Memory Math for LLMs: Formula That Tells You What Fits on Your GPU · XMasterrrr · 12 pts · May 20, 2026 · 49% similar
- Multi-Stream LLMs: new paper on parallelizing/separating prompts, thinking, I/O · atomicthumbs · 86 pts · May 21, 2026 · 49% similar
Discussion Highlights (1 comment)
Reubend
Paper looks great. No GitHub link that I can find though. Maybe I'll take a crack at an implementation if I've got some extra free time.