Executing programs inside transformers with exponentially faster inference
u1hcw9nx
17 points
3 comments
March 12, 2026
Related Discussions
- Making LLM Training Faster with Unsloth and NVIDIA · segmenta · 114 pts · May 07, 2026 · 57% similar
- TurboQuant: Redefining AI efficiency with extreme compression · ray__ · 509 pts · March 25, 2026 · 56% similar
- Transformers Are Inherently Succinct (2025) · bearseascape · 45 pts · May 04, 2026 · 55% similar
- Custom programming languages make agents good · matsur · 17 pts · March 12, 2026 · 54% similar
- Google's TurboQuant AI-compression algorithm can reduce LLM memory usage by 6x · gmays · 16 pts · March 27, 2026 · 54% similar
Discussion Highlights (3 comments)
andy12_
This seems like a really interesting path for interpretability, especially if a big chunk of a model's behavior occurs pseudo-symbolically. I had thought about this idea of integrating tools into the main computation path of a model, but I never imagined that it could be done efficiently with just a vanilla transformer. Truly, attention is all you need (I guess).
galsapir
One of the most interesting pieces I've read recently. I'm not sure I agree with all the statements there (e.g. that without execution the system has no comprehension), but it's extremely cool.
pennomi
It makes sense that a next-token predictor could execute assembly code. This is fascinating work, especially the memory implementation.