GateGPT: 56k tokens per second Transformer (KV cache) on FPGA at 80 MHz

laxmena 37 points 13 comments June 16, 2026

Discussion Highlights (3 comments)

amelius

See also: https://rits.shanghai.nyu.edu/ai/karpathys-microgpt-on-fpga-... TL;DR: The CPU implementation was 71x faster than the FPGA. Note: model has only 4192 parameters.

genxy

The context window is 16 characters. Talking about tokens per second is meaningless.
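
For scale, here is a rough back-of-the-envelope sketch using only the figures quoted in this thread (80 MHz clock, 56k tokens per second, 16-character context). It assumes one character per token, which the thread suggests but does not confirm. The function names are made up for illustration.

```python
# Figures taken from the post title and the comment above.
CLOCK_HZ = 80_000_000        # 80 MHz FPGA clock
TOKENS_PER_SECOND = 56_000   # headline throughput
CONTEXT_TOKENS = 16          # assumption: one character == one token


def cycles_per_token(clock_hz: float, tokens_per_second: float) -> float:
    """Clock cycles available per generated token."""
    return clock_hz / tokens_per_second


def seconds_to_fill_context(context_tokens: int, tokens_per_second: float) -> float:
    """Time to generate enough tokens to fill the whole context window."""
    return context_tokens / tokens_per_second


if __name__ == "__main__":
    print(f"cycles per token: {cycles_per_token(CLOCK_HZ, TOKENS_PER_SECOND):.0f}")
    fill_ms = seconds_to_fill_context(CONTEXT_TOKENS, TOKENS_PER_SECOND) * 1e3
    print(f"time to fill context: {fill_ms:.3f} ms")
```

Under these assumptions the design spends roughly 1,400 cycles per token and fills its entire window in well under a millisecond, which is why a throughput number alone says little here.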

cadamsdotcom

Transformers scale poorly with context window size and parameter count, which makes results look really impressive when those N’s are small! I’m but a pundit in this area, so I don’t know much. But one wonders if there’s a future in burning larger models into FPGAs - whether big enough FPGAs exist (or can be built), and whether locating specialized compute right next to the memory it needs can speed things up. It would likely need a lot of algorithm parallelism work that’d translate back to CPUs/GPUs.
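
To make the scaling point concrete, here is a minimal sketch of a common rule-of-thumb cost model for decoding one token with a KV cache: about 2 FLOPs per parameter for the weight multiplies, plus an attention term that grows linearly with the number of cached tokens. The formula is an approximation, not GateGPT's actual cost. The 4192-parameter figure comes from the thread; the layer count and model width below are hypothetical placeholders.

```python
def decode_flops_per_token(n_params: int, n_layers: int, d_model: int,
                           context_len: int) -> int:
    """Approximate FLOPs to generate one token with a KV cache.

    Rule-of-thumb model (an assumption, not a measured figure):
      - weights:   ~2 FLOPs (multiply + add) per parameter
      - attention: per layer, scores against cached keys (2 * d_model * t)
                   plus the weighted sum over cached values (2 * d_model * t)
    """
    weight_flops = 2 * n_params
    attention_flops = 4 * n_layers * d_model * context_len
    return weight_flops + attention_flops


if __name__ == "__main__":
    N_PARAMS = 4192   # parameter count mentioned in the thread
    N_LAYERS = 1      # hypothetical
    D_MODEL = 16      # hypothetical
    for context_len in (16, 1024, 65536):
        flops = decode_flops_per_token(N_PARAMS, N_LAYERS, D_MODEL, context_len)
        print(f"context={context_len:>6}: ~{flops:,} FLOPs per token")
```

With a tiny model and a 16-token window both terms are negligible; the attention term only starts to dominate once the context grows by orders of magnitude.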
