Show HN: Bonsai 1.7B ternary model at 442T/s on M4 Max

hhuytho 13 points 3 comments May 04, 2026

We took the recently released Bonsai 1.7B ternary model from PrismML (https://github.com/PrismML-Eng/Bonsai-demo) and ran our agentic evolution search on it for 6 hours to optimize the Metal kernels. The search was fully autonomous.

Measured against unmodified upstream llama.cpp at the same Bonsai/Q2_0 commit, on the same M4 Max:

- tg128: 309.82 → 442.42 t/s (+42.8%)
- pp512: 4250.32 → 4622.63 t/s (+8.8%)
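For anyone who wants to sanity-check the relative gains, here is a minimal sketch that recomputes them from the throughput figures quoted in the post. The helper name `speedup_percent` and the `results` dictionary are illustrative only and not part of the authors' tooling.

```python
def speedup_percent(baseline_tps: float, optimized_tps: float) -> float:
    """Relative throughput gain of optimized over baseline, in percent."""
    return (optimized_tps / baseline_tps - 1.0) * 100.0


# Tokens per second as quoted in the post: (upstream llama.cpp, evolved kernels).
results = {
    "tg128": (309.82, 442.42),    # text generation, 128 tokens
    "pp512": (4250.32, 4622.63),  # prompt processing, 512 tokens
}

for name, (baseline, optimized) in results.items():
    gain = speedup_percent(baseline, optimized)
    print(f"{name}: {baseline:.2f} -> {optimized:.2f} t/s ({gain:+.1f}%)")
```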

Discussion Highlights (2 comments)

dsecurity49

That performance jump is incredible. Curious to know if the evolution search found any specific optimizations that were counter-intuitive to how we normally write Metal kernels?

rpdaiml

Nice work, that throughput is wild.
