Fable 5 pushed Gemma 4 to 255 tok/s on WebGPU
kirubakaran
45 points
21 comments
June 18, 2026
Related Discussions
- Gemma 4 E2B running in-browser at 255 tok/s · victormustar · 14 pts · June 17, 2026 · 74% similar
- Fable 5 Ported the Ladybird Browser to WebAssembly in One Shot and It Cost $552 · ent101 · 22 pts · June 11, 2026 · 58% similar
- I ran Gemma 4 as a local model in Codex CLI · dvaughan · 25 pts · April 12, 2026 · 52% similar
- CPUs Aren't Dead. Gemma2B Out Scored GPT-3.5 Turbo on Test That Made It Famous · fredmendoza · 95 pts · April 15, 2026 · 52% similar
- Accelerating Gemma 4: faster inference with multi-token prediction drafters · amrrs · 521 pts · May 05, 2026 · 51% similar
Discussion Highlights (6 comments)
freedomben
More of a meta comment, but I really wish Anthropic would say something about their plans for Fable. We're all just kind of left here floating and aimless, with no idea of what to expect.
nmfisher
It's not immediately clear, but this seems to be 250 tok/s on an M4 Max. For comparison, the current agent swarm challenge on HF is at 508 tok/s on an A10G GPU: https://huggingface.co/spaces/gemma-challenge/gemma-dashboar...
mike_hearn
That's very impressive. What's the best way to run these kernels natively on a Mac? I saw that there's a way to plug Claude into Apple's Foundation Models framework, and there's a CLI tool that can access models via that framework. It might be useful to have something so fast and good available via a small CLI tool for various purposes, especially when connected with a small suite of tools I have for things like file editing, showing, simple agentic purposes etc.
LoganDark
I miss Fable. It worked so well -- it was so confident and would actually make decisions on its own that I agreed with. Opus 4.8 feels so dumb now.
scotty79
> It climbed to 84 tok/s, then hit a wall, insisting further optimization was impossible.
> Hours later, Anthropic rolled back invisible LLM development safeguards, and it hit 255 tok/s.
Wow. Limiting access to models for reasons other than being physically unable to provide it should be a crime against humanity or the planet or something. So much immediate efficiency left on the table for stupid reasons.
exabrial
Apologies for a dumb question: is this someone running Fable 5 on their own machine and it pushed to 255 tok/s? How is that possible (how did a person acquire the model)?