Speculative Speculative Decoding (SSD)
E-Reverance
28 points
2 comments
March 04, 2026
Discussion Highlights (2 comments)
saagarjha
Yo dawg I heard you liked speculation so we speculated your speculating
Ari_Rahikkala
Neat. Very similar to tree-based speculation, as they point out, and they also point out how to combine them.

Speculative decoding: Sample a linear output (next n tokens) from the draft model and submit it to a verifier model. At some index the verifier might reject a token and say that no, actually the next token should be this other token instead ("bonus token" in this paper), and your output is the accepted prefix plus that token. Or if it accepts the whole draft, you still get a bonus token as the next token past the draft. Then you draft again from that prefix on.

Tree-based speculation: Sample a tree of outputs from the draft model, submit the whole tree to the verifier, and pick the longest accepted prefix (and its bonus token).

Speculative speculative decoding: Sample a linear output from the draft model, then in parallel both verify it with the verifier model and produce a tree of drafts branching out from different rejection points and different choices of bonus tokens at those points. When the verifier finishes, you might have a new draft ready to submit right away.

Combined: Sample a tree from the draft model, submit the whole tree to the verifier, and in parallel also plan out drafts for different rejection points with different bonus tokens anywhere in the tree.
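
The linear and "speculative speculative" schemes described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the paper's implementation: the "models" are hypothetical deterministic functions from a token prefix to one next token, verification uses greedy exact matching (real systems usually accept or reject via probabilistic sampling), and `predraft` runs sequentially where a real system would run it concurrently with the verifier. All names (`draft`, `verify`, `predraft`, `ssd_step`, `guess_bonus`) are made up for this example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical model interface: prefix of token ids -> greedy next token id.
Model = Callable[[Sequence[int]], int]
# Hypothetical guesser: prefix -> candidate bonus tokens the verifier might emit.
Guesser = Callable[[Sequence[int]], List[int]]


def draft(model: Model, prefix: Sequence[int], n: int) -> List[int]:
    """Sample a linear draft of n tokens from the draft model."""
    out: List[int] = []
    for _ in range(n):
        out.append(model(list(prefix) + out))
    return out


def verify(target: Model, prefix: Sequence[int],
           drafted: Sequence[int]) -> Tuple[int, int]:
    """Return (number of accepted draft tokens, bonus token).

    A real verifier scores every position in one forward pass; this loop
    only reproduces the outcome.
    """
    for i, tok in enumerate(drafted):
        want = target(list(prefix) + list(drafted[:i]))
        if want != tok:
            return i, want  # rejected at i: the verifier's token is the bonus
    return len(drafted), target(list(prefix) + list(drafted))


def predraft(model: Model, prefix: Sequence[int], drafted: Sequence[int],
             guess_bonus: Guesser, n: int) -> Dict[Tuple[int, int], List[int]]:
    """Draft ahead for guessed outcomes, keyed by (accepted count, bonus)."""
    cache: Dict[Tuple[int, int], List[int]] = {}
    for k in range(len(drafted) + 1):
        accepted = list(prefix) + list(drafted[:k])
        for b in guess_bonus(accepted):
            if k < len(drafted) and b == drafted[k]:
                continue  # a rejection at k cannot yield the drafted token
            cache[(k, b)] = draft(model, accepted + [b], n)
    return cache


def ssd_step(draft_model: Model, target: Model, prefix: Sequence[int],
             drafted: Sequence[int], guess_bonus: Guesser,
             n: int) -> Tuple[List[int], List[int], bool]:
    """One round: returns (new prefix, next draft, whether the cache hit)."""
    cache = predraft(draft_model, prefix, drafted, guess_bonus, n)
    k, bonus = verify(target, prefix, drafted)
    new_prefix = list(prefix) + list(drafted[:k]) + [bonus]
    next_draft = cache.get((k, bonus))
    hit = next_draft is not None
    if next_draft is None:
        next_draft = draft(draft_model, new_prefix, n)  # fall back: draft now
    return new_prefix, next_draft, hit


# Toy models: the target counts upward mod 10; the draft model agrees
# except that it skips from 3 to 5.
def target_model(p: Sequence[int]) -> int:
    return (p[-1] + 1) % 10


def draft_model(p: Sequence[int]) -> int:
    return 5 if p[-1] == 3 else (p[-1] + 1) % 10


def guess_bonus(p: Sequence[int]) -> List[int]:
    return [(p[-1] + 1) % 10, (p[-1] + 2) % 10]


first_draft = draft(draft_model, [1], 4)
result = ssd_step(draft_model, target_model, [1], first_draft, guess_bonus, 4)
```

In this toy run the verifier rejects the third drafted token, and because that outcome was among the guessed ones, the next draft is already in the cache when verification finishes. When the guess misses, `ssd_step` simply falls back to ordinary drafting, so the scheme degrades to plain speculative decoding.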