SubQ: a sub-quadratic LLM with 12M-token context
mitchwainer
46 points
19 comments
May 05, 2026
Related Discussions
- SubQ: Sub-quadratic LLM built for 12M-token context · gagan2020 · 17 pts · May 05, 2026 · 95% similar
- SubQ – a major breakthrough in LLM intelligence · vanni · 20 pts · May 05, 2026 · 67% similar
- The context window has been shattered: Subquadratic debuts a 12M token window · gmays · 42 pts · May 09, 2026 · 66% similar
- DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence · cmrdporcupine · 146 pts · April 24, 2026 · 55% similar
- DeepSeek-V4: a million-token context that agents can use · ibobev · 12 pts · April 28, 2026 · 54% similar
Discussion Highlights (10 comments)
remaximize
This is pretty remarkable. We've spent a lot of time finding workarounds for LLMs reading long docs. Now that's gone.
wilddolphin
optimizing AI in general. How cool is that?
williamimoh
Looks like long context isn’t a problem anymore
pstorm
I’m very surprised this isn’t getting more attention. Am I missing something? It seems at or above SOTA on the given benchmarks, doesn’t have context rot, is orders of magnitude faster, and uses less compute than current transformer models. I suppose it’s just an announcement and we can’t test it ourselves yet.
tuandin
if it's true then it's a breakthrough.
creamyhorror
Whether this is real or not, multiple commenters here look like astroturfers - created in the past year (or hours) with very low karma
2001zhaozhao
Assuming this is real and much better than existing linear attention methods as advertised, not launching with a technical report is a big miss.
Edit: their blog post ( https://subq.ai/how-ssa-makes-long-context-practical ) does go pretty in-depth about it.
Edit 2: the fact that they're going straight for an end-to-end coding product on day 1 is very ambitious. Other speed/efficiency-oriented AI companies (Cerebras and Inception come to mind) still don't have a first-party coding product after years. IMO this is absolutely the right way to go if they really do have the big breakthrough they're claiming.
mohsen1
- magic.dev claimed 200M context window and it's been two years since and no real product yet.
- They are admitting that this is built on top of a Chinese model[1]
- They committed a huge chart crime with the Y axis of a chart comparing to Opus on their website that I can't find anymore (Too embarrassing to keep?). The delta between their score (81%) vs. Opus (87%) on SWE bench was hugely minimized
- They named the company subquadratic but in parts they said O(1) linear scaling. At O(1) you could do much more than 12M tokens context window. At O(log n) even.
I hope this is real but I doubt...
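For a sense of scale on the complexity point, here is a back-of-the-envelope count of query-key score computations at 12M tokens under a few growth rates. This is only a sketch: the function name and the per-query budget `k=4096` are made up for illustration and are not figures SubQ has published.

```python
import math

def attention_score_counts(n, k):
    """Rough number of query-key scores computed for a sequence of n tokens.

    k is a hypothetical per-query selection budget (an assumption,
    not a number from SubQ).
    """
    return {
        "dense O(n^2)": n * n,                  # every query scores every key
        "O(n log n)": int(n * math.log2(n)),    # e.g. tree- or hash-based schemes
        "sparse O(n*k)": n * k,                 # each query scores only k keys
    }

counts = attention_score_counts(n=12_000_000, k=4096)
for name, count in counts.items():
    print(f"{name:>14}: {count:.3e}")
```

Dense attention at this length needs on the order of 10^14 scores per layer per head, which is why the exact growth rate being claimed matters so much here.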
kovek
> The core idea is content-dependent selection. For each query, the model selects which parts of the sequence are worth attending to, and computes attention exactly over those positions.
I don't know if this will help for things like understanding code, where all the relevant parts can be the file of 1000 lines that we are analyzing, and where every token is relevant in understanding recursion, loops, function calls, etc. This sounds like it would be great to do SSA before passing things along to a code model like Claude Code. Let me know if I misunderstood.
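To make the quoted idea concrete, here is a minimal sketch of content-dependent selection using plain top-k attention for a single query. It is not SubQ's SSA, whose selection mechanism isn't described in this thread. The function name and toy data are hypothetical, and scoring every key to choose the top k is itself linear per query, so a real system would need a cheaper selection step.

```python
import math

def topk_sparse_attention(query, keys, values, k):
    """Attend exactly over the k keys with the highest scaled dot-product score.

    Illustrative stand-in for content-dependent selection; not SubQ's method.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * x for q, x in zip(query, key)) for key in keys]
    # Content-dependent selection: keep the k highest-scoring positions.
    selected = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Exact softmax attention restricted to the selected positions.
    top = max(scores[i] for i in selected)
    weights = [math.exp(scores[i] - top) for i in selected]
    total = sum(weights)
    dim = len(values[0])
    output = [
        sum(w * values[i][d] for w, i in zip(weights, selected)) / total
        for d in range(dim)
    ]
    return sorted(selected), output

# Toy data: four positions, two-dimensional keys and values.
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]]
selected, output = topk_sparse_attention([1.0, 0.0], keys, values, k=2)
print(selected, output)
```

With `k` equal to the number of keys this reduces to ordinary dense attention. The worry above corresponds to the case where more than `k` positions genuinely matter for a query, so some relevant tokens are never attended to.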
in-silico
I wonder how different their method actually is from other sub-quadratic sparse attention methods like Reformer [1] and Routing Transformer [2].
[1]: https://arxiv.org/abs/2001.04451
[2]: https://arxiv.org/abs/2003.05997