$500 GPU outperforms Claude Sonnet on coding benchmarks
yogthos
142 points
53 comments
March 26, 2026
Related Discussions
Found 5 related stories in 52.1ms across 3,663 title embeddings via pgvector HNSW
- Nanocode: The best Claude Code that $200 can buy in pure JAX on TPUs desideratum · 179 pts · April 05, 2026 · 56% similar
- Show HN: How I topped the HuggingFace open LLM leaderboard on two gaming GPUs dnhkng · 358 pts · March 10, 2026 · 55% similar
- Show HN: A game where you build a GPU Jaso1024 · 610 pts · April 04, 2026 · 53% similar
- Analyzing Nvidia GB10's GPU ingve · 11 pts · March 14, 2026 · 51% similar
- Intel Announces Arc Pro B70 and Arc Pro B65 GPUs throwaway270925 · 144 pts · March 26, 2026 · 51% similar
Discussion Highlights (11 comments)
memothon
I'm always skeptical: you can make a model pass the benchmarks, but then you use it and it's not practically useful, unlike an extremely general model. Cool work though; really excited about the potential of slimming down models.
negativegate
Am I still SOL on AMD (9070 XT) when it comes to this stuff?
riidom
Not a word about the tok/sec, unfortunately.
superkuh
If anyone else was hoping this used Q8 internally, so that converted to Q4 it could fit in 12GB VRAM: unfortunately it's already at Q4_K_M (~9GB), and the 16GB requirement comes from other parts, not the 14B@8bit + KV cache/etc. you might guess.
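For a back-of-the-envelope check on those numbers: weight memory scales as parameters × effective bits-per-weight / 8. A minimal sketch, using rough illustrative bits-per-weight values (not exact llama.cpp figures, which vary by model and quant version):

```python
def model_vram_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB: billions of params * bits / 8."""
    return params_b * bits_per_weight / 8

# Rough effective bits-per-weight for common llama.cpp quant formats
# (illustrative assumptions, not authoritative numbers).
QUANTS = {"Q8_0": 8.5, "Q4_K_M": 4.85}

for name, bpw in QUANTS.items():
    print(f"14B @ {name}: ~{model_vram_gb(14, bpw):.1f} GB of weights")
```

Q4_K_M lands near the ~9GB quoted above; the KV cache and runtime buffers come on top, which is where the rest of the 16GB goes.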
selcuka
It's a race to the bottom. DeepSeek beats all others (single-shot), and it is ~50% cheaper than the cost of local electricity alone.
> DeepSeek V3.2 Reasoning · 86.2% · ~$0.002 · API, single-shot
> ATLAS V3 (pass@1-v(k=3)) · 74.6% · ~$0.004 · Local electricity only, best-of-3 + repair pipeline
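One way to read those two rows is effective cost per solved task: price per attempt divided by pass rate. A rough comparison, assuming the quoted prices are per attempt:

```python
# Figures quoted in the comment above: (cost per attempt in $, pass rate).
runs = {
    "DeepSeek V3.2 Reasoning": (0.002, 0.862),
    "ATLAS V3 (best-of-3 + repair)": (0.004, 0.746),
}

for name, (cost, pass_rate) in runs.items():
    # Expected cost per successful solve = cost / probability of success.
    print(f"{name}: ~${cost / pass_rate:.4f} per solved task")
```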
mmaunder
I’d encourage devs to try MiniMax, Kimi, etc. for real-world tasks that require intelligence. The downsides emerge pretty fast: much higher reasoning-token use, slower output, and palpable degradation. Sadly, you do get what you pay for right now. However, that doesn’t prevent you from saving a lot through smart model routing, sensible reasoning budgets, and using max output tokens wisely. And optimize your apps and prompts to reduce output tokens.
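The routing idea can be sketched in a few lines; everything here (model names, threshold, token cap) is hypothetical, just to show the shape of a cost-aware router:

```python
def route(task_difficulty: float, max_output_tokens: int = 1024) -> dict:
    """Pick a model tier by estimated difficulty and cap output tokens.

    task_difficulty: a 0..1 estimate (hypothetical; e.g. from a heuristic
    or a small classifier run before the expensive call).
    """
    model = "cheap-local-14b" if task_difficulty < 0.5 else "frontier-model"
    return {"model": model, "max_tokens": max_output_tokens}

print(route(0.2))  # easy task goes to the cheap model
print(route(0.9))  # hard task goes to the stronger model
```

The token cap bounds worst-case spend per call regardless of which model is chosen; the threshold is where the real tuning work lives.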
limoce
The title should be "Adaptive Test-time Learning and Autonomous Specialization".
emp17344
Yet more evidence that the harness matters more than the model.
0xbadcafebee
This is specifically an experiment using ablation and multiple passes to improve the end result. Other techniques have been found that do this (like multiple passes through the same layers). But this technique, for this one specific model, seems to be more performant, while also taking much longer and requiring more complexity. It's unlikely most people would use it, but it's interesting.
b3ing
Will open-source or local LLMs eventually kill the big AI providers? If so, when? I can see it maybe for basic chat; not sure about coding and images yet.
electroglyph
what's with the weird "Geometric Lens routing"? Sounds like a made-up GPTism.