VibeThinker: 3B param model that beats Opus 4.5 on reasoning with novel SFT+GRPO
timhigins
97 points
31 comments
June 23, 2026
Related Discussions
- Vibe-Coded Ext4 for OpenBSD corbet · 65 pts · March 27, 2026 · 50% similar
- Phi-4-reasoning-vision and the lessons of training a multimodal reasoning model tosh · 13 pts · March 05, 2026 · 49% similar
- Cross-Model Void Convergence: GPT-5.2 and Claude Opus 4.6 Deterministic Silence rayanpal_ · 50 pts · March 22, 2026 · 48% similar
- The VibeSec Reckoning HieronymusBosch · 61 pts · May 27, 2026 · 48% similar
- Stable Audio 3 guardienaveugle · 92 pts · May 20, 2026 · 48% similar
Discussion Highlights (7 comments)
aero2146
I tried generating the classic pelican svg, but it failed horribly just showing me a rectangle and a black circle...
noperator
Having some success while testing this model out as a replacement for GPT-5 nano in source code security review. Running on RTX 3090 (24 GB VRAM) via vLLM. It's not great on structured output (as noted in the model card) but I'm working around that in my harness.
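(Illustrative aside, not noperator's actual harness: one common way to cope with a model that is unreliable at structured output is to let it answer in free text and pull out the first parseable JSON object afterwards. The sketch below uses only the Python standard library; the function name `extract_first_json_object` and the sample reply are hypothetical.)

```python
import json


def extract_first_json_object(text):
    """Return the first parseable JSON object embedded in text, or None.

    Hypothetical helper: scans for each '{' and tries to decode a JSON
    value starting there, so prose before or after the object is ignored.
    """
    decoder = json.JSONDecoder()
    pos = text.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at index pos
            value, _end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            return value
        # Not valid JSON from this brace; try the next one
        pos = text.find("{", pos + 1)
    return None


# Made-up model reply that wraps the JSON in chatty prose
reply = 'Sure! Here is the finding:\n{"severity": "high", "line": 42}\nLet me know.'
finding = extract_first_json_object(reply)
```

A harness could retry the request or fall back to a default when the helper returns None.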
gslepak
Note that these are Python-only results; the model will not do as well with other languages. I'm glad to see more domain-focused SLMs; we need more of them! A programming-focused MoE should work well across many languages.
deftio
There is some base level of intelligence any model needs to be useful, even in narrow tasks. Could you teach a 5 year old to drive a car? A 10 year old? A 12 year old? To drive a car requires being able to read, to have judgement about ice or rainy conditions, to anticipate a child running after a ball. By the time a human is in their mid teens they have acquired the base knowledge... Small models need to have enough base knowledge to be able to be good enough -- even in a seemingly narrow regime. Where is that? Obviously they don't need all the obscure knowledge of a frontier model but there is some base level which is probably more than it would first seem.
SwellJoe
It's terrible at hunting security bugs (I expected it to be, but I wanted to be sure). I added it to a benchmark I made with a corpus of some Mythos-discovered bugs, and it found zero. The smallest pretty successful models remain Qwen 3.6 and Gemma 4 (but I haven't tested the very small variants of those yet). https://swelljoe.com/post/will-it-mythos/
secretslol
Am I right in thinking this is a tiny model which has been trained well to reason, and that's it? Makes me think of a smart person who doesn't know anything about a given topic, but with the right tools will go and research the heck out of it. I really like the sound of this... why train models to memorize everything when you can just train them how to learn and let them get on with it from something as small as a Pi Zero and an internet connection?
NotSuspicious
The interesting thing about models this small is they should be able to be put on a single Taalas chip (the HC1 already runs a Llama 3.1 8B model). We're already at the point where half-decent reasoning could be run on an ASIC (and at mind-boggling speeds).