Granite 4.1: IBM's 8B Model Matching 32B MoE
steveharing1
285 points
178 comments
April 30, 2026
Related Discussions
- IBM Granite 4.1 family of models srameshc · 19 pts · May 01, 2026 · 77% similar
- Gemma 4: Byte for byte, the most capable open models meetpateltech · 21 pts · April 02, 2026 · 56% similar
- Google releases Gemma 4 open models jeffmcjunkin · 1306 pts · April 02, 2026 · 54% similar
- Show HN: Bonsai 1.7B ternary model at 442T/s on M4 Max hhuytho · 13 pts · May 04, 2026 · 49% similar
- Flash-MoE: Running a 397B Parameter Model on a Laptop mft_ · 332 pts · March 22, 2026 · 47% similar
Discussion Highlights (20 comments)
mdp2021
Wish they had also released an embedding model in line with their previous ones: compact (yet good)...
RugnirViking
Sounds interesting. Here's hoping they release a 32B model; that's a pretty good sweet spot for feasibility of home setups. Edit: I just realised they do actually have a 30B release alongside this. Haven't tried it yet.
2ndorderthought
I test drove it yesterday. It's pretty impressive at 8B. Runs on commodity hardware quickly. Qwen3.6 35B A3B is still my local champion, but I may use this for autocomplete and small tasks. Granite has recent training data, which is nice. If the other small models got fine-tuned on recent data I don't know if I would use this at all, but that alone makes it pretty decent. The 4B they released was not good for my needs but could probably handle tool calls or something.
Havoc
Interesting to see a pivot away from MoE by both IBM and Mistral while the larger classes of SOTA models all seem to be sticking to it. Quick vibe check of it (8B @ Q6): seems promising. Bit of a clinical tone, but I can see that being useful for data processing and similar. You don't really want an LLM that spams you with emojis sometimes...
100ms
> Full stop.
Why do people not edit out obvious sloppification and still expect to have readers left?
cbg0
The real "sleeper" might be https://huggingface.co/ibm-granite/granite-vision-4.1-4b if the benchmarks hold up for such a small model against frontier models for table & semantic k:v extraction.
tosh
IBM announcement: https://research.ibm.com/blog/granite-4-1-ai-foundation-mode...
agunapal
If you really think about why MoE came into existence, it's to save significant cost during training; I don't think there was any concrete evidence of performance gains for comparable MoE vs dense models. Over the years, I believe all the new techniques being employed in post-training have made the models better.
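The training-cost argument can be made concrete with a back-of-the-envelope sketch. It assumes the common rule of thumb that training compute is roughly 6 x (parameters active per token) x (training tokens); the 35B-total / 3B-active split is borrowed from the Qwen model name mentioned elsewhere in this thread, and the token count is purely hypothetical. None of these figures come from IBM or the article.

```python
def approx_training_flops(active_params: float, training_tokens: float) -> float:
    """Rough training compute via the 6 * N * D rule of thumb.

    active_params: parameters used per token (all of them for a dense model,
    only the routed experts plus shared layers for an MoE). This is an
    approximation, not a measurement.
    """
    return 6.0 * active_params * training_tokens


if __name__ == "__main__":
    tokens = 1e12            # hypothetical training set size
    dense_params = 35e9      # a dense model with 35B parameters
    moe_active = 3e9         # an MoE with 35B total but ~3B active per token

    dense = approx_training_flops(dense_params, tokens)
    moe = approx_training_flops(moe_active, tokens)
    print(f"dense 35B : {dense:.2e} FLOPs")
    print(f"MoE A3B   : {moe:.2e} FLOPs")
    print(f"ratio     : {dense / moe:.1f}x")
```

Under these assumptions the saving is in compute per token, which says nothing by itself about whether the MoE matches a dense model of the same total size on quality.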
dissahc
Qwen3.5 9B outperforms Granite 4.1 30B by a huge amount (32 vs 15 on the Artificial Analysis benchmark)... I have no idea what made the writer of this article say so many demonstrably incorrect things.
robotmaxtron
"open source" show me.
dash2
Nah, I ain't reading that. If they can't be bothered to get a human to write it, it can't be that important. I'm glad for them though. Or sorry that happened.
theblazehen
> models are judged by GPT-4
An interesting choice
m3at
Original article on IBM Research: https://research.ibm.com/blog/granite-4-1-ai-foundation-mode...
Hugging Face weights: https://huggingface.co/collections/ibm-granite/granite-41-la...
pjmalandrino
Very impressive series of SLMs by IBM here. I have been using it with their Chunkless RAG concept and it fits very well! (For the curious: https://github.com/scub-france/Docling-Studio ) I am convinced that SLMs are a real part of the solution for truly integrated AI in process...
0xbadcafebee
People complain a lot about LLM-written articles, but the human comments here on HN are far worse. Mostly a bunch of people extremely proud of themselves for not reading an LLM-written article, and then a bunch of people who take it at face value and make the model seem almost useful, and one comment that actually looked at other benchmarks. Good ol' humanity, good at.. being emotional... and not doing analysis.....
The article makes some good points about model design (how different size models within a family can get similar results, how to filter out hallucination, math result reinforcement), so that's worth understanding. It's analyzing a paper, which only discussed 3 sizes of the same model family. But what the article doesn't say is that, compared to other model families, Granite 4.1 8B sucks. The only benchmarks it does well at compared to other models are non-hallucination and instruction following. Qwen 3.5 4B (among other models) easily outclasses it on every other metric.
This article teaches a valuable lesson about reading articles in general. You can take useful information away from them (yes, despite being written by an LLM). But you should also use critical thinking skills and be proactive to see if the article missed anything you might find relevant.
cubefox
It's strange that they don't include reasoning training (RLVR). Their justification doesn't sound convincing:
> While reasoning models have grown in popularity in recent years, their abilities aren’t always the most efficient way to get a result. In enterprise settings, token costs and speed are often as important as performance. That is why turning to less expensive, non-reasoning models with similar benchmark performance for select tasks like instruction following and tool calling makes sense for enterprise users.
I guess they currently don't have the ability to do proper RLVR.
dimitrismrtzs
The 8B class closing the gap with 32B is the real story of 2026 for anyone running models locally. I've been using smaller models for agent tool-use and the progress this year is real. The gap that still matters most isn't intelligence — it's consistency on structured output. When you chain 5+ tool calls in sequence, even a small per-call reliability difference compounds fast. Would love to see Granite 4.1 benchmarked specifically on multi-step function calling rather than just general benchmarks.
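The compounding effect described above can be illustrated with a few lines of arithmetic. This is a minimal sketch that assumes each tool call succeeds independently with the same probability; real agent chains may have correlated failures or retries, and the reliability figures below are hypothetical, not measurements of Granite or any other model.

```python
def chain_success(per_call_reliability: float, num_calls: int) -> float:
    """Probability that every call in a chain succeeds.

    Assumes calls are independent and equally reliable, so the chain
    succeeds with probability p ** n.
    """
    if not 0.0 <= per_call_reliability <= 1.0:
        raise ValueError("per_call_reliability must be between 0 and 1")
    if num_calls < 0:
        raise ValueError("num_calls must be non-negative")
    return per_call_reliability ** num_calls


if __name__ == "__main__":
    # Hypothetical per-call reliabilities for structured output.
    for p in (0.99, 0.95, 0.90):
        for n in (1, 5, 10):
            print(f"p={p:.2f}  calls={n:2d}  chain success={chain_success(p, n):.3f}")
```

With five chained calls, a drop from 0.99 to 0.95 per call takes the whole-chain success rate from about 0.951 to about 0.774 under this model, which is why a small per-call difference matters more than it first appears.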
woadwarrior01
The most salient thing about these models is that they're non-reasoning models. This makes them very token efficient and particularly well suited for local inference where decoding is usually slower than with datacenter GPUs. Link to HF collection: https://huggingface.co/collections/ibm-granite/granite-41-la...
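A rough sketch of why token efficiency matters at local decode speeds: wall-clock decode time is approximately output tokens divided by tokens per second. The token counts and the 20 tokens/s speed below are hypothetical placeholders, not benchmarks of any Granite model, and prompt processing time is ignored.

```python
def decode_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Approximate time to generate output_tokens at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    local_speed = 20.0       # hypothetical local decode rate, tokens/s
    answer_tokens = 300      # hypothetical length of the final answer
    reasoning_tokens = 2000  # hypothetical extra tokens a reasoning model emits first

    direct = decode_seconds(answer_tokens, local_speed)
    with_reasoning = decode_seconds(answer_tokens + reasoning_tokens, local_speed)
    print(f"non-reasoning: {direct:.0f} s")
    print(f"reasoning    : {with_reasoning:.0f} s")
```

The slower the decode rate, the more those extra reasoning tokens dominate the wait, which is the trade-off the comment points at.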
smj-edison
On the topic of local models, is there a good equivalent to something like Claude's chat interface? I've recently started transitioning to open models after getting fed up with Claude's usage limits (I'm not in a position to drop $200/month), and for coding tasks Kimi 2.6 has been about the same as Sonnet in my experience. The only thing I've found myself missing is a nice interface to ask it questions and have it help me with my math assignments.
simonw
The Granite 4.1 3B model is only 2GB from Unsloth: https://huggingface.co/unsloth/granite-4.1-3b-GGUF I ran it in LM Studio and got a pleasingly abstract pelican on a bicycle (genuinely not bad for a tiny 3B model - it can at least output valid SVG): https://gist.github.com/simonw/5f2df6093885a04c9573cf5756d34...
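For a sense of how aggressive that quantization is, the file size and parameter count imply an average number of bits per weight. The sketch below assumes "2GB" means 2e9 bytes and "3B" means exactly 3e9 parameters; both are rounded figures from the comment, and a GGUF file also holds metadata and tensors at mixed precisions, so the result is only an average.

```python
def bits_per_weight(file_bytes: float, num_params: float) -> float:
    """Average bits stored per parameter, ignoring file metadata overhead."""
    if num_params <= 0:
        raise ValueError("num_params must be positive")
    return file_bytes * 8 / num_params


if __name__ == "__main__":
    # Rounded figures from the comment: ~2 GB file, ~3B parameters.
    quantized = bits_per_weight(2e9, 3e9)
    # For comparison, the same parameter count at 16 bits would be ~6 GB.
    half_precision = bits_per_weight(6e9, 3e9)
    print(f"quantized : {quantized:.2f} bits/weight")
    print(f"16-bit    : {half_precision:.2f} bits/weight")
```

Under these assumptions the download averages a little over 5 bits per weight, roughly a third of the size of a 16-bit copy.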