MAI-Code-1-Flash

EvanZhouDev 437 points 189 comments June 02, 2026
microsoft.ai

https://microsoft.ai/models/mai-code-1-flash/
https://microsoft.ai/pdf/MAI-Code-1-Flash-Model-Card.PDF
Launching seven new MAI models: https://microsoft.ai/news/building-a-hillclimbing-machine-la...

Discussion Highlights (20 comments)

OsrsNeedsf2P

So it's trained on the SWE Bench Pro evalset

AntiRush

The introductory blog post has a lot more information: https://microsoft.ai/news/introducingmai-code-1-flash/
And the model card: https://microsoft.ai/pdf/MAI-Code-1-Flash-Model-Card.PDF
The broader announcement of 7 MAI models seems to be where the 5B active in the title comes from: https://microsoft.ai/news/building-a-hillclimbing-machine-la...

onlyrealcuzzo

Gemma 4 26B-A4B scored exceptionally well with 20% less params, so this isn't unprecedented.

hootz

I'd love to see a tokens per second metric. I always prioritize speed over raw intelligence for flash models.

capten

It's so weird to me that the benchmarks remain so low, but the models are marketed as revolutionary. And if you say that low coding capabilities aren't a problem, say that to the token price hike and 'general use' model setup. Why not sell it as a math agent? Why do I have to set up 4 agents to check each others' work?

ajyoon

Scroll wheel hijacked on this entire domain

tosh

not open weight or at least I did not find anything indicating open weight

freediddy

Is 51% good enough to reliably use? There's no world in which I use an AI agent where it gets even 15% of the code wrong; that's as bad as Tesla FSD, where you need to pay attention to the road while engaging FSD. What's the point? My attention is what I'm trying to relieve, not mostly correct functionality. The only thing that matters is whether you can one-shot code like Claude or Codex. I'm not interested in a small but mostly-okay-but-annoyingly-buggy-every-now-and-then AI.

bguberfain

It is good to see big companies like Microsoft launching LLMs. They have a large amount of compute power and good scientists to create useful models.

mattlondon

Comparing against Claude 4.5? Aren't we up to 4.8? Bit disingenuous?

mentos

Shouldn’t the next model's focus be on system design rather than code? Seems like the work from a good system design to code is practically solved. Now it’s a matter of the design of the system. Or is that represented in these evals?

LoganDark

"Clean data" is impossible. Language models have polluted the landscape to such a degree it's impossible to filter them out now. OpenAI has no doubt discarded or muddled their dataset that was used to train the original ChatGPT, so there may be no dataset in existence now that isn't contaminated.

hmokiguess

Does anyone actually use these smaller models for coding? If so, how? I usually Opus everything. Is the play to plan/design/architect with a heavier model, then delegate structured tasks to these smaller ones? Would appreciate hearing from someone who has done and tested both paths.

gslepak

Would be cool if this were an open model.

striking

To be clear about the size of the model: MAI-Code-1-Flash is 137B A5B.

camelmel

Huh, according to that model card this is a 137B total parameter model. Performance doesn't seem that good:
- MAI-Code-1-Flash (137B-A5B) = 51% on SWE-bench pro
- Qwen3.6-35B-A3B = 49.5% on SWE-bench pro ( https://huggingface.co/Qwen/Qwen3.6-35B-A3B )
They benchmark against Claude Haiku but Haiku is not good, it's worse than tiny open models you can run locally or via API at 10% the cost.

efields

Please test your websites in Safari. Almost all of your iOS users use it by default, and the desktop experience is pretty close to the mobile experience, so testing is easy. That scroll effect is jank city for me (yeah yeah works fine in Chrome/Edge).

jMyles

I'd really like to get back to an autocomplete flow, ideally with some shared and optimized context with the relationship with my larger agent models. But it seems like, by and large, even the faster models are now aimed at longer-running agentic flows and not sub-1s autocomplete. Or am I wrong about that?

zb3

So it's not an open model while not being much better? Meh.

mmaunder

You lost me at forced scrolling. Ugh!
