Grok Build 0.1: Intelligence, Performance and Price Analysis

himata4113 16 points 18 comments June 24, 2026

artificialanalysis.ai

Discussion Highlights (6 comments)

himata4113

I think this is more a reflection that Chinese labs are full of extremely capable people, and that isn't trivial to replicate.

sourcegrift

As much as I hate to say this, I'm afraid that Musk is abusing his capital power, and Grok Build will end up as the top model eventually. I tried it on a couple of projects and was (pleasantly?) surprised by how good it was. I have no doubt in my mind that SpaceX needs to be broken up before it grows too big, because Grok Build is going to be the future coding model of choice.

nolok

Ultimately, I've never used Grok and don't want to support the company or CEO behind it, but even setting that aside, the 256k context window given in this link means I won't look at it. Model intelligence is great, but more often than not my issue is not that I would like 5-10% more intelligence. I'm already not using the highest thinking mode on most of my queries, and very important queries are better served by asking different models and having another one compare the answers and help me decide. My "big" issues mostly center on context loss, and on how models explode in flight when that happens, sometimes without you or them noticing. I don't know, it depends on everyone's usage I guess, and 256k is not THAT shabby, but that's how I feel about it.

vessenes

I don’t know the methodology at this site, but I noted with interest that they rate Grok Build slightly below Meta’s offering(!). I would anticipate aggressive improvements / new models, at least as the Cursor integration comes together, and a hard push on price and quality over the next few months. Elon is definitely eyeing revenue from the coding LLM market, and he has a lot of structural power, since the competitors are running on his hardware right now.

recsv-heredoc

The harness is really important. It matters so much, possibly even more than the model. We had harness crashes after running many agents, though granted, we were doing quite a bit with it.

Grok Build (as a product) review: the UX friction is worse than Claude Code's, but that seems to be a strange positioning choice. They're more on the 'vibe' side than the 'agentic engineering' side. The largest issue was actually reviewing output. But if you're going to make that largely opaque to the user, why choose a CLI-based interface that's so mouse-heavy?

There are also problems with the actual model. Thinking is visible, and every interaction goes like this: "I would like you to investigate adding an API route to tackle x,y,z." Grok, thinking: "Okay, the user has asked me to add an API route to tackle x,y,z." There are other absolutely absurd quirks too, like "I have no tools available in my context" being visible in the CoT.

Claude Code's auto-approval review via Opus (yellow, auto mode) is a killer feature; every build-it CLI should be offering this for long-horizon tasks.

I messaged one of the engineers about our experience and got no feedback. You'd be better off with Claude Code 5x Max than with the 300 USD/month subscription.

grim_io

Since their very brief rise to the top with Grok 3, they have been firing their talent and having their execs leave. As a result, their models fall further behind every day, and they are now behind Chinese open-weight models. The allegedly most valuable company can't even utilize its earthly data centers, so it has to rent them out to people who can. Embarrassing.
