GLM-5.2 is the new leading open weights model on Artificial Analysis

himata4113 831 points 401 comments June 17, 2026
artificialanalysis.ai

Discussion Highlights (20 comments)

Tiberium

It seems to be a really nice step up and is getting quite close to the frontier. I wish they'd start focusing on reasoning efficiency now, though. I have a relatively simple test task to evaluate LLMs: writing a simple math evaluator library in Nim (about 400-600 lines total, at most), and GLM 5.2 (xhigh, which maps to max effort) spent over 15 minutes (!) reasoning, using about 45k tokens, before it finally wrote the first file. I know it's hard to improve on that, but now that their models are good enough in raw intelligence, I think this should become a higher priority. Currently, on https://artificialanalysis.ai/#output-tokens, GPT 5.5 xhigh spends 16k tokens total on average, GPT 5.5 high 10k, Fable 5 33k, Opus 4.8 41k, and GLM 5.2 42k. GPT 5.5 is extremely reasoning-efficient. Of course, if you convert those values to actual request cost, GLM 5.2 will probably beat GPT 5.5/Opus 4.8, but speed matters for a lot of people, I think.

Havoc

It's pretty good. More talkative than 5.1. Reminds me of DeepSeek 4. Their servers are melting, though: I'm getting more timeouts, etc.

unrvl22

Why aren't more people talking about this? It's literally Opus 4.7 quality at stupidly low prices. I know providers who are offering this with unlimited tokens for $50 a month. Some are even offering API rates 3x lower than the official ZAI API rates, which are already about 10x cheaper than Opus (Crof and Umans, btw). This is a huge blow to Anthropic/OpenAI/Google and a massive win for the rest of the world. The official API prices and speeds mean nothing for open-source models.

nh43215rgb

> GLM-5.2 sits off the most attractive quadrant on the Intelligence vs Output Tokens chart. That is unfortunate...

CuriouslyC

I've been playing with this model a fair amount over the last 24 hours, and I can confirm it's quite capable, while being a little verbose (I've seen it reconsider things 3-4 times in thinking traces before deciding on a path forward) and not quite as good as GPT5.5 at working through complex abstract requirements. Honestly, it's good enough that I feel comfortable recommending a Z.AI sub + a $20/mo OpenAI sub for all but the most AI-pilled multi-orchestrators or the die-hard Claude fans. GLM writing + GPT reviewing/debugging feels pretty unlimited and only minimally worse than doing everything in GPT with the $200/mo plan.

kingstnap

According to many benchmarks, this model is straight-up frontier level, and Zai seriously cooked. Some of these numbers are incredible. Excited to see if this turns out to be an open-weight Opus 4.5 or better.

davidwritesbugs

I like their models; they're super cheap. I'm a Lite plan subscriber, and subjective performance seems to be the same as Anthropic's lower-tier models, useful for lots of grunt work. The problem is that Zhipu __really__ struggles with capacity: everyone is complaining of timeouts or very slow speeds. I can't get direct access to the model, though I see it's on OpenRouter, so I may play with it there. But the capacity issues mean DeepSeek is my main provider these days.

tensegrist

> On the Intelligence vs. Cost per Task Pareto Frontier: GLM-5.2 is on the Pareto frontier of the Intelligence vs Cost per Task chart, with the lowest cost per task among models at its intelligence level. GLM-5.2 costs ~$0.46 per task, compared to GLM-5.1 ($0.25), Kimi K2.6 ($0.31), MiniMax-M3 ($0.18) and DeepSeek V4 Pro (max, $0.05)

Am I missing something?
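One way to read the quoted claim: under the usual definition, a model is on the Pareto frontier when no other model is at least as intelligent *and* at most as expensive (and strictly better on one of the two). A cheaper model with a lower intelligence score doesn't knock it off the frontier. The minimal Python sketch below illustrates that check. The model names, intelligence scores, and costs are hypothetical placeholders chosen for illustration, not Artificial Analysis data.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    intelligence: float   # higher is better (hypothetical score)
    cost_per_task: float  # USD, lower is better (hypothetical cost)


def dominates(a: Model, b: Model) -> bool:
    """True if `a` is at least as good as `b` on both axes and strictly better on one."""
    at_least_as_good = (a.intelligence >= b.intelligence
                        and a.cost_per_task <= b.cost_per_task)
    strictly_better = (a.intelligence > b.intelligence
                       or a.cost_per_task < b.cost_per_task)
    return at_least_as_good and strictly_better


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Return the models not dominated by any other model, sorted by cost."""
    frontier = [m for m in models if not any(dominates(o, m) for o in models)]
    return sorted(frontier, key=lambda m: m.cost_per_task)


# Hypothetical models: illustrative values only, not benchmark results.
models = [
    Model("expensive_smart", intelligence=60.0, cost_per_task=0.46),
    Model("cheap_weaker", intelligence=50.0, cost_per_task=0.05),
    Model("dominated", intelligence=48.0, cost_per_task=0.25),
]

for m in pareto_frontier(models):
    print(m.name, m.intelligence, m.cost_per_task)
# prints:
# cheap_weaker 50.0 0.05
# expensive_smart 60.0 0.46
```

Here "expensive_smart" stays on the frontier despite costing about 9x more than "cheap_weaker", because nothing cheaper matches its score; only "dominated" drops out, since "cheap_weaker" is both smarter and cheaper.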

rahidz

Correct me if I'm wrong, but neither DeepSeek nor GLM has image input modality. That makes them less useful when looking at UIs, photos, screenshots, etc., doesn't it? Or do they have alternative ways of handling that?

creamyhorror

It's a real step forward, getting closer to SOTA. It seems to be very epistemically cautious in its reasoning. I hope DeepSeek and the other open-weights labs stay in the game and catch up too.

xiaoyu2006

This open-source model is quite near SOTA as an MoE with only 700B total / 40B active parameters. Truly efficient.

lousken

Cerebras really needs to have this on their API list (if they even still exist).

ramon156

I've commented before that 5.1 will sometimes get stuck looping over a simple decision or statement. It will basically contradict itself and then not realize that one option is clearly the right one. Sometimes it's two statements that aren't even mutually exclusive. Either way, a lot of tokens get wasted on this. I haven't used 5.2 extensively yet, but it seems a lot better.

_pdp_

I am hopeful. DeepSeek V4 has been quite amazing in our workloads, and it operates at a fraction of the cost. I have not tried GLM 5.2, but it seems to hit a sweet spot.

XCSme

In my tests[0], GLM-5.2 is not much better than GLM-5, and overall DeepSeek V4 Flash seems to be the better/more cost-effective choice.

[0]: https://aibenchy.com/compare/deepseek-deepseek-v4-flash-high...

Pragmata

So this basically means we will have a near-Opus-level model that can run locally within the next couple of months, right? Qwen 3.6 27B is already pretty good, but it should now be possible to get a better option that runs on the same hardware, right?

CubsFan1060

Knowing very little about how to run these: how close are we to medium or larger businesses buying hardware to run models like this locally? It's expensive, and not as capable as the frontier models, but it would have some pretty big benefits around privacy and agency.

mrngld

The Artificial Analysis coding benchmark shows GLM5.1 on high pretty close to GPT5.5 xhigh in cost to run, with GPT5.5 on medium significantly less expensive. Compared to GPT5.5 medium, GLM5.1 xhigh is twice the cost and half the intelligence. They don't have GLM5.2 on there yet, but that'd be a big gap to bridge. https://artificialanalysis.ai/agents/coding-agents?coding-ag...

I thought I was "holding it wrong" until DeepSWE came along; personally, it seems to match my own experiences pretty well. It really makes me wonder how legitimate some of the internet noise about open models is. There are surely some use cases for them, and not everything needs the absolute frontier (GPT5.5 on low is awesome), but if you want to be near the frontier, everyone needs to be honest about the fact that we're only talking about Opus, Fable, and GPT5.5.

kissgyorgy

I tried it today through OpenRouter, and the API is atrocious. I got multiple rate-limit errors and random errors on every turn. Somebody wrote [1]: "I am never touching Minimax or GLM again. Their APIs had constant outages and I had to restart my runs multiple times — after burning money on the runs that failed midway." I 100% agree. The model might be good, but if the API is this bad, it's effectively useless.

[1]: https://kasra.blog/blog/i-spent-1500-seeing-if-llms-could-ha...

dsrtslnd23

looks like I need a GB300 workstation
