GLM 5.2 vs. Opus

ritzaco 499 points 325 comments June 22, 2026
techstackups.com

Discussion Highlights (20 comments)

meander_water

> So we ran it head-to-head against Claude Opus 4.8: same one-shot prompt, build a 3D platformer in raw WebGL from scratch

Running a single one-shot prompt is not a benchmark, nor is it representative of any sort of real-world usage. Most agent usage is collaborative, so you need to test things like reliability (when I delegate a task, does it complete it without, e.g., making up test results?) and steerability (does it obey my instructions, or does it just do what it thinks is best?).

greyman

> On output tokens, GLM-5.2 is less than a fifth the price of Opus.

Opus is the most expensive model under pay-as-you-go pricing, but IMO a fair comparison should include subscription pricing as well. For example, when one has the $100 Claude Max plan and uses it up over the month, it might not be more expensive than GLM, or at least not 5x.

xlii

I've been checking out GLM 5.2 on some projects, and a few thoughts on it:

- it takes its sweet time to get code rolling; not the fastest model by any means
- it strays a lot during discovery/planning, but then corrects
- it's not steering-friendly, as it hallucinates things that it doesn't follow later on
- its output is quite good

A sample use case: I was optimizing rendering on a Swift+Zig codebase. It choked on 5k data entries. GLM 5.2 spent 20 minutes building the benchmarks and getting data out, which frustrated me, so I blocked non-editing tool access and went AFK. After approx. 30 minutes I found that it had used the already-made benchmarks and some "conclusions" to optimize 3 choke points. Its output noted that it couldn't validate its suspicions and asked for more data. The implementation worked well; it was idiomatic and non-intrusive. I would even say it was more idiomatic than GPT 5.5's efforts on the same repo. I would opt in to using it more, BUT GPT usually completes the same requests 5x faster. GLM 5.2 was spark for preparing and running inside isolated containers with JJ workspaces (so that multiple instances can be run in parallel).

jkwang

GLM-5.2 is quietly becoming the most interesting open model release this year. The coding benchmarks are surprisingly close to frontier models at a fraction of the inference cost.

joshrw

Chinese models optimize for benchmarks and do poorly in real-world tasks

IronWolve

I was having issues coding a renderer for good-looking, realistic smoke coming off burning incense. Opus 4.8 and GPT-5.5 both produced code with issues; GLM-5.2 did it. Amazing. The real-time 3D fluid dynamics appear to be the tricky part. I wish I still had Opus access; I would love to see if it can do it.

cultofmetatron

I seriously don't get all this big hullabaloo about one-shot prompting. By definition, a single prompt won't capture the complexity of a software project. Ergo, what you'll get is a series of assumptions made by the model based on preexisting code in its training corpus. I'd rather see a coding agent that can follow the steps in a plan file to a T while respecting guardrails and adhering to the coding conventions in the human-reviewed spec. I'd rather see performance in agent loops against human-defined objectives, where it can be verified to stick to the defined guardrails and continue without drift until its objectives are complete. I'd also like to see it identify bugs and potential performance improvements by examining existing code and suggesting refactors based on the context it can pick up about the particular use case you are trying to build. These are far more valuable metrics than "hey, build X".

linzhangrun

It's just that their Coding Plan is too hard to get. I've been trying to grab it for a week and still can't get it.

Aozora7

I used GLM 5.0/5.1/5.2 for some projects, and for me, the area in which they lag behind frontier models the most is user interfaces. They get really close to Opus when it comes to pure algorithms, but when I need something like a web application or a mobile app that looks and works well, they are very noticeably worse than even Sonnet.

david_shi

> GLM-5.2 cost a fraction as much. Opus finished in half the time and shipped a cleaner game.

Off topic, but does anyone else instantly pick up on LLMisms like this? It seems like all the models have converged on this style of writing, and improvements aren't really changing it.

leumon

I've seen GLM 5.2 struggle to write simple, compilable C code. It might be good at web, but its world knowledge is limited due to the small model size, which makes its use quite limited in my opinion.

speedgoose

While this is interesting, a single sample with different coding harnesses is not very scientific.

ulrikrasmussen

> Through an API it costs a fraction of Opus, and you can run it yourself for free if you have the hardware.

I haven't been keeping up on hardware costs for state-of-the-art LLM inference, but this remark made me ask myself how many readers of the article would actually be able to run this model on hardware they own. How much would it cost to acquire such a setup?

zkmon

Cost difference matters most as cost optimization is the whole point of AI. Time difference (30 min vs 1 hr) is not a deal-breaker. The small precision gap on the first iteration does not matter for 99% of the work that happens in real world.

TurdF3rguson

Pretty clearly it's beating Opus at [web dev](https://www.gptbased.com/): on price, on score. I mean, what else is there?

msejas

Looking at the results, I don't see how they are even comparable; Opus is clearly far superior in most aspects: smoothness, design, functionality, etc. At the end of the day, time saved is more important than cost for big players. The ability to spawn 10 Claude agents and rush a project to outcompete someone is more important for big businesses, in my opinion. Also, the small details that GLM missed would take significantly more time to iron out, considering it already took double the time. I do hope other (open-weight) models catch up, but to act like they are anywhere close is, for me, a bit disingenuous.

close2

I wonder how many tokens and how much time were used for the verification part. Maybe GLM 5.2 instantly found the "solution" of reading the screen pixel by pixel, but that could also have been a major token and time sink.

jofzar

Great article. My only feedback, I guess, is that it's not really clear about the price. Would the $21.92 be the API pricing?

> Cost $5.39 (real billed) ~$21.92 (estimate, list pricing)

postatic

I've signed up with Ollama to experiment with these open-source models. For the past 3 months, I've just been experimenting and trying them out. GLM is the first model that I am using on a daily basis for my coding work (alongside Claude). It's good; I've been maxing out my Ollama usage limits every day :)

_pdp_

In the name of science we crafted an autonomous AI agent that builds games on a loop. It is based on GLM 5.2. I am not sure where this is going to lead us but it is fun to watch.
