GLM 5.2 Performance Benchmarks
theanonymousone
145 points
47 comments
June 17, 2026
Related Discussions
- GLM-5.2 is the new leading open weights model on Artificial Analysis · himata4113 · 831 pts · June 17, 2026 · 69% similar
- GLM-5.2: Built for Long-Horizon Tasks · meetpateltech · 24 pts · June 16, 2026 · 68% similar
- GLM-5.1: Towards Long-Horizon Tasks · zixuanlimit · 481 pts · April 07, 2026 · 65% similar
- GLM 5.2 Is Out · aloknnikhil · 471 pts · June 13, 2026 · 65% similar
- Z.ai GLM 5.2 · theanonymousone · 15 pts · June 16, 2026 · 64% similar
Discussion Highlights (12 comments)
DeathArrow
One or two more releases and they will reach Fable level.
lanycrost
It's always nice to see open source models growing. I hope we will get good performance on lower-tier hardware some day.
wongarsu
It does really well on "AA-Omniscience Non-Hallucination Rate", far higher than DeepSeek, GPT 5.5 or Fable. I really like that benchmark because it's one of the few that allows LLMs to elect not to answer when they are unsure, and punishes them for trying to bullshit their way through.
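To illustrate the idea of a benchmark that lets a model abstain and penalizes wrong answers, here is a minimal sketch. The scoring rule is an assumption made up for illustration (+1 for a correct answer, 0 for abstaining, -1 for a wrong answer, plus the share of non-correct responses that were abstentions). It is not claimed to be the formula AA-Omniscience actually uses, and `score_responses` and the two sample models are hypothetical.

```python
from collections import Counter


def score_responses(outcomes):
    """Score a list of outcomes, each "correct", "wrong", or "abstain".

    Assumed rule (illustrative only): correct = +1, abstain = 0, wrong = -1.
    """
    counts = Counter(outcomes)
    total = len(outcomes)
    correct = counts["correct"]
    wrong = counts["wrong"]
    abstained = counts["abstain"]

    # Net score on a -100..100 scale: wrong answers cancel out correct ones.
    index = 100.0 * (correct - wrong) / total if total else 0.0

    # Of the questions the model did not get right, how often did it
    # decline to answer instead of giving a wrong answer?
    not_correct = wrong + abstained
    non_hallucination_rate = abstained / not_correct if not_correct else 1.0

    return {"index": index, "non_hallucination_rate": non_hallucination_rate}


# Two hypothetical models with the same number of correct answers.
bluffer = ["correct"] * 6 + ["wrong"] * 4
cautious = ["correct"] * 6 + ["wrong"] * 1 + ["abstain"] * 3

print(score_responses(bluffer))
print(score_responses(cautious))
```

Under this assumed rule, both models know the same six answers, but the one that abstains on most of the questions it cannot answer ends up with the higher score, which is the behavior the comment describes.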
sourcecodeplz
Still quite verbose at 140M output tokens, but this is on max thinking. High should do better.
theturtletalks
I want to trust their benchmarks, but when they have Muse Spark over GPT-5.5, it gives me pause.
ChrisArchitect
Some more discussion: https://news.ycombinator.com/item?id=48567759
XCSme
I also tested it[0]: quite similar to GLM 5, a few percent better, 30% faster and 50% more expensive.
[0]: https://aibenchy.com/?q=glm
hemkeshr
Local models are already useful today. The next milestone is getting this level of performance onto truly affordable hardware.
gertlabs
On our multi-agent coding and reasoning evaluations, GLM 5.2 is the first model we've tested that has crossed the threshold of being on par with or better than Opus 4.6 (although, as usual, we rank GLM 5.2 and most other Chinese models a bit lower than most other benchmarks do, since their test methodologies are more vulnerable to benchmaxxing). Data at https://gertlabs.com/rankings
fcpk
Tangent question: Claude Code seems to be well loved, and most major Chinese LLM providers suggest using it with the env vars that change the server. That, however, means you lose a lot of Anthropic tools like auto mode, running shells, and monitors/crons. Is there a way to get those with non-Anthropic plans?
ttoinou
It would be interesting to see if we can make this model smaller with REAP + Unsloth dynamic quant. It might become 4x cheaper to run for similar quality output.
linzhangrun
Coding Plan is completely unavailable