GLM-5.2
Code · freemium
4.6

GLM-5.2 Review (June 2026): Open Weights That Out-Code GPT-5.5

GLM-5.2 is the strongest open-weight coding model yet: MIT-licensed, 744B MoE, 1M context. It beats GPT-5.5 on SWE-bench Pro and trails Opus 4.8 by a point on FrontierSWE — at a fraction of the cost. Our review, with benchmark charts.

Pros

  • Strongest open-weight coding model
  • MIT-licensed and self-hostable
  • 1M-token context fits whole repos
  • Beats GPT-5.5 on SWE-bench Pro
  • Roughly one-sixth the cost of GPT-5.5

Cons

  • Falls behind on agentic Tool-Decathlon
  • Hosted API routes data through China
  • No benchmarks published at launch
  • Tool-use still trails the closed frontier

Z.ai (formerly Zhipu AI) has quietly shipped what its newly published numbers say is the strongest open-weight coding model available today. GLM-5.2 is a 744-billion-parameter Mixture-of-Experts model, with roughly 40 billion parameters active per token across 384 experts and a 1-million-token context window, released under a permissive MIT license that allows self-hosting. The twist: at the June 13 subscription launch Z.ai published zero benchmark numbers. The official scoreboard arrived only days later, alongside the open-weight drop. Now that the table is public, here is how GLM-5.2 actually performs.

Architecture and efficiency are half the story. The MoE design keeps inference cheap for a model this large, and Z.ai pairs it with an "IndexShare" attention scheme it says cuts compute per token by about 2.9× at the full 1M-token context, plus speculative decoding that accepts roughly 20% more predicted tokens on average. Maximum output runs to 131,072 tokens. The result is a frontier-adjacent model designed to be runnable, not just impressive on paper.
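To make the "runnable" claim concrete, the arithmetic below is a rough sketch of what the quoted parameter counts imply. The parameter figures come from the review; the bit widths are illustrative assumptions, not formats Z.ai is confirmed to ship, and the memory estimate covers weights only (no KV cache, activations, or runtime overhead).

```python
# Back-of-the-envelope sizing for a sparse MoE model.
TOTAL_PARAMS = 744e9   # total parameters, as quoted in the review
ACTIVE_PARAMS = 40e9   # roughly active per token, as quoted in the review


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active / total


def weight_memory_gb(params: float, bits_per_param: int) -> float:
    """Memory needed to hold the weights alone, in decimal gigabytes."""
    return params * bits_per_param / 8 / 1e9


print(f"active per token: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
# ASSUMPTION: 16/8/4-bit are illustrative precisions, not confirmed release formats.
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(TOTAL_PARAMS, bits):,.0f} GB")
```

Under these assumptions only about 5.4% of the parameters do work on any given token, which is why per-token compute stays low, yet all 744B weights still have to be resident: roughly 1,488 GB at 16-bit and 372 GB at 4-bit.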

GLM-5.2 benchmark profile (official scores; scale 0–100, higher is better)

  • AIME 2026: 99.2
  • GPQA Diamond: 91.2
  • Terminal-Bench 2.1: 81.0
  • MCP-Atlas (tools): 76.8
  • FrontierSWE: 74.4
  • SWE-bench Pro: 62.1
  • HLE (with tools): 54.7

GLM-5.2's own published scores across seven benchmarks (0–100, higher is better).

The coding results are where GLM-5.2 earns its headline. On SWE-bench Pro it scores 62.1, ahead of both OpenAI's GPT-5.5 (58.6) and its own predecessor GLM-5.1 (58.4) by roughly three and a half points. On Terminal-Bench 2.1 it hits 81.0 (82.7 with the best harness), and on the long-horizon FrontierSWE benchmark it lands at 74.4, one point behind Anthropic's Claude Opus 4.8 (75.4). For an openly licensed model, trading blows with the closed frontier on multi-step coding is the genuinely new development.

GLM-5.2 vs the frontier: coding benchmarks (scale 0–100)

  • SWE-bench Pro: GLM-5.2 62.1, GPT-5.5 58.6, GLM-5.1 58.4
  • FrontierSWE: GLM-5.2 74.4, Opus 4.8 75.4, GPT-5.5 ~73.0

~ = approximate (reported as "slightly ahead/behind")

Head-to-head on two coding benchmarks. GLM-5.2 leads on SWE-bench Pro and sits a point behind Opus 4.8 on FrontierSWE. "~" marks an approximate figure.
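The margins are easy to recompute from the published figures. The snippet below is a small sketch that uses only the exact scores quoted in this review; the approximate GPT-5.5 FrontierSWE figure is left out because it was not reported as a precise number.

```python
# Scores as quoted in this review (0-100, higher is better).
SCORES = {
    "SWE-bench Pro": {"GLM-5.2": 62.1, "GPT-5.5": 58.6, "GLM-5.1": 58.4},
    "FrontierSWE": {"GLM-5.2": 74.4, "Opus 4.8": 75.4},
}


def margins(benchmark: str, model: str = "GLM-5.2") -> dict[str, float]:
    """Point difference between `model` and every other model on a benchmark.

    Positive means `model` is ahead; negative means it trails.
    """
    row = SCORES[benchmark]
    return {
        other: round(row[model] - score, 1)
        for other, score in row.items()
        if other != model
    }


for name in SCORES:
    print(name, margins(name))
```

On these numbers GLM-5.2 leads GPT-5.5 by 3.5 points and GLM-5.1 by 3.7 on SWE-bench Pro, and trails Opus 4.8 by 1.0 on FrontierSWE.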

Reasoning is strong too: GLM-5.2 posts 99.2 on AIME 2026 and 91.2 on GPQA Diamond. Agentic tool use is the most uneven area. On MCP-Atlas it scores 76.8, nearly tying Opus 4.8, but on the harder Tool-Decathlon it falls well behind both Opus 4.8 and GPT-5.5. On Humanity's Last Exam with tools it scores 54.7. If your workload is long, tool-heavy agent chains, the closed leaders still have a real edge.

It is worth dwelling on how these numbers arrived. Launching a frontier-class model with no benchmarks at all, then publishing a table days later that happens to lead on the metrics it chose to show, is the kind of move that invites skepticism — and it should. The reassuring part is that early third-party testing has broadly tracked Z.ai's published coding figures rather than contradicting them, and because the weights are open under MIT, anyone can rerun the suites themselves rather than taking the vendor's word for it. That verifiability is itself part of the pitch.

Context matters for placing GLM-5.2 in the field. The open-weight landscape is crowded, with DeepSeek's V4 series, Alibaba's Qwen line and Meta's releases all competing for the same self-hosting audience, but most of those models lead on raw knowledge or price rather than agentic coding specifically. GLM-5.2's benchmark table stakes its claim on exactly that axis: long-horizon, tool-using software work. For teams that had quietly concluded serious coding still required a closed API, it is the first open option that doesn't read as a compromise on the core task.

Getting access is straightforward, with three on-ramps. The GLM Coding Plan subscription runs $10–$80 per month and wires the model into popular coding agents; there is a standard pay-as-you-go API; or you can download the MIT-licensed weights and self-host with no per-token bill at all. The 1-million-token context is the practical differentiator here — it is large enough to drop an entire mid-size repository into a single prompt instead of chunking and re-summarizing, which is precisely where strong long-horizon coding scores translate into real workflow wins.
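Whether a given repository really fits in one prompt is worth checking before relying on it. The sketch below estimates a repo's token count from its character count. It assumes about 4 characters per token, which is a common rule of thumb and not GLM-5.2's actual tokenizer, and it assumes room for the maximum output must be reserved inside the same window, which the review does not confirm. The suffix list and function names are illustrative.

```python
import math
from pathlib import Path

CONTEXT_TOKENS = 1_000_000    # context window, as quoted in the review
MAX_OUTPUT_TOKENS = 131_072   # maximum output, as quoted in the review
CHARS_PER_TOKEN = 4.0         # ASSUMPTION: rough heuristic, not the real tokenizer
SOURCE_SUFFIXES = {".py", ".ts", ".go", ".rs", ".java", ".md"}  # illustrative


def count_repo_chars(root, suffixes=SOURCE_SUFFIXES) -> int:
    """Total characters across files under `root` with a matching suffix."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total += len(path.read_text(encoding="utf-8", errors="ignore"))
    return total


def estimate_tokens(num_chars: int, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Convert a character count into an approximate token count."""
    return math.ceil(num_chars / chars_per_token)


def fits_in_context(prompt_tokens: int,
                    context: int = CONTEXT_TOKENS,
                    reserve_output: int = MAX_OUTPUT_TOKENS) -> bool:
    """True if the prompt plus the reserved output budget fits the window.

    ASSUMPTION: output tokens share the context window with the prompt.
    """
    return prompt_tokens + reserve_output <= context
```

Calling `fits_in_context(estimate_tokens(count_repo_chars("path/to/repo")))` gives a quick yes or no. For a real decision, count tokens with the model's own tokenizer, since code often tokenizes less efficiently than prose.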

Then there is price. VentureBeat pegs GLM-5.2 at roughly one-sixth the cost of GPT-5.5 on long-horizon coding runs, and self-hosting the MIT weights removes the per-token bill entirely. The asterisk for enterprises: routing requests through Z.ai's hosted API sends data to China, a governance consideration that on-prem self-hosting sidesteps.

Verdict

GLM-5.2 is now the open benchmark to beat. If you want a self-hostable, MIT-licensed model that out-codes GPT-5.5 on SWE-bench Pro and shadows Claude Opus 4.8 on long-horizon work — at a fraction of the cost — this is the most compelling open release of 2026 so far. Temper expectations on the most demanding agentic tool-use tasks, verify the headline numbers yourself now that the weights are public, and weigh where your data lives if you lean on the hosted API rather than self-hosting.