MiniMax M3 Lands as an Open-Weight, Million-Token Coding Model That Claims to Edge Out GPT-5.5
The Chinese lab says its new open-weight model pairs frontier coding, a 1M-token context window and native multimodality on a sparse-attention architecture that cuts per-token compute at a million-token context to roughly one-twentieth of its predecessor's. The weights and technical report are still days away, so every number is company-reported.
Chinese AI lab MiniMax unveiled M3 on June 1, 2026, calling it the first open-weight model to combine top-tier coding performance, a one-million-token context window and native multimodality in a single system — a bundle of capabilities the company says had until now been the exclusive domain of proprietary frontier models such as Anthropic's Claude Opus 4.7, OpenAI's GPT-5.5 and Google's Gemini 3.1 Pro.
The headline engineering claim is a new attention mechanism called MiniMax Sparse Attention, or MSA. Rather than comparing every token against every other token — the quadratic cost that makes long contexts expensive — MSA pre-filters down to the relevant key-value blocks and then processes them sequentially, batching the queries that need each block into a single contiguous memory read. MiniMax says the result is roughly one-twentieth the per-token compute at a million-token context versus its previous generation, more than 9x faster prefill, more than 15x faster decoding, and an implementation that runs over four times faster than competing open-source alternatives.
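MiniMax has not yet published MSA's details, so the following is only a toy sketch of the general idea described above: pick a few key-value blocks per query, then walk the blocks one at a time and serve every query that selected each block. The function name, the mean-key block summary, the `block_size` and `top_k` parameters and the non-causal single-head setup are all assumptions made for illustration, not MiniMax's implementation.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def block_sparse_attention(queries, keys, values, block_size, top_k):
    """Toy block-sparse attention (hypothetical; not MiniMax's MSA kernel)."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    dim = len(keys[0])
    n_blocks = math.ceil(len(keys) / block_size)
    blocks = [range(b * block_size, min((b + 1) * block_size, len(keys)))
              for b in range(n_blocks)]
    # Assumed summary: the mean key of each block.
    summaries = [[sum(keys[i][d] for i in blk) / len(blk) for d in range(dim)]
                 for blk in blocks]

    # Step 1: pre-filter. Each query keeps only its top_k highest-scoring blocks.
    queries_for_block = {b: [] for b in range(n_blocks)}
    for qi, q in enumerate(queries):
        ranked = sorted(range(n_blocks),
                        key=lambda b: dot(q, summaries[b]), reverse=True)
        for b in ranked[:top_k]:
            queries_for_block[b].append(qi)

    # Step 2: visit blocks in order; each block is read once and serves
    # the whole batch of queries that selected it.
    scale = 1.0 / math.sqrt(dim)
    scored = [[] for _ in queries]  # per query: (logit, key index)
    for b in range(n_blocks):
        for qi in queries_for_block[b]:
            for ki in blocks[b]:
                scored[qi].append((dot(queries[qi], keys[ki]) * scale, ki))

    # Step 3: softmax over the selected keys only.
    outputs = []
    for pairs in scored:
        peak = max(logit for logit, _ in pairs)
        weights = [math.exp(logit - peak) for logit, _ in pairs]
        total = sum(weights)
        outputs.append([
            sum(w * values[ki][d] for w, (_, ki) in zip(weights, pairs)) / total
            for d in range(len(values[0]))
        ])
    # Counts full query-key scores only, not the cheaper block-summary scores.
    comparisons = sum(len(pairs) for pairs in scored)
    return outputs, comparisons


def random_vectors(n, dim):
    return [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]


random.seed(0)
dim = 4
q, k, v = random_vectors(8, dim), random_vectors(16, dim), random_vectors(16, dim)
_, sparse_cost = block_sparse_attention(q, k, v, block_size=4, top_k=1)
_, dense_cost = block_sparse_attention(q, k, v, block_size=4, top_k=4)
print(sparse_cost, dense_cost)  # 32 128
```

With `top_k` equal to the number of blocks the sketch reduces to ordinary dense attention, which is why the second call scores all 8 x 16 query-key pairs. The savings MiniMax reports would depend on choices it has not disclosed, such as how blocks are scored and how many are kept.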
On benchmarks, MiniMax reports M3 scoring 59% on SWE-Bench Pro — ahead of GPT-5.5 and Gemini 3.1 Pro and just behind Opus 4.7 — and 83.5 on the BrowseComp web-search test, edging past Opus 4.7's 79.3. The company also published three long-horizon autonomy experiments: M3 reproduced an ICLR 2025 fine-tuning paper over about 12 hours, producing 18 commits and 23 figures for a reproduction score of 0.650; it optimized an FP8 GEMM kernel on Nvidia Hopper GPUs from a broken 7.6% hardware utilization up to 71.3% across roughly 24 hours and 147 attempts; and on PostTrainBench it trained four base models end to end, landing just behind Opus 4.7 and GPT-5.5.
The crucial caveat is that none of this can yet be checked. At launch MiniMax had released neither the weights nor a technical report, promising both within ten days on Hugging Face and GitHub, along with an open-source release of its in-house MiniMax Code agent. Token plans run from about $20 a month for roughly 1.7 billion tokens up to $120 for around 9.8 billion, and the model has a toggleable thinking mode. Until independent engineers can inspect the architecture and rerun the benchmarks, M3's frontier performance remains a company claim, and its open-weight status a company commitment, rather than verified fact.
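For a sense of scale, the plan figures quoted above can be converted into a rough price per million tokens. This is a back-of-the-envelope sketch only: the plan labels are hypothetical, and the inputs are the article's approximate numbers, so the results are approximate as well.

```python
def usd_per_million_tokens(monthly_usd, monthly_tokens):
    """Effective price per one million tokens for a flat monthly plan."""
    return monthly_usd / (monthly_tokens / 1_000_000)


# Labels are hypothetical; prices and token counts are the rounded figures quoted above.
plans = {
    "lowest tier": (20, 1.7e9),
    "highest tier": (120, 9.8e9),
}

rates = {name: usd_per_million_tokens(usd, tokens)
         for name, (usd, tokens) in plans.items()}

for name, rate in rates.items():
    print(f"{name}: about ${rate:.4f} per million tokens")
```

On these rounded inputs both tiers work out to a little over one cent per million tokens.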