JetBrains Open-Sources Mellum2, a 12B Mixture-of-Experts Model Built to Be a Fast "Focal" Part, Not a Frontier Rival
JetBrains released Mellum2 under Apache 2.0 on June 2. The 12B MoE model activates only about 2.5B parameters per token, which JetBrains says yields more than 2x faster inference than similarly sized models, and it is pitched as a fast, specialized component for multi-model AI pipelines.
JetBrains, the company behind IntelliJ IDEA, PyCharm and a suite of developer tools, has open-sourced Mellum2, a 12-billion-parameter mixture-of-experts (MoE) model released under the permissive Apache 2.0 license. Announced on June 2, the model is a deliberate bet against the bigger-is-better framing that has dominated 2026: instead of building a rival to frontier models, JetBrains built a small, fast, specialized component meant to slot into larger AI systems.
The architecture is built for speed. Mellum2 carries 12B total parameters but activates only about 2.5B per token, routing each token through a handful of its experts (reportedly 8 of 64). Its context window is roughly 131,000 tokens. The payoff JetBrains advertises is more than 2x faster inference than similarly sized models while staying competitive on benchmarks across code, reasoning, math and science. Where the original Mellum focused narrowly on code completion, Mellum2 has been retrained on both natural language and code, making it useful for routing, summarization and intermediate reasoning steps as well as raw coding.
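To make the sparse-activation idea concrete, here is a minimal sketch of top-k expert routing in plain Python. It is a generic illustration of how MoE layers commonly pick experts, not Mellum2's actual router: the function name `top_k_route`, the random router scores and the softmax-over-selected-experts weighting are all assumptions. Only the figures 8, 64, 2.5B and 12B come from the article.

```python
import math
import random


def top_k_route(logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    # Pick the k experts with the largest router scores.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected experts only, so their weights sum to 1.
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


NUM_EXPERTS = 64  # reported expert count
TOP_K = 8         # reported experts used per token

# Hypothetical router scores for one token (random stand-ins, not model output).
rng = random.Random(0)
router_logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routes = top_k_route(router_logits, TOP_K)

# Share of experts used per token versus share of parameters active per token.
active_expert_fraction = TOP_K / NUM_EXPERTS  # 8 / 64
active_param_fraction = 2.5 / 12              # about 2.5B of 12B

print(f"experts used per token: {active_expert_fraction:.1%}")
print(f"parameters active per token: {active_param_fraction:.1%}")
```

The two fractions differ (12.5% of experts versus roughly 21% of parameters). In MoE designs generally, that gap is explained by parameters every token passes through, such as attention layers and embeddings; whether that is the full explanation for Mellum2 is an assumption here.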
Six variants ship at once, giving teams a spectrum from raw weights to ready-to-use assistants: Base and Base-Pretrain, Instruct and Instruct-SFT, and Thinking and Thinking-SFT. The Thinking models are trained with reinforcement learning from verifiable rewards (RLVR) and emit an explicit reasoning trace before answering, aimed at complex debugging, agentic workflows and multi-step planning. All of the weights are available on Hugging Face, with a full technical report posted to arXiv.
JetBrains frames the launch around a thesis it calls focal models: "the future belongs to coordinated systems, not single models." In that view, a frontier model is the orchestrator, while fast, cheap, specialized models handle the high-frequency work — classifying a request, retrieving context, summarizing a file, or acting as a low-latency sub-agent. Because Mellum2 is small enough to run privately and on-premise, JetBrains is also pitching it at developers and enterprises that cannot send code to a hosted frontier API — a niche that closed coding agents like Claude Code and Codex are not built to fill.
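The division of labor described above can be sketched as a simple dispatcher. Everything in this example is hypothetical: the `Task` type, the `FOCAL_KINDS` set and the two stub functions are illustrative stand-ins, not a JetBrains API, and a static rule table replaces the decision a frontier orchestrator would make in a real system.

```python
from dataclasses import dataclass


@dataclass
class Task:
    kind: str      # e.g. "classify", "summarize", "plan"
    payload: str


# Hypothetical set of high-frequency task kinds sent to the small, fast model.
FOCAL_KINDS = {"classify", "retrieve", "summarize", "sub_agent"}


def call_focal_model(task: Task) -> str:
    # Stand-in for a small model running locally or on-premise.
    return f"focal:{task.kind}"


def call_frontier_model(task: Task) -> str:
    # Stand-in for a hosted frontier model handling everything else.
    return f"frontier:{task.kind}"


def dispatch(task: Task) -> str:
    """Send cheap, frequent work to the focal model and the rest to the frontier model."""
    if task.kind in FOCAL_KINDS:
        return call_focal_model(task)
    return call_frontier_model(task)


print(dispatch(Task("summarize", "contents of a source file")))
print(dispatch(Task("plan", "design a multi-step refactoring")))
```

In an on-premise deployment of the kind the article mentions, the focal branch is the part that could stay inside the company network, while the frontier branch might be omitted or swapped for another local model.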