Models·3 min read·JetBrains AI Blog

JetBrains Open-Sources Mellum2, a 12B Mixture-of-Experts Model Built to Be a Fast "Focal" Part, Not a Frontier Rival

JetBrains released Mellum2 under Apache 2.0 on June 2. The 12B MoE model activates just 2.5B parameters per token for a claimed 2x-faster inference, and is pitched as a fast, specialized component for multi-model AI pipelines.

[Infographic: Mellum2 at a glance. 12B total parameters, ~2.5B active per token; MoE, 131K context, 2x faster inference; 6 variants (Base, Instruct, Thinking with RLVR); the router fires 8 of 64 experts, so only ~21% of parameters fire per token. Source: JetBrains AI Blog, Hugging Face]

JetBrains, the company behind IntelliJ IDEA, PyCharm and a suite of other developer tools, has open-sourced Mellum2, a 12-billion-parameter mixture-of-experts (MoE) model released under the permissive Apache 2.0 license. Announced on June 2, the model is a deliberate bet against the bigger-is-better framing that has dominated 2026: instead of building a rival to frontier models, JetBrains built a small, fast, specialized component meant to slot into larger AI systems.

The architecture is built for speed. Mellum2 carries 12B total parameters but activates only about 2.5B per token, routing each token through a handful of its experts (reportedly 8 of 64), and it supports a context window of roughly 131,000 tokens. The payoff JetBrains advertises is inference more than 2x faster than similarly sized models, while staying competitive on benchmarks across code, reasoning, math and science. Where the original Mellum focused narrowly on code completion, Mellum2 has been retrained on both natural language and code, making it useful for routing, summarization and intermediate reasoning steps as well as raw coding.
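To make the sparse-activation idea concrete, here is a minimal Python sketch of generic top-k expert routing, using the reported figures of 64 experts with 8 selected per token. It is an illustration of the general technique, not JetBrains' actual router: the function names, the random logits and the softmax-over-selected-experts weighting are all assumptions made for the example.

```python
import math
import random

NUM_EXPERTS = 64   # reported expert count
TOP_K = 8          # reported number of experts fired per token


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k highest-scoring experts and softmax-normalize their weights."""
    if len(router_logits) < top_k:
        raise ValueError("need at least top_k router logits")
    # Indices of the top_k largest logits.
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Softmax over the chosen logits only (subtract the max for stability).
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def active_fraction(active_params_b=2.5, total_params_b=12.0):
    """Share of parameters used per token, from the article's headline figures."""
    return active_params_b / total_params_b


# Hypothetical router scores for one token (random stand-ins, not real data).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routing = route_token(logits)

print(len(routing), "experts selected")
print(f"active share: {active_fraction():.0%}")  # prints "active share: 21%"
```

Note that 8 of 64 experts is 12.5% of the experts, while 2.5B of 12B is about 21% of the parameters. The difference presumably comes from components that run for every token regardless of routing, such as attention layers and embeddings, though the article does not give that breakdown.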

Six variants ship at once, giving teams a spectrum from raw weights to ready-to-use assistants: Base and Base-Pretrain, Instruct and Instruct-SFT, and Thinking and Thinking-SFT. The Thinking models are trained with reinforcement learning from verifiable rewards (RLVR) and emit an explicit reasoning trace before answering, aimed at complex debugging, agentic workflows and multi-step planning. All of the weights are available on Hugging Face, with a full technical report posted to arXiv.

JetBrains frames the launch around a thesis it calls focal models: "the future belongs to coordinated systems, not single models." In that view, a frontier model is the orchestrator, while fast, cheap, specialized models handle the high-frequency work: classifying a request, retrieving context, summarizing a file, or acting as a low-latency sub-agent. Because Mellum2 is small enough to run privately and on-premises, JetBrains is also pitching it at developers and enterprises that cannot send code to a hosted frontier API, a niche that closed coding agents like Claude Code and Codex are not built to fill.
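The division of labor in the focal-model thesis can be sketched as a simple dispatcher. This is a hypothetical illustration only: the task labels, the `Task` type, and the stand-in handlers are assumptions drawn from the examples in the paragraph above, not an API that JetBrains ships.

```python
from dataclasses import dataclass

# Hypothetical task labels, based on the high-frequency work the article lists.
FOCAL_TASKS = {"classify", "retrieve", "summarize", "sub_agent"}


@dataclass
class Task:
    kind: str
    payload: str


def pick_model(task: Task) -> str:
    """Return which tier should handle the task: 'focal' or 'frontier'."""
    return "focal" if task.kind in FOCAL_TASKS else "frontier"


def dispatch(tasks, handlers):
    """Send each task to the handler registered for its tier."""
    return [handlers[pick_model(t)](t) for t in tasks]


# Stand-in handlers; a real system would call model endpoints here.
handlers = {
    "focal": lambda t: f"focal:{t.kind}",
    "frontier": lambda t: f"frontier:{t.kind}",
}

results = dispatch(
    [Task("summarize", "main.py"), Task("plan", "refactor auth module")],
    handlers,
)
print(results)  # ['focal:summarize', 'frontier:plan']
```

In an on-premises setup of the kind the article describes, the "focal" handler would point at a locally hosted model, so the high-frequency tasks never leave the network.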

