Google Gemini 3.1 Ultra Breaks Reasoning Records with 94.3% on GPQA Diamond

Google's flagship Gemini 3.1 Ultra achieves a verified 94.3% on GPQA Diamond and 77.1% on ARC-AGI-2, setting new state-of-the-art benchmarks for complex scientific and logical reasoning.

Google released Gemini 3.1 Ultra in March 2026 as the flagship of its new three-model family, alongside Flash-Lite and Flash Live. The broader Gemini 3.1 family had been anticipated for its multimodal improvements, but it is the Ultra's reasoning scores that have drawn the most attention from the AI research community. It posts a verified 94.3% on GPQA Diamond, a benchmark designed to stump even PhD-level domain experts, and 77.1% on ARC-AGI-2, which tests whether a model can solve logical patterns it has never seen before.

The 3.1 Ultra improves on its predecessor in several key dimensions. It retains the 2-million-token context window that made Gemini 3 a landmark release, but adds significantly improved grounding capabilities to reduce hallucinations on factual queries. A new sandboxed Code Execution tool allows the model to write, run, and debug code mid-conversation without external plugins, bringing an integrated software engineering loop directly into the chat interface. Deep Think mode — available to Google AI Ultra subscribers — enables extended deliberative reasoning for problems that require sustained multi-step analysis.

The model operates natively across text, image, audio, and video without requiring transcription intermediaries, a capability Google calls Multimodal Mastery. This means the model processes raw audio and video frames directly, enabling more accurate comprehension of tone, visual context, and temporal relationships than systems that first convert media to text before reasoning. In early enterprise tests, this has proven especially valuable for medical imaging analysis, video content moderation, and complex document processing that mixes charts, diagrams, and prose.

Gemini 3.1 Pro is rolling out broadly to Gemini app users with AI Pro and Ultra subscriptions, and is available on NotebookLM. Developers and enterprises can access both Pro and Ultra in preview through the Gemini API via AI Studio, Vertex AI, and Gemini CLI. With Gemini now at 750 million monthly users, the 3.1 family represents Google's most aggressive push yet to close the gap with OpenAI in both consumer mindshare and enterprise deployments — a competition that has become one of the defining technology races of 2026.
