Google Ships Gemini 3.1 Flash-Lite to GA at $0.25 per Million Input Tokens
Models·2 min read·Google Cloud Blog

Gemini 3.1 Flash-Lite is now generally available on Vertex AI and Gemini Enterprise, pitched as the cheapest, fastest member of the Gemini 3 family for high-volume agentic, classification, and tool-use workloads.

Google on Friday made Gemini 3.1 Flash-Lite generally available on Vertex AI and Gemini Enterprise, ending a roughly two-month preview and giving developers a stable, low-cost option in the Gemini 3 family aimed squarely at high-volume agentic traffic. The model is priced at $0.25 per million input tokens and $1.50 per million output tokens, undercutting most frontier-class offerings by an order of magnitude on the input side.

The pitch is latency and unit economics rather than raw intelligence. Google says Flash-Lite delivers time to first answer token 2.5x faster than Gemini 2.5 Flash and a 45% bump in output speed, with native support for tool calling, multimodal text-plus-image inputs, and the same agentic orchestration primitives as the larger Gemini 3 models. Gladly, a launch customer running the model across millions of weekly customer-service interactions, reports a p95 of about 1.8 seconds for full replies, sub-second p95 latency on classification and tool-call paths, and a 99.6% success rate under heavy concurrent load, at roughly 60% lower cost than comparable thinking-tier models.

Google leaned heavily on launch-customer testimony to make the case for production readiness. JetBrains is using Flash-Lite for its IDE AI assistant and its Junie agent, citing the "balance of high intelligence and minimal latency" needed for real-time code completion. Ramp called it "especially valuable" for high-volume, latency-sensitive features without quality regressions, while AlphaSense said the price-performance curve lets its market-intelligence pipelines scale. Astrocade, krea.ai, and OffDeal highlighted multimodal safety checks, prompt enhancement, and live financial-research agents, respectively.

Strategically, Flash-Lite is the layer Google is using to fight on cost. Anthropic's Claude Mythos and OpenAI's GPT-5.5 sit at the top of the price stack and command a premium for reasoning work, but the bulk of agentic traffic — classifiers, tool routers, lightweight summarizers, support deflection — is volume-bound and price-sensitive. By dropping a model that hits sub-second classifier latencies for a quarter of a dollar per million input tokens, Google is making the argument that the economics of running agents at scale break in its favor before the quality conversation even starts.
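To make that economics argument concrete, here is a back-of-the-envelope sketch using the list prices above ($0.25 per million input tokens, $1.50 per million output tokens). The request volume and per-request token counts are hypothetical illustration values, not figures from Google or its customers:

```python
# Back-of-the-envelope cost model for high-volume classifier traffic.
# Prices are Flash-Lite's published GA rates; the traffic profile below
# is a hypothetical example, chosen only to illustrate the arithmetic.

INPUT_PRICE_PER_M = 0.25    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 1.50   # USD per 1M output tokens

def monthly_cost(requests: int, in_tokens: int, out_tokens: int) -> float:
    """Total USD cost for `requests` calls with the given token counts each."""
    input_cost = requests * in_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = requests * out_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost

# Example: 10M classification requests a month, ~400 input tokens and
# ~10 output tokens each (short-label classifier shape, assumed sizes).
print(f"${monthly_cost(10_000_000, 400, 10):,.2f}")  # → $1,150.00
```

At these assumed volumes, input tokens dominate the bill ($1,000 of the $1,150), which is why the input-side price is the number Google leads with for classifier and router traffic.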

Flash-Lite joins Gemini 3 Pro, Gemini 3 Flash, and the recently expanded Gemma 4 open-model line as part of a stack Google is increasingly framing as a portfolio rather than a single flagship. The GA milestone also lifts the SLA constraints that kept some enterprises in trial mode through the preview window, and it lands the same week Google brought Gemini Enterprise customers a clutch of agentic workflow upgrades — a coordinated push to get Flash-Lite under more production traffic before competitors respond with their own price cuts.
