Google's Gemini 3.5 Live Translate Delivers Real-Time Speech-to-Speech Across 70+ Languages
Google's new Gemini 3.5 Live Translate streams near real-time speech-to-speech translation in 70+ languages while preserving the speaker's tone, pace, and pitch — rolling out across the Translate app, Google Meet, AI Studio, and the Gemini Live API.
Google has released Gemini 3.5 Live Translate, an audio model that delivers near real-time, speech-to-speech translation across more than 70 languages — and, unusually, tries to keep you sounding like you. Rather than flattening everything into a robotic monotone, the model generates translated speech that preserves the speaker's intonation, pacing, and pitch, and it is already rolling out across Google Translate, Google Meet, AI Studio, and the Gemini Live API.
The hard part of live translation is the trade-off between speed and accuracy, and Google's pitch is that Gemini 3.5 Live Translate manages it gracefully. The model processes speech as it streams, continuously balancing how long to wait for context against the need to translate immediately, and stays just a few seconds behind the speaker throughout a session. It automatically detects which of the 70+ languages is being spoken — no manual setup — and is built to stay robust in loud, unpredictable real-world environments.
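To make that wait-versus-translate trade-off concrete, here is a minimal sketch of a fixed "wait-k" schedule, a well-known policy from the simultaneous-translation literature. Google has not said that Gemini 3.5 Live Translate works this way, and the article describes an adaptive balance rather than a fixed one; the function name `wait_k_schedule` and the idea of counting speech in equal "chunks" are illustrative assumptions.

```python
from typing import List


def wait_k_schedule(num_source: int, num_target: int, k: int) -> List[str]:
    """Return the READ/WRITE order for a fixed wait-k policy.

    Hypothetical illustration only: the translator first reads k source
    chunks, then alternates between writing one target chunk and reading
    one more source chunk. Once the source is exhausted it only writes.
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    actions: List[str] = []
    read = 0      # source chunks consumed so far
    written = 0   # target chunks emitted so far

    while written < num_target:
        # Source chunks that must be heard before emitting the next target chunk.
        needed = min(k + written, num_source)
        if read < needed:
            actions.append("READ")
            read += 1
        else:
            actions.append("WRITE")
            written += 1
    return actions


if __name__ == "__main__":
    # Small k: low delay, little context. Larger k: more context, more delay.
    for k in (1, 2, 3):
        print(k, wait_k_schedule(num_source=4, num_target=4, k=k))
```

A small `k` keeps the translation close behind the speaker but commits to output with little context; a larger `k` gives the model more of the sentence to work with at the cost of extra delay. An adaptive system would in effect vary that gap from moment to moment.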
The most consumer-facing change lands in the Google Translate app on Android and iOS worldwide. On Android, a new "listening mode" streams the translated audio straight to your phone's earpiece, so you can hold the handset to your ear like an ordinary call and simply listen. The bigger enterprise play is in Google Meet, where live translation jumps from 5 languages to more than 70 — over 2,000 language-pair combinations — entering private preview for select Workspace customers this month before a broader rollout later this year.
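The "over 2,000" figure is easy to sanity-check. The sketch below counts language pairs for a given number of languages, assuming exactly 70 as a lower bound (the article says "more than 70") and assuming a pair is counted once regardless of direction; how Google actually tallies combinations is not stated.

```python
from math import comb


def unordered_pairs(num_languages: int) -> int:
    """Pairs where A<->B counts once: n choose 2."""
    return comb(num_languages, 2)


def directed_pairs(num_languages: int) -> int:
    """Pairs where A->B and B->A count separately: n * (n - 1)."""
    return num_languages * (num_languages - 1)


if __name__ == "__main__":
    for n in (5, 70):
        print(n, unordered_pairs(n), directed_pairs(n))
```

With 70 languages this gives 2,415 unordered pairs, consistent with "over 2,000", versus 10 for the previous 5 languages. Counting each direction separately would double those numbers.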
For developers, the same capability is available in public preview through Google AI Studio and the Gemini Live API, with integrations already wired up for voice platforms including Agora, Fishjam, LiveKit, Pipecat, and Vision Agents — the plumbing most real-time voice apps are built on. Every audio output is watermarked with Google's SynthID, an imperceptible marker meant to keep AI-generated speech detectable as the technology gets harder to distinguish from a human voice.
The launch sharpens an increasingly crowded contest over the long-promised "universal translator." It builds on the Gemini 3.5 audio work Google showed at I/O 2026 and squares off directly against OpenAI's push into voice with GPT-Realtime-2 and Translate. The difference this time is distribution: Google can drop the feature into the Translate app, Meet, and Android's earpiece all at once, reaching hundreds of millions of users without anyone installing anything new.