Google Unveils Gemini 3.5 Live Translate

Editor J

Google has launched Gemini 3.5 Live Translate, a dedicated audio model designed for real-time simultaneous interpretation across more than 70 languages.

Google announced Gemini 3.5 Live Translate on June 9. Unlike conventional systems that require a speaker to pause, the model translates speech as it is spoken, trailing the active speaker by just a few seconds.

The model supports over 70 languages and more than 2,000 language pairs. It automatically detects the active speaker's language and translates it into the listener's language without requiring manual settings.
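As a rough sanity check on those figures, the sketch below counts the pairs that 70 languages can form. It assumes a "pair" means an unordered combination of two distinct languages; how Google actually counts pairs is not stated here, and the function names are illustrative only.

```python
from math import comb


def unordered_pairs(n_languages: int) -> int:
    """Count two-language combinations, ignoring translation direction."""
    return comb(n_languages, 2)


def directed_pairs(n_languages: int) -> int:
    """Count source-to-target directions between distinct languages."""
    return n_languages * (n_languages - 1)


print(unordered_pairs(70))  # 2415
print(directed_pairs(70))   # 4830
```

Under that assumption, 70 languages give 2,415 unordered pairs, which is consistent with "more than 2,000 language pairs".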

The rollout began immediately, with the technology integrated into the Google Translate app. Google Meet begins a private preview for enterprise customers this month, and a developer API is already open in public preview. This marks the first audio-focused expansion of the Gemini 3.5 model family, which debuted at last month's I/O 2026 keynote.

From Turn-Based Translation to Simultaneous Interpretation

The model shifts conversational translation toward true simultaneous interpretation. Traditional systems rely on a turn-based approach, starting translation only after a speaker finishes. In contrast, Gemini 3.5 Live Translate processes a continuous audio stream to deliver real-time speech translation while the speaker is talking.
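The difference between the two approaches can be illustrated with a toy timing model. This is only a sketch under simplifying assumptions: speech is split into segments of known length, translated audio takes as long as the original, and the simultaneous system trails by a fixed lag. None of the names or numbers below come from Google.

```python
def turn_based_start_times(segment_durations):
    """Playback of the translation begins only after the speaker finishes.

    Translated segments are then played back to back.
    """
    t = sum(segment_durations)  # wait for the whole utterance
    starts = []
    for duration in segment_durations:
        starts.append(t)
        t += duration
    return starts


def simultaneous_start_times(segment_durations, lag_seconds):
    """Each translated segment starts a fixed lag after the original began."""
    t = 0.0
    starts = []
    for duration in segment_durations:
        starts.append(t + lag_seconds)
        t += duration
    return starts


segments = [4.0, 5.0, 3.0]  # hypothetical segment lengths in seconds
print(turn_based_start_times(segments))         # [12.0, 16.0, 21.0]
print(simultaneous_start_times(segments, 3.0))  # [3.0, 7.0, 12.0]
```

In this toy case the listener hears the first translated segment after 3 seconds instead of 12, and the gap does not grow as the speaker keeps talking.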

As Ars Technica noted, this approach closely mirrors human simultaneous interpretation. By translating continuously rather than waiting for sentence completion, the system maintains natural conversational flow.

Voice synthesis has also advanced. Rather than producing robotic output, the model preserves the original speaker's intonation, pacing, and pitch. Google says the system also handles noisy environments, overlapping speech, and informal phrasing.

Google Translate App, Meet, and API: A Three-Track Rollout

A still from Google's speech translation demo in Google Meet

The technology is being deployed across three channels. First, the Google Translate app for Android and iOS has added a 'Live Translate' mode that delivers translations in real time through any pair of headphones. The Android version also adds a hands-free listening mode, activated by holding the device to the ear.

Second, the feature is entering private preview on Google Meet for select Google Workspace enterprise customers. This expands simultaneous interpretation in Meet from five languages, each translated only to and from English, to more than 70 languages. A full release is planned for later this year.

Finally, developers can integrate real-time speech translation into their own apps. The model is available in public preview through the Gemini Live API and AI Studio under the model ID 'gemini-3.5-live-translate-preview', and early partners including Grab and CJ ENM are already testing it.

Technical Limitations and Synthetic Voice Safeguards

Despite these capabilities, early evaluations of the simultaneous interpretation model highlight ongoing challenges. Google DeepMind's model card acknowledges limitations, noting that strong accents can cause transcription errors and that the generated voice can occasionally shift mid-sentence.

To address potential misuse, Google is embedding SynthID watermarks in all generated audio so that it can be identified as synthetic. The safeguard matters because the model can replicate a speaker's actual voice.

Early feedback focuses on the model's ability to preserve the speaker's unique vocal characteristics. Many early users describe the experience of hearing a foreign language spoken in a friend's familiar voice as surreal. Whether simultaneous interpretation becomes routine in professional environments will ultimately depend on how quickly Google resolves performance issues with diverse accents.
