Google Releases Gemini 3.5 Live Translate, a Streaming Speech-to-Speech Audio Model Covering 70+ Languages Across Meet, Translate, and the Live API

June 10, 2026

Google just announced Gemini 3.5 Live Translate. It’s their latest audio model for live speech-to-speech translation. Speech-to-speech means spoken audio goes in, and translated spoken audio comes out. The model detects over 70 languages automatically and generates translated speech. It preserves the speaker’s intonation, pacing, and pitch in the output. Turn-by-turn systems wait for a speaker to finish before responding. Gemini 3.5 Live Translate generates speech continuously instead. It balances a trade-off between waiting for context and translating immediately. More context improves quality. Faster output keeps the translation in sync with the speaker. The result stays a few seconds behind the speaker throughout a session.

Gemini 3.5 Live Translate

Gemini 3.5 Live Translate is a single audio model (gemini-3.5-live-translate-preview), not a chat assistant. It processes speech as the audio streams in, rather than after a full sentence. It handles multilingual input without any manual configuration. Its noise robustness lets applications run in loud, unpredictable environments.

The model is rolling out across three surfaces. Developers get it in public preview through the Gemini Live API and Google AI Studio. Enterprises get a private preview in Google Meet starting this month. Everyone else gets it through the Google Translate app on Android and iOS.

How the Continuous Streaming Works

The design difference matters for building real-time features. A conversational Live agent uses turn-based interactions. It relies on pauses, intent detection, and interruption handling. Live Translation uses continuous stream processing instead. It translates as the speaker talks, without waiting for turns to finish.

To hold strict real-time latency thresholds, the translation path accepts audio input only. Text input is not supported in translation mode. The model also drops tool use and system instructions in this mode. That keeps it a focused translator pipeline rather than a general agent.
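The contrast between turn-based and continuous processing can be illustrated with a toy sketch. This is not how the model works internally: upper-casing stands in for translation, each string stands in for an audio chunk, and the function names and the `lag` parameter are hypothetical. Each output is paired with the number of chunks that had been heard when it was emitted.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def turn_based(chunks: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Buffer the whole turn, then emit a single response at the end."""
    buffered = list(chunks)
    yield len(buffered), " ".join(buffered).upper()


def continuous(chunks: Iterable[str], lag: int = 2) -> Iterator[Tuple[int, str]]:
    """Emit output for a chunk once `lag` later chunks have arrived.

    A larger `lag` means more context before output; a smaller one
    keeps the output closer to the speaker.
    """
    pending: deque = deque()
    heard = 0
    for chunk in chunks:
        heard += 1
        pending.append(chunk)
        if len(pending) > lag:
            yield heard, pending.popleft().upper()
    # The speaker has stopped: flush whatever is still waiting.
    while pending:
        yield heard, pending.popleft().upper()


if __name__ == "__main__":
    speech = ["good", "morning", "everyone", "welcome"]
    print(list(turn_based(speech)))
    print(list(continuous(speech, lag=2)))
```

In the turn-based version nothing is emitted until all four chunks are in. In the continuous version the first output appears after the third chunk and then trails the input by a fixed lag, which mirrors the "a few seconds behind" behavior described above.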

Building With the Live API

Developers configure translation inside the Live API session setup. You set a translationConfig block within the generationConfig. The targetLanguageCode field takes a BCP-47 code, such as "pl" or "es". BCP-47 is the standard format for language tags like en or pt-BR. The field defaults to "en". The echoTargetLanguage boolean controls input that’s already in the target language. When true, the model echoes that speech. When false, it stays silent. You can also enable inputAudioTranscription and outputAudioTranscription for text transcripts.
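A minimal sketch of such a setup payload, built as a plain dictionary, might look like the following. The field names translationConfig, targetLanguageCode, echoTargetLanguage, inputAudioTranscription, and outputAudioTranscription and the model id come from the description above. The outer "setup" envelope, the "model" key, the placement of the transcription fields, and the `build_setup_message` helper are assumptions for illustration, so check the Live API reference before relying on them.

```python
import json
from typing import Any, Dict

# Model id as given in the article.
MODEL = "gemini-3.5-live-translate-preview"


def build_setup_message(
    target_language_code: str = "en",    # BCP-47 tag; "en" is the stated default
    echo_target_language: bool = False,  # default here is this sketch's choice
    transcripts: bool = False,
) -> Dict[str, Any]:
    """Build a session setup payload (envelope shape is an assumption)."""
    setup: Dict[str, Any] = {
        "model": MODEL,
        "generationConfig": {
            "translationConfig": {
                "targetLanguageCode": target_language_code,
                "echoTargetLanguage": echo_target_language,
            }
        },
    }
    if transcripts:
        # Assumed placement: empty objects alongside generationConfig.
        setup["inputAudioTranscription"] = {}
        setup["outputAudioTranscription"] = {}
    return {"setup": setup}


if __name__ == "__main__":
    message = build_setup_message("pl", echo_target_language=True, transcripts=True)
    print(json.dumps(message, indent=2))
```

Note that the payload carries no tools or system instructions, in line with the translation-mode restrictions described earlier.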

Audio formats are fixed. Input is raw 16-bit PCM at 16kHz, mono, little-endian. Output is raw 16-bit PCM at 24kHz, mono, little-endian. PCM is uncompressed raw audio. You send audio in chunks of 100ms. For client-side apps, ephemeral tokens on the v1alpha endpoint avoid exposing your API key.
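The chunk sizes implied by these formats follow from simple arithmetic: sample rate × chunk duration × 2 bytes per 16-bit sample × 1 channel. The sketch below works that out and splits a PCM buffer into 100ms chunks. The helper names are hypothetical, and how each chunk is wrapped for transport is not shown.

```python
import struct
from typing import Iterator

INPUT_RATE_HZ = 16_000   # input: 16-bit PCM, mono, little-endian
OUTPUT_RATE_HZ = 24_000  # output: 16-bit PCM, mono, little-endian
BYTES_PER_SAMPLE = 2     # 16 bits
CHANNELS = 1             # mono
CHUNK_MS = 100


def bytes_per_chunk(rate_hz: int, chunk_ms: int = CHUNK_MS) -> int:
    """Size in bytes of one chunk of raw PCM at the given sample rate."""
    samples = rate_hz * chunk_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def iter_chunks(pcm: bytes, rate_hz: int = INPUT_RATE_HZ) -> Iterator[bytes]:
    """Split a raw PCM buffer into 100ms chunks; the last one may be shorter."""
    size = bytes_per_chunk(rate_hz)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


def silence(duration_ms: int, rate_hz: int = INPUT_RATE_HZ) -> bytes:
    """Generate little-endian 16-bit silence, useful as test input."""
    samples = rate_hz * duration_ms // 1000
    return struct.pack("<%dh" % samples, *([0] * samples))


if __name__ == "__main__":
    print(bytes_per_chunk(INPUT_RATE_HZ))   # 3200 bytes per 100ms of input
    print(bytes_per_chunk(OUTPUT_RATE_HZ))  # 4800 bytes per 100ms of output
    print([len(c) for c in iter_chunks(silence(250))])  # [3200, 3200, 1600]
```

The `"<"` prefix in the `struct` format string makes the little-endian byte order explicit rather than relying on the host machine's native order.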

Dimension	Live Agent	Live Translation
Model role	Assistant that listens, reasons, and acts	Interpreter / real-time translator pipeline
Interaction	Turn-based, with interruption handling	Continuous stream processing, no turns
Tools	Function calling, Google Search, instructions	Translation only, no tools or instructions
Inputs	Text, audio, video, and image	Audio only, for strict latency
Configuration	Generation, speech, tools, instructions	`targetLanguageCode` and `echoTargetLanguage`

Use Case

The model targets live interpretation across multiple settings. Google lists multilingual calls, conferences, classes, and broadcasts. Developer platforms reduce the integration work for real-time media. Agora, Fishjam, LiveKit, Pipecat, and Vision Agents already use the Live API. These platforms handle the complex real-time media streaming infrastructure. That lets developers focus on the user experience instead.

Google’s example app demonstrates dubbing and simultaneous multi-language translation. Grab is testing the model for driver-and-traveler communication at pickups. Grab users make over 10 million voice calls per month. CJ ENM, LiveKit, and others reported positive feedback on quality, accuracy, and low latency.

How It Changes Google Meet and Translate

According to Google’s official release, Google Meet will soon use 3.5 Live Translate for speech translation. The table shows the stated before-and-after for Meet.

Capability	Previous Meet	With 3.5 Live Translate
Languages	5	70+
Combinations per meeting	Only to and from English	2000+ combinations
Access	Existing interface	Updated interface for immediate access
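As a rough sanity check on the "2000+ combinations" figure, the sketch below counts language pairs for 70 languages, the lower bound of "70+". The article does not say how Google counts combinations, so both the unordered and the directed counts are shown. The English-hub count for the previous setup assumes English is one of the five languages, which is not stated.

```python
from math import comb, perm


def unordered_pairs(languages: int) -> int:
    """Pairs where A<->B counts once."""
    return comb(languages, 2)


def directed_pairs(languages: int) -> int:
    """Pairs where A->B and B->A count separately."""
    return perm(languages, 2)


def hub_pairs(languages: int) -> int:
    """Pairs when every translation goes to or from one hub language."""
    return languages - 1


if __name__ == "__main__":
    print(unordered_pairs(70))  # 2415
    print(directed_pairs(70))   # 4830
    print(hub_pairs(5))         # 4, assuming English is one of the five
```

Either way of counting 70 languages lands above 2000, so the stated figure is consistent with any-to-any translation rather than routing everything through English.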

The Meet update is in private preview for select enterprise Workspace customers this month. A broader rollout follows later this year. In the Translate app, the Live translate feature works with any connected headphones. It mirrors the speaker’s tone across 70+ languages. Android also gains a listening mode. You hold the phone to your ear like a regular call. The translated audio then streams through the earpiece, without others hearing.

Key Takeaways

Gemini 3.5 Live Translate is Google’s latest audio model for live speech-to-speech translation across 70+ languages.
It streams continuously instead of turn-by-turn, staying a few seconds behind the speaker.
Developers can configure it via the Live API using targetLanguageCode and echoTargetLanguage; audio-only, 16kHz in, 24kHz out.
It rolls out to the Gemini Live API, Google Meet (5→70+ languages), and the Translate app.
All generated audio carries an imperceptible SynthID watermark for detectability.
