Chat with a voice assistant built with LiveKit and the Gemini Live API
Overview
Google's Gemini Live API enables low-latency, two-way interactions that use text, audio, and video input, with audio and text output. LiveKit's Google plugin includes a RealtimeModel class that allows you to use this API to create agents with natural, human-like voice conversations.
Installation
Install the Google plugin:
uv add "livekit-agents[google]~=1.4"
pnpm add "@livekit/agents-plugin-google@1.x"
Authentication
The Google plugin requires authentication based on your chosen service:
- For Vertex AI, you must set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the service account key file. For more information about mounting files as secrets when deploying to LiveKit Cloud, see File-mounted secrets.
- For the Google Gemini API, set the GOOGLE_API_KEY environment variable.
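As a rough illustration of the two options above, the following sketch checks which credential is present before an agent starts. The helper name resolve_google_auth is hypothetical and not part of the LiveKit Google plugin, and preferring Vertex AI when both variables are set is an assumption made only for this example.

```python
import os
from typing import Mapping, Optional


def resolve_google_auth(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "vertexai" or "gemini_api" based on which credential is set.

    Hypothetical helper for illustration; not part of the LiveKit plugin.
    """
    env = os.environ if env is None else env
    if env.get("GOOGLE_APPLICATION_CREDENTIALS"):
        # Vertex AI: the variable holds the path to a service account key file.
        # Assumption: Vertex AI wins when both variables are set.
        return "vertexai"
    if env.get("GOOGLE_API_KEY"):
        # Google Gemini API: the variable holds the API key itself.
        return "gemini_api"
    raise RuntimeError(
        "Set GOOGLE_APPLICATION_CREDENTIALS (Vertex AI) or GOOGLE_API_KEY (Gemini API)"
    )
```

The returned value could then decide whether to pass vertexai=True when constructing the model.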
Usage
Use the Gemini Live API within an AgentSession. For example, you can use it in the Voice AI quickstart.
from livekit.agents import AgentSession
from livekit.plugins import google

session = AgentSession(
    llm=google.realtime.RealtimeModel(
        voice="Puck",
        temperature=0.8,
        instructions="You are a helpful assistant",
    ),
)
import { voice } from '@livekit/agents';
import * as google from '@livekit/agents-plugin-google';

const session = new voice.AgentSession({
  llm: new google.beta.realtime.RealtimeModel({
    model: "gemini-2.5-flash-native-audio-preview-12-2025",
    voice: "Puck",
    temperature: 0.8,
    instructions: "You are a helpful assistant",
  }),
});
Parameters
This section describes some of the available parameters. For a complete reference of all available parameters, see the plugin reference links in the Additional resources section.
- instructions (string): System instructions to better control the model's output and specify tone and sentiment of responses. To learn more, see System instructions.
- model (LiveAPIModels | string, default: gemini-2.5-flash): Live API model to use.
- api_key (string, env: GOOGLE_API_KEY): Google Gemini API key.
- voice (Voice | string, default: Puck): Name of the Gemini Live API voice. For a full list, see Voices.
- modalities (list[Modality], default: ["AUDIO"]): List of response modalities to use. Set to ["TEXT"] to use the model in text-only mode with a separate TTS plugin.
- vertexai (boolean, default: false): If set to true, use Vertex AI.
- project (string, env: GOOGLE_CLOUD_PROJECT): Google Cloud project ID to use for the API (if vertexai=True). By default, it uses the project in the service account key file (set using the GOOGLE_APPLICATION_CREDENTIALS environment variable).
- location (string, env: GOOGLE_CLOUD_LOCATION): Google Cloud location to use for the API (if vertexai=True). By default, it uses the location from the service account key file or us-central1.
- thinking_config (ThinkingConfig): Configuration for the model's thinking mode, if supported. For more information, see Thinking.
- enable_affective_dialog (boolean, default: false): Enable affective dialog on supported native audio models. Not supported on Gemini 3.1 models. For more information, see Affective dialog.
- proactivity (boolean, default: false): Enable proactive audio, where the model can decide not to respond to certain inputs. Requires a native audio model. Not supported on Gemini 3.1 models. For more information, see Proactive audio.
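To make the Vertex AI fallbacks concrete, here is a simplified sketch of how explicit arguments, environment variables, and the us-central1 default might be combined. The function resolve_vertex_settings is hypothetical and is not the plugin's actual resolution logic. In particular, it does not read the service account key file, so a project of None here stands for "fall back to the key file".

```python
from typing import Any, Dict, Mapping, Optional


def resolve_vertex_settings(
    env: Mapping[str, str],
    project: Optional[str] = None,
    location: Optional[str] = None,
) -> Dict[str, Any]:
    """Combine explicit values, environment variables, and defaults.

    Hypothetical illustration only; the real plugin may resolve these differently.
    """
    return {
        "vertexai": True,
        # Explicit argument first, then GOOGLE_CLOUD_PROJECT, else None
        # (None means: defer to the project in the service account key file).
        "project": project or env.get("GOOGLE_CLOUD_PROJECT"),
        # Explicit argument first, then GOOGLE_CLOUD_LOCATION, else us-central1.
        "location": location or env.get("GOOGLE_CLOUD_LOCATION") or "us-central1",
    }
```

Passing os.environ as env gives the usual behavior, while passing a plain dict keeps the helper easy to test.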
Gemini 3.1 compatibility
gemini-3.1-flash-live-preview has known compatibility limitations with LiveKit Agents. A long-term fix is being investigated, but for now some features don't work with this model. This section documents the current state.
Gemini 3.1 Flash Live Preview restricts send_client_content to initial history seeding only. After the first model turn, the model rejects send_client_content with a 1007 error. Because of this restriction, generate_reply(), update_instructions(), and update_chat_ctx() are not compatible with 3.1 models. The plugin logs a warning and ignores the call. The session still stores the updated values internally, but the changes aren't sent to the model mid-session.
Basic voice conversations, tool calling, and audio I/O work normally with 3.1.
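Application code that targets both model generations may want to check compatibility before calling one of the affected methods, so that it doesn't rely on a call that is silently ignored. The sketch below is one way to do that. The helper is_call_supported is hypothetical, and detecting 3.1 models by the "gemini-3.1" name prefix is an assumption.

```python
# Methods that the section above lists as incompatible with 3.1 models.
UNSUPPORTED_ON_31 = ("generate_reply", "update_instructions", "update_chat_ctx")


def is_call_supported(model: str, method: str) -> bool:
    """Return False for session methods that 3.1 models ignore.

    Hypothetical helper; the "gemini-3.1" prefix check is an assumption.
    """
    if model.startswith("gemini-3.1") and method in UNSUPPORTED_ON_31:
        return False
    return True
```

For example, an agent could skip a mid-session update_instructions() call when is_call_supported() returns False and apply the new instructions when the next session starts instead.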
Other breaking changes
These changes also apply when migrating from Gemini 2.5 to 3.1:
- Affective dialog and proactive audio are not supported. Remove these options from your configuration.
- Asynchronous function calling is not supported. The model pauses and waits for your tool response before continuing.
- The thinkingConfig parameter uses thinkingLevel (options: "minimal", "low", "medium", "high") instead of thinkingBudget. The default is "minimal" for lowest latency.
Migrating from 2.5 to 3.1
To use the 3.1 model, update the model parameter:
session = AgentSession(
    llm=google.realtime.RealtimeModel(
        model="gemini-3.1-flash-live-preview",
        voice="Puck",
        instructions="You are a helpful assistant",
    ),
)
const session = new voice.AgentSession({
  llm: new google.beta.realtime.RealtimeModel({
    model: "gemini-3.1-flash-live-preview",
    voice: "Puck",
    instructions: "You are a helpful assistant",
  }),
});
Additional changes when migrating:
- Remove the enable_affective_dialog and proactivity parameters if set.
- Replace thinkingBudget with thinkingLevel in your thinking_config.
- generate_reply(), update_instructions(), and update_chat_ctx() are not compatible with 3.1. Calls are ignored with a warning.
For the full list of changes, see Google's migration guide.
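The configuration changes listed above can be sketched as a small transformation over a plain options dictionary. This is an illustration only: migrate_options_to_31 is a hypothetical helper, the dictionary stands in for the keyword arguments you would pass to RealtimeModel, and the thinking settings are modeled as a plain dict with thinkingBudget and thinkingLevel keys rather than a real ThinkingConfig object.

```python
from typing import Any, Dict

# Thinking levels named in the 3.1 notes above.
THINKING_LEVELS = ("minimal", "low", "medium", "high")


def migrate_options_to_31(
    options: Dict[str, Any], thinking_level: str = "minimal"
) -> Dict[str, Any]:
    """Return a copy of 2.5-style options adjusted for the 3.1 model.

    Hypothetical helper; `options` is a plain dict used for illustration.
    """
    if thinking_level not in THINKING_LEVELS:
        raise ValueError(f"thinking_level must be one of {THINKING_LEVELS}")

    migrated = dict(options)  # shallow copy so the caller's dict is untouched
    migrated["model"] = "gemini-3.1-flash-live-preview"

    # Affective dialog and proactive audio are not supported on 3.1.
    migrated.pop("enable_affective_dialog", None)
    migrated.pop("proactivity", None)

    # Replace thinkingBudget with thinkingLevel, keeping other thinking options.
    thinking = migrated.get("thinking_config")
    if isinstance(thinking, dict) and "thinkingBudget" in thinking:
        thinking = {k: v for k, v in thinking.items() if k != "thinkingBudget"}
        thinking["thinkingLevel"] = thinking_level
        migrated["thinking_config"] = thinking

    return migrated
```

A budget value has no obvious equivalent level, so the sketch asks the caller to choose one and defaults to "minimal", the default noted above.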
Provider tools
See Gemini LLM provider tools for more information about tools that enable the model to use built-in capabilities executed on the model server.
Turn detection
The Gemini Live API includes built-in VAD-based turn detection, enabled by default. To use LiveKit's turn detection model instead, configure the model to disable automatic activity detection. A separate streaming STT model is required in order to use LiveKit's turn detection model.
from google.genai import types
from livekit.agents import AgentSession, TurnHandlingOptions
from livekit.plugins import google
from livekit.plugins.turn_detector.multilingual import MultilingualModel

session = AgentSession(
    turn_handling=TurnHandlingOptions(
        turn_detection=MultilingualModel(),
    ),
    llm=google.realtime.RealtimeModel(
        realtime_input_config=types.RealtimeInputConfig(
            automatic_activity_detection=types.AutomaticActivityDetection(
                disabled=True,
            ),
        ),
        input_audio_transcription=None,
    ),
    stt="deepgram/nova-3",
)
import { voice } from '@livekit/agents';
import * as google from '@livekit/agents-plugin-google';
import * as livekit from '@livekit/agents-plugin-livekit';

const session = new voice.AgentSession({
  llm: new google.beta.realtime.RealtimeModel({
    model: "gemini-2.5-flash-native-audio-preview-12-2025",
    realtimeInputConfig: {
      automaticActivityDetection: {
        disabled: true,
      },
    },
  }),
  stt: "deepgram/nova-3",
  turnHandling: {
    turnDetection: new livekit.turnDetector.MultilingualModel(),
  },
});
Thinking
Native audio Gemini models support thinking. You can configure this behavior with the thinking_config parameter. Note that Gemini 3.1 uses thinkingLevel instead of thinkingBudget; see Gemini 3.1 compatibility for details.
By default, the model's thoughts are forwarded like other transcripts. To disable this, set include_thoughts=False:
from google.genai import types

# ...

session = AgentSession(
    llm=google.realtime.RealtimeModel(
        thinking_config=types.ThinkingConfig(
            include_thoughts=False,
        ),
    ),
)
import * as google from '@livekit/agents-plugin-google';

// ...

const session = new voice.AgentSession({
  llm: new google.beta.realtime.RealtimeModel({
    thinkingConfig: {
      includeThoughts: false,
    },
  }),
});
For other available parameters, such as thinking_budget, see the Gemini thinking docs.
Usage with separate TTS
Please refer to this open Google issue and note that this architecture only works with non-native-audio models.
You can combine Gemini Live API and a separate TTS instance to build a half-cascade architecture. This configuration allows you to gain the benefits of realtime speech comprehension while maintaining complete control over the speech output.
from google.genai.types import Modality
from livekit.agents import AgentSession
from livekit.plugins import google

session = AgentSession(
    llm=google.realtime.RealtimeModel(modalities=[Modality.TEXT]),
    tts="cartesia/sonic-3",
)
import { voice } from '@livekit/agents';
import * as google from '@livekit/agents-plugin-google';

const session = new voice.AgentSession({
  llm: new google.beta.realtime.RealtimeModel({
    model: "gemini-2.5-flash-native-audio-preview-12-2025",
    modalities: [google.types.Modality.TEXT],
  }),
  tts: "cartesia/sonic-3",
});
Additional resources
The following resources provide more information about using Gemini with LiveKit Agents.