Chat with a voice assistant built with LiveKit and the Gemini Live API
Overview
Google's Gemini Live API enables low-latency, two-way interactions that use text, audio, and video input, with audio and text output. LiveKit's Google plugin includes a RealtimeModel class that allows you to use this API to create agents with natural, human-like voice conversations.
Quick reference
This section includes a basic usage example and some reference material. For links to more detailed documentation, see Additional resources.
Installation
Install the Google plugin:
uv add "livekit-agents[google]~=1.3"
pnpm add "@livekit/agents-plugin-google@1.x"
Authentication
The Google plugin requires authentication based on your chosen service:
- For Vertex AI, you must set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the service account key file. For more information about mounting files as secrets when deploying to LiveKit Cloud, see File-mounted secrets.
- For the Google Gemini API, set the GOOGLE_API_KEY environment variable.
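As a minimal sketch for local experimentation (not an official setup recipe), you can provide these values through the process environment before constructing the model. The key and file path below are placeholders:

```python
import os

# Placeholders for illustration only; in production, prefer a .env file or
# your deployment platform's secret management.
os.environ["GOOGLE_API_KEY"] = "your-gemini-api-key"                  # Google Gemini API
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/key.json"    # Vertex AI
```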
Usage
Use the Gemini Live API within an AgentSession. For example, you can use it in the Voice AI quickstart.
Python:

```python
from livekit.plugins import google

session = AgentSession(
    llm=google.realtime.RealtimeModel(
        voice="Puck",
        temperature=0.8,
        instructions="You are a helpful assistant",
    ),
)
```
Node.js:

```typescript
import * as google from '@livekit/agents-plugin-google';

const session = new voice.AgentSession({
  llm: new google.realtime.RealtimeModel({
    model: "gemini-2.5-flash-native-audio-preview-12-2025",
    voice: "Puck",
    temperature: 0.8,
    instructions: "You are a helpful assistant",
  }),
});
```
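For reference, here is a minimal sketch of wiring a session like this into an agent entrypoint, following the general shape of the Voice AI quickstart. The agent instructions and worker setup are illustrative, not prescriptive:

```python
from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import google


async def entrypoint(ctx: agents.JobContext):
    # Join the room this job was dispatched to.
    await ctx.connect()

    session = AgentSession(
        llm=google.realtime.RealtimeModel(voice="Puck", temperature=0.8),
    )

    # The Agent's instructions serve the same purpose as the `instructions`
    # argument shown in the example above.
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a helpful assistant"),
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```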
Parameters
This section describes some of the available parameters. For a complete reference of all available parameters, see the plugin reference links in the Additional resources section.
- System instructions to better control the model's output and specify the tone and sentiment of responses. To learn more, see System instructions.
- Live API model to use.
- Google Gemini API key.
- Name of the Gemini Live API voice. For a full list, see Voices.
- List of response modalities to use. Set to ["TEXT"] to use the model in text-only mode with a separate TTS plugin.
- If set to true, use Vertex AI.
- Google Cloud project ID to use for the API (if vertexai=True). By default, it uses the project in the service account key file (set using the GOOGLE_APPLICATION_CREDENTIALS environment variable).
- Google Cloud location to use for the API (if vertexai=True). By default, it uses the location from the service account key file, or us-central1.
- Configuration for the model's thinking mode, if supported. For more information, see Thinking.
- Enable affective dialog on supported native audio models. For more information, see Affective dialog.
- Enable proactive audio, where the model can decide not to respond to certain inputs. Requires a native audio model. For more information, see Proactive audio.
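For example, a hedged sketch of a Vertex AI configuration. The project and location argument names are inferred from the descriptions above, so confirm them against the plugin reference; the project ID is a placeholder:

```python
from livekit.plugins import google

# Sketch only: `project` and `location` are assumed argument names based on
# the parameter descriptions above.
llm = google.realtime.RealtimeModel(
    vertexai=True,
    project="my-gcp-project",   # placeholder Google Cloud project ID
    location="us-central1",
    voice="Puck",
    temperature=0.8,
)
```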
Provider tools
See Gemini LLM provider tools for more information about tools that enable the model to use built-in capabilities executed on the model server.
Turn detection
The Gemini Live API includes built-in VAD-based turn detection, enabled by default. To use LiveKit's turn detection model instead, configure the model to disable automatic activity detection. A separate streaming STT model is required in order to use LiveKit's turn detection model.
Python:

```python
from google.genai import types
from livekit.agents import AgentSession
from livekit.plugins import google
from livekit.plugins.turn_detector.multilingual import MultilingualModel

session = AgentSession(
    turn_detection=MultilingualModel(),
    llm=google.realtime.RealtimeModel(
        realtime_input_config=types.RealtimeInputConfig(
            automatic_activity_detection=types.AutomaticActivityDetection(
                disabled=True,
            ),
        ),
        input_audio_transcription=None,
    ),
    stt="assemblyai/universal-streaming",
)
```
Node.js:

```typescript
import * as google from '@livekit/agents-plugin-google';
import * as livekit from '@livekit/agents-plugin-livekit';

const session = new voice.AgentSession({
  llm: new google.realtime.RealtimeModel({
    model: "gemini-2.5-flash-native-audio-preview-12-2025",
    realtimeInputConfig: {
      automaticActivityDetection: {
        disabled: true,
      },
    },
  }),
  stt: "assemblyai/universal-streaming",
  turnDetection: new livekit.turnDetector.MultilingualModel(),
});
```
Thinking
The latest model, gemini-2.5-flash-native-audio-preview-09-2025, supports thinking. You can configure its behavior with the thinking_config parameter.
By default, the model's thoughts are forwarded like other transcripts. To disable this, set include_thoughts=False:
```python
from google.genai import types

# ...

session = AgentSession(
    llm=google.realtime.RealtimeModel(
        thinking_config=types.ThinkingConfig(
            include_thoughts=False,
        ),
    ),
)
```
For other available parameters, such as thinking_budget, see the Gemini thinking docs.
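For example, a sketch that sets a thinking budget alongside include_thoughts. ThinkingConfig and its thinking_budget field come from the google.genai types package, and the budget value here is illustrative; supported ranges depend on the model:

```python
from google.genai import types
from livekit.plugins import google

llm = google.realtime.RealtimeModel(
    thinking_config=types.ThinkingConfig(
        include_thoughts=False,
        thinking_budget=1024,  # example value only
    ),
)
```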
Usage with separate TTS
You can combine the Gemini Live API with a separate TTS instance to build a half-cascade architecture. This configuration gives you the benefits of realtime speech comprehension while keeping complete control over the speech output.
Python:

```python
from google.genai.types import Modality

session = AgentSession(
    llm=google.realtime.RealtimeModel(modalities=[Modality.TEXT]),
    tts="cartesia/sonic-3",
)
```
Node.js:

```typescript
import * as google from '@livekit/agents-plugin-google';

const session = new voice.AgentSession({
  llm: new google.realtime.RealtimeModel({
    model: "gemini-2.5-flash-native-audio-preview-12-2025",
    modalities: [google.types.Modality.TEXT],
  }),
  tts: "cartesia/sonic-3",
});
```
Additional resources
The following resources provide more information about using Gemini with LiveKit Agents.