Google integration guide

An introduction to using the LiveKit Agents framework with Google AI and Vertex AI.

Try the playground

Chat with a voice assistant built with LiveKit and Google's Multimodal Live API

Overview

LiveKit's Google integration provides support for the Google Gemini LLM, Google Cloud STT and TTS, and the Multimodal Live API.

The following sections provide a quick reference for integrating Google AI services with LiveKit. For the complete reference, see the links provided in each section.

Gemini LLM

LiveKit's Google plugin provides support for Gemini models on both the Google AI and Vertex AI platforms. Use LiveKit's Google integration with the LiveKit Agents framework to create AI agents with advanced reasoning and contextual understanding.

google.LLM usage

Create a new instance of Gemini LLM to use in a VoicePipelineAgent:

from livekit.plugins import google
google_llm = google.LLM(
    model="gemini-2.0-flash-exp",
    temperature=0.8,
)

google.LLM parameters

This section describes some of the available parameters. For a complete reference of all available parameters, see the plugin reference.

Note

Google application credentials must be provided using one of the following options:

  • For Vertex AI, the GOOGLE_APPLICATION_CREDENTIALS environment variable must be set to the path of the service account key file.

    The Google Cloud project and location can be set via project and location arguments or the environment variables GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION. By default, the project is inferred from the service account key file and the location defaults to "us-central1".

  • For Google AI, set the api_key argument or the GOOGLE_API_KEY environment variable. Both options are shown in the sketch after this list.
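
As a minimal sketch of these two options, the snippet below creates one LLM instance for each platform. The project ID is a placeholder, and the Vertex AI path assumes GOOGLE_APPLICATION_CREDENTIALS is already set:

import os

from livekit.plugins import google

# Vertex AI: assumes GOOGLE_APPLICATION_CREDENTIALS points to a service account key file.
vertex_llm = google.LLM(
    vertexai=True,
    project="my-gcp-project",  # placeholder project ID
    location="us-central1",
)

# Google AI: authenticate with an API key instead.
gemini_llm = google.LLM(
    api_key=os.environ["GOOGLE_API_KEY"],
)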

model: ChatModels | str (optional). Default: gemini-2.0-flash-exp

ID of the model to use. For a full list, see Gemini models.

api_key: str (optional). Environment variable: GOOGLE_API_KEY

API key for Google Gemini.

vertexai: bool (optional). Default: false

True to use Vertex AI; false to use Google AI.

project: str (optional)

Google Cloud project to use (only if using Vertex AI).

temperature: float (optional). Default: 0.8

The temperature controls the degree of randomness in token selection. A lower temperature results in more deterministic output. To learn more, see Model parameters.

max_output_tokens: int (optional)

Maximum number of tokens that can be generated in the response. To learn more, see Model parameters.

Google Cloud STT and TTS

LiveKit's Google integration includes a Google plugin with STT and TTS support. Google Cloud STT supports over 125 languages and can use chirp, a foundational model with improved recognition and transcription for spoken languages and accents. Google Cloud TTS provides a wide voice selection and generates speech with humanlike intonation. Instances of Google STT and TTS can be used as part of the pipeline for an agent created using the VoicePipelineAgent class or as part of a standalone transcription service.

Note

LiveKit's Google plugin is currently only available in Python.

google.STT usage

Use the google.STT class to create an STT instance:

from livekit.plugins import google
google_stt = google.STT(
    model="chirp",
    spoken_punctuation=True,
)

google.STT parameters

This section describes some of the available parameters. For a complete reference of all available parameters, see the plugin reference.

Note

Google Cloud credentials must be provided through one of the following methods: the credentials_info or credentials_file parameters described below, or application default credentials (for example, by setting the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of a service account key file).

languages: LanguageCode (optional). Default: en-US

Specify input languages. For a full list of supported languages, see Speech-to-text supported languages.

spoken_punctuation: boolean (optional). Default: True

Replace spoken punctuation with punctuation characters in text.

model: SpeechModels | string (optional). Default: long

Model to use for speech to text. To learn more, see Select a transcription model.

credentials_info: array (optional)

Key-value pairs of authentication credential information.

credentials_file: string (optional)

Name of the JSON file that contains authentication credentials for Google Cloud.
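
As a brief illustration of the credential parameters above, this sketch passes an explicit service account key file; the file path is a placeholder:

from livekit.plugins import google

google_stt = google.STT(
    model="long",
    languages="en-US",
    credentials_file="service-account.json",  # placeholder path to a key file
)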

google.TTS usage

Use the google.TTS class to create a TTS instance:

from livekit.plugins import google
google_tts = google.TTS(
    gender="female",
    voice_name="en-US-Standard-H",
)

google.TTS parameters

This section describes some of the available parameters. For a complete reference of all available parameters, see the plugin reference.

Note

Google Cloud credentials must be provided through one of the following methods: the credentials_info or credentials_file parameters described below, or application default credentials (for example, by setting the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of a service account key file).

language: SpeechLanguages | string (optional). Default: en-US

Specify output language. For a full list of languages, see Supported voices and languages.

gender: Gender | string (optional). Default: neutral

Voice gender. Valid values are male, female, and neutral.

voice_name: string (optional)

Name of the voice to use for speech. For a full list of voices, see Supported voices and languages.

credentials_info: array (optional)

Key-value pairs of authentication credential information.

credentials_file: string (optional)

Name of the JSON file that contains authentication credentials for Google Cloud.
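
To put the STT, LLM, and TTS plugins together, the following sketch wires them into a VoicePipelineAgent. It is a minimal outline based on a typical LiveKit Agents worker: the Silero VAD plugin, the entrypoint structure, and the system prompt are assumptions for illustration, not part of this guide.

from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli
from livekit.agents.llm import ChatContext
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import google, silero  # Silero VAD assumed to be installed


async def entrypoint(ctx: JobContext):
    # Connect to the room and wait for a participant to join.
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    participant = await ctx.wait_for_participant()

    # Assemble the voice pipeline from Google STT, Gemini LLM, and Google TTS.
    agent = VoicePipelineAgent(
        vad=silero.VAD.load(),
        stt=google.STT(model="chirp", spoken_punctuation=True),
        llm=google.LLM(model="gemini-2.0-flash-exp", temperature=0.8),
        tts=google.TTS(gender="female", voice_name="en-US-Standard-H"),
        chat_ctx=ChatContext().append(role="system", text="You are a helpful assistant"),
    )
    agent.start(ctx.room, participant)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))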

Multimodal Live API

LiveKit's Google plugin includes a RealtimeModel class that allows you to use Google's Multimodal Live API. The Multimodal Live API enables low-latency, two-way interactions that use text, audio, and video input, with audio and text output. Use LiveKit's Google integration with the Agents framework to create agents with natural, human-like voice conversations.

RealtimeModel usage

Create a model using the Multimodal Live API for use in a MultimodalAgent:

from livekit.plugins import google
model = google.beta.realtime.RealtimeModel(
    voice="Puck",
    temperature=0.8,
    instructions="You are a helpful assistant",
)

For a full agent example, see the Gemini example in the LiveKit Agents GitHub repository.
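
As a rough sketch of the surrounding agent code, the snippet below passes the model to a MultimodalAgent inside a job entrypoint. The entrypoint structure mirrors the pipeline sketch above and is an assumption for illustration, not taken from this guide.

from livekit.agents import AutoSubscribe, JobContext
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import google


async def entrypoint(ctx: JobContext):
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    participant = await ctx.wait_for_participant()

    model = google.beta.realtime.RealtimeModel(
        voice="Puck",
        temperature=0.8,
        instructions="You are a helpful assistant",
    )

    # MultimodalAgent manages the realtime audio session with the model.
    agent = MultimodalAgent(model=model)
    agent.start(ctx.room, participant)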

RealtimeModel parameters

This section describes some of the available parameters. For a complete reference of all available parameters, see the plugin reference.

instructions: string (optional)

System instructions to better control the model's output and specify tone and sentiment of responses. To learn more, see System instructions.

model: LiveAPIModels | string (required). Default: gemini-2.0-flash-exp

Live API model to use.

api_key: string (required). Environment variable: GOOGLE_API_KEY

Google Gemini API key.

voice: Voice | string (required). Default: Puck

Name of the Multimodal Live voice. For a full list, see Voices.

modalities: list[Modality] (optional). Default: ["AUDIO"]

List of modalities to use, such as ["TEXT", "AUDIO"].

vertexai: boolean (required). Default: False

If set to true, use Vertex AI.

project: string (optional). Environment variable: GOOGLE_CLOUD_PROJECT

Google Cloud project ID to use for the API (if vertexai=True). By default, the project is inferred from the service account key file (set using the GOOGLE_APPLICATION_CREDENTIALS environment variable).

location: string (optional). Environment variable: GOOGLE_CLOUD_LOCATION

Google Cloud location to use for the API (if vertexai=True). By default, the project is inferred from the service account key file and the location defaults to us-central1.

temperature: float (optional)

A measure of randomness of completions. A lower temperature is more deterministic. To learn more, see Temperature.
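
For example, the same model can be pointed at Vertex AI instead of Google AI. In the sketch below the project ID is a placeholder, and credentials are assumed to come from GOOGLE_APPLICATION_CREDENTIALS as described above:

from livekit.plugins import google

# Vertex AI: assumes GOOGLE_APPLICATION_CREDENTIALS points to a service account key file.
vertex_model = google.beta.realtime.RealtimeModel(
    vertexai=True,
    project="my-gcp-project",  # placeholder; or set GOOGLE_CLOUD_PROJECT
    location="us-central1",    # or set GOOGLE_CLOUD_LOCATION
    voice="Puck",
)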