Chat with a voice assistant built with LiveKit and Google's Multimodal Live API

Overview
LiveKit's Google integration provides support for Google AI services:
- Google plugin support for Gemini LLM and Google Cloud STT and TTS.
- Support for Google's Multimodal Live API through the `RealtimeModel` class.
The following sections provide a quick reference for integrating Google AI services with LiveKit. For the complete reference, see the links provided in each section.
Gemini LLM
LiveKit's Google plugin provides support for Gemini models across both Google AI and Vertex AI platforms. Use LiveKit's Google integration with the LiveKit Agents framework and create AI agents with advanced reasoning and contextual understanding.
google.LLM usage
Create a new instance of the Gemini LLM to use in a `VoicePipelineAgent`:

```python
from livekit.plugins import google

google_llm = google.LLM(
    model="gemini-2.0-flash-exp",
    temperature=0.8,
)
```
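The LLM can then be dropped into a voice pipeline alongside the STT and TTS plugins described below. A minimal sketch, assuming the Silero VAD plugin and a standard LiveKit Agents entrypoint that provides a job context `ctx`:

```python
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import google, silero

agent = VoicePipelineAgent(
    vad=silero.VAD.load(),  # voice activity detection
    stt=google.STT(),       # speech-to-text (see below)
    llm=google_llm,         # the Gemini LLM created above
    tts=google.TTS(),       # text-to-speech (see below)
)
agent.start(ctx.room)  # ctx is the JobContext passed to the entrypoint
```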
google.LLM parameters
This section describes some of the available parameters. For a complete reference of all available parameters, see the plugin reference.
Google application credentials must be provided using one of the following options:
- For Vertex AI, the `GOOGLE_APPLICATION_CREDENTIALS` environment variable must be set to the path of the service account key file. The Google Cloud project and location can be set via the `project` and `location` arguments or the environment variables `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION`. By default, the project is inferred from the service account key file and the location defaults to `us-central1`.
- For Google AI, set the `api_key` argument or the `GOOGLE_API_KEY` environment variable.
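As an illustration, the two credential paths might look like the following. The project ID is hypothetical, and the `vertexai` flag is an assumption based on the plugin's Vertex AI support:

```python
import os

from livekit.plugins import google

# Google AI: pass the key explicitly (or just set GOOGLE_API_KEY)
gemini_llm = google.LLM(api_key=os.environ["GOOGLE_API_KEY"])

# Vertex AI: assumes GOOGLE_APPLICATION_CREDENTIALS points to a
# service account key file; project and location are optional overrides
vertex_llm = google.LLM(
    vertexai=True,
    project="my-gcp-project",  # hypothetical project ID
    location="us-central1",
)
```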
- `model`: ID of the model to use. For a full list, see Gemini models.
- `api_key`: API key for Google Gemini.
- `project`: Google Cloud project to use (only if using Vertex AI).
- `temperature`: Controls the degree of randomness in token selection. A lower temperature results in more deterministic output. To learn more, see Model parameters.
- `max_output_tokens`: Maximum number of tokens that can be generated in the response. To learn more, see Model parameters.
Google Cloud STT and TTS
LiveKit's Google integration includes a Google plugin with STT and TTS support. Google Cloud STT supports over 125 languages and can use `chirp`, a foundational model with improved recognition and transcription for spoken languages and accents. Google Cloud TTS provides a wide voice selection and generates speech with humanlike intonation. Instances of Google STT and TTS can be used as part of the pipeline for an agent created using the `VoicePipelineAgent` class or as part of a standalone transcription service.
LiveKit's Google plugin is currently only available in Python.
google.STT usage
Use the `google.STT` class to create an instance of an STT:

```python
from livekit.plugins import google

google_stt = google.STT(
    model="chirp",
    spoken_punctuation=True,
)
```
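For standalone transcription, the STT instance can consume an audio track directly. A minimal sketch, assuming the standard LiveKit Agents streaming STT interface and an `rtc.Track` subscribed from a room:

```python
import asyncio

from livekit import rtc
from livekit.agents import stt


async def transcribe_track(track: rtc.Track) -> None:
    audio_stream = rtc.AudioStream(track)
    stt_stream = google_stt.stream()  # the STT instance created above

    async def _push_audio() -> None:
        # forward raw audio frames from the track into the STT stream
        async for event in audio_stream:
            stt_stream.push_frame(event.frame)

    push_task = asyncio.create_task(_push_audio())
    try:
        async for speech_event in stt_stream:
            if speech_event.type == stt.SpeechEventType.FINAL_TRANSCRIPT:
                print(speech_event.alternatives[0].text)
    finally:
        push_task.cancel()
```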
google.STT parameters
This section describes some of the available parameters. For a complete reference of all available parameters, see the plugin reference.
Google Cloud credentials must be provided by one of the following methods:
- Passed in the `credentials_info` dictionary.
- Saved in the `credentials_file` JSON file.
- Application Default Credentials. To learn more, see How Application Default Credentials works.
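For instance, explicit credentials could be passed either way (the key file name is hypothetical):

```python
import json

from livekit.plugins import google

# from a service account key file on disk
stt_from_file = google.STT(credentials_file="service_account.json")

# or from an already-loaded credentials dictionary
with open("service_account.json") as f:
    stt_from_info = google.STT(credentials_info=json.load(f))
```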
- `languages`: Specify input languages. For a full list of supported languages, see Speech-to-text supported languages.
- `spoken_punctuation`: Replace spoken punctuation with punctuation characters in text.
- `model`: Model to use for speech-to-text. To learn more, see Select a transcription model.
- `credentials_info`: Key-value pairs of authentication credential information.
- `credentials_file`: Name of the JSON file that contains authentication credentials for Google Cloud.
google.TTS usage
Use the `google.TTS` class to create an instance of a TTS:

```python
from livekit.plugins import google

google_tts = google.TTS(
    gender="female",
    voice_name="en-US-Standard-H",
)
```
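Beyond pipeline use, the TTS instance can synthesize speech directly. A minimal sketch, assuming the standard LiveKit Agents TTS interface in which `synthesize()` returns an async stream of audio chunks; `handle_frame` is a hypothetical consumer:

```python
async def speak(text: str) -> None:
    # each chunk carries a PCM audio frame that can be published
    # to a room or written to a file
    async for chunk in google_tts.synthesize(text):
        handle_frame(chunk.frame)  # hypothetical consumer
```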
google.TTS parameters
This section describes some of the available parameters. For a complete reference of all available parameters, see the plugin reference.
Google Cloud credentials must be provided by one of the following methods:
- Passed in the `credentials_info` dictionary.
- Saved in the `credentials_file` JSON file.
- Application Default Credentials. To learn more, see How Application Default Credentials works.
- `language`: Specify output language. For a full list of languages, see Supported voices and languages.
- `gender`: Voice gender. Valid values are `male`, `female`, and `neutral`.
- `voice_name`: Name of the voice to use for speech. For a full list of voices, see Supported voices and languages.
- `credentials_info`: Key-value pairs of authentication credential information.
- `credentials_file`: Name of the JSON file that contains authentication credentials for Google Cloud.
Multimodal Live API
LiveKit's Google plugin includes a `RealtimeModel` class that allows you to use Google's Multimodal Live API. The Multimodal Live API enables low-latency, two-way interactions that use text, audio, and video input, with audio and text output. Use LiveKit's Google integration with the Agents framework to create agents with natural, humanlike voice conversations.
RealtimeModel usage
Create a model using the Multimodal Live API for use in a `MultimodalAgent`:

```python
from livekit.plugins import google

model = google.beta.realtime.RealtimeModel(
    voice="Puck",
    temperature=0.8,
    instructions="You are a helpful assistant",
)
```
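A minimal sketch of wiring the model into an agent, assuming a standard LiveKit Agents entrypoint that provides a job context `ctx`:

```python
from livekit.agents.multimodal import MultimodalAgent

# hand the realtime model to a MultimodalAgent and attach it to the room
agent = MultimodalAgent(model=model)
agent.start(ctx.room)
```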
For a full agent example, see the Gemini example in the LiveKit Agents GitHub repository.
RealtimeModel parameters
This section describes some of the available parameters. For a complete reference of all available parameters, see the plugin reference.
- `instructions`: System instructions to better control the model's output and specify the tone and sentiment of responses. To learn more, see System instructions.
- `model`: Live API model to use.
- `api_key`: Google Gemini API key.
- `voice`: Name of the Multimodal Live voice. For a full list, see Voices.
- `modalities`: List of modalities to use, such as `["TEXT", "AUDIO"]`.
- `vertexai`: If set to `True`, use Vertex AI.
- `project`: Google Cloud project ID to use for the API (only if `vertexai=True`). By default, the project is inferred from the service account key file (set using the `GOOGLE_APPLICATION_CREDENTIALS` environment variable).
- `location`: Google Cloud location to use for the API (only if `vertexai=True`). By default, the project is inferred from the service account key file and the location defaults to `us-central1`.
- `temperature`: A measure of randomness of completions. A lower temperature is more deterministic. To learn more, see Temperature.
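For example, a Vertex AI configuration might look like the following (the project ID is hypothetical):

```python
from livekit.plugins import google

model = google.beta.realtime.RealtimeModel(
    vertexai=True,
    project="my-gcp-project",  # hypothetical project ID
    location="us-central1",
    modalities=["AUDIO"],      # audio-only responses
    temperature=0.6,
)
```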