
Google AI and LiveKit

Build world-class realtime AI apps with Google AI and LiveKit Agents.

Gemini playground

Play with the Gemini Live API in this LiveKit-powered playground.

Overview

This guide walks you through building a voice AI assistant with Google Gemini and LiveKit Agents. In less than 10 minutes, you'll have a voice assistant you can speak to in your terminal, browser, or over the phone.

LiveKit Agents overview

LiveKit Agents is an open source framework for building realtime AI apps in Python and Node.js. It supports complex voice AI workflows with multiple agents and discrete processing steps, and includes built-in load balancing.

LiveKit provides SIP support for telephony integration and full-featured frontend SDKs in multiple languages. It uses WebRTC transport for end-user devices, enabling high-quality, low-latency realtime experiences. To learn more, see LiveKit Agents.

Google AI ecosystem support

Google AI provides some of the most powerful AI models and services today, which integrate into LiveKit Agents in the following ways:

  • Gemini Live API: A speech-to-speech realtime model with live video input.
  • Gemini: A family of general-purpose, high-performance LLMs.
  • Gemini TTS: A speech synthesis model that generates customizable speech from text.
  • Google Cloud STT and TTS: Affordable, production-grade models for transcription and speech synthesis.

LiveKit Agents supports Google AI through the Gemini API and Vertex AI.

Requirements

The following sections describe the minimum requirements to get started:

  • LiveKit Agents requires Python >= 3.10.
  • This guide uses the uv package manager.

LiveKit Cloud

This guide assumes you have signed up for a free LiveKit Cloud account. LiveKit Cloud includes agent deployment, model inference, and realtime media transport. Create a free project and use the API keys in the following steps to get started.

While this guide assumes LiveKit Cloud, the instructions can be adapted for self-hosting the open-source LiveKit server instead. For self-hosting in production, set up a custom deployment environment.

LiveKit Docs MCP server

If you're using an AI coding assistant, you should install the LiveKit Docs MCP server to get the most out of it. This ensures your agent has access to the latest documentation and examples.

LiveKit CLI

Use the LiveKit CLI to manage LiveKit API keys and deploy your agent to LiveKit Cloud.

  1. Install the LiveKit CLI using one of the following methods.

    macOS (Homebrew):

    brew install livekit-cli

    Linux:

    curl -sSL https://get.livekit.io/cli | bash

    Windows:

    winget install LiveKit.LiveKitCLI

    Tip

    You can also download the latest precompiled binaries from the GitHub releases page.

    From source (the repo uses Git LFS for embedded video resources, so ensure git-lfs is installed before proceeding):

    git clone https://github.com/livekit/livekit-cli
    cd livekit-cli
    make install
  2. Link your LiveKit Cloud project to the CLI:

    lk cloud auth

    This opens a browser window to authenticate and link your project to the CLI.
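
With the project linked, you can also use the CLI to mint a temporary access token for testing frontends. The room and identity names below are hypothetical:

```shell
# Create a 24-hour token that can join the room "test-room"
lk token create \
  --join --room test-room --identity test-user \
  --valid-for 24h
```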

AI models

Voice agents require one or more AI models to provide understanding, intelligence, and speech. LiveKit Agents supports both high-performance STT-LLM-TTS voice pipelines constructed from multiple specialized models, as well as realtime models with direct speech-to-speech capabilities.

The rest of this guide presents two options for getting started with Gemini.

Option 1: Realtime model

Use the Gemini Live API for an expressive and lifelike voice experience with a single realtime model. This is the simplest way to get started with Gemini.

Diagram showing realtime model.

    Model              Required key
    Gemini Live API    GOOGLE_API_KEY

Option 2: STT-LLM-TTS pipeline

String together three specialized Google services into a high-performance voice pipeline.

Diagram showing STT-LLM-TTS pipeline.

    Component    Model
    STT          Google Cloud STT (Chirp)
    LLM          Gemini 2.5 Flash
    TTS          Gemini TTS

Setup

Use the instructions in the following sections to set up your new project.

Project initialization

Create a new project for the voice agent.

Run the following commands to use uv to create a new project ready to use for your new voice agent:

uv init livekit-gemini-agent --bare
cd livekit-gemini-agent

Install packages

To use the Gemini Live API, install the following packages to build a voice AI agent with noise cancellation:

uv add \
"livekit-agents[silero,google]~=1.2" \
"livekit-plugins-noise-cancellation~=0.2" \
"python-dotenv"

To use the STT-LLM-TTS pipeline, install the following packages to build a complete voice AI agent with Gemini, noise cancellation, and turn detection:

uv add \
"livekit-agents[silero,turn-detector,google]~=1.2" \
"livekit-plugins-noise-cancellation~=0.2" \
"python-dotenv"

Environment variables

Run the following command to load your LiveKit Cloud API keys into a .env.local file:

lk app env -w

For the Gemini Live API, add your Google API key from Google AI Studio:

LIVEKIT_API_KEY=<your API Key>
LIVEKIT_API_SECRET=<your API Secret>
LIVEKIT_URL=<your LiveKit server URL>
GOOGLE_API_KEY=<Your Google API Key>

For the STT-LLM-TTS pipeline, add your Google API key from Google AI Studio. For Google Cloud STT, you also need to set up Google Cloud credentials:

LIVEKIT_API_KEY=<your API Key>
LIVEKIT_API_SECRET=<your API Secret>
LIVEKIT_URL=<your LiveKit server URL>
GOOGLE_API_KEY=<Your Google API Key>
GOOGLE_APPLICATION_CREDENTIALS=<Path to your Google Cloud service account JSON file>
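
Either way, you can sanity-check that the required variables are set before starting the agent. A minimal stdlib sketch (the helper name is illustrative, not part of LiveKit Agents):

```python
import os

# Keys the agent in this guide expects; GOOGLE_APPLICATION_CREDENTIALS is
# only needed for the Google Cloud STT pipeline option.
REQUIRED = ["LIVEKIT_API_KEY", "LIVEKIT_API_SECRET", "LIVEKIT_URL", "GOOGLE_API_KEY"]

def missing_keys(env):
    """Return the required keys that are unset or empty in `env`."""
    return [k for k in REQUIRED if not env.get(k)]

if __name__ == "__main__":
    missing = missing_keys(dict(os.environ))
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```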

Google Cloud credentials

Google Cloud STT requires a Google Cloud project with the Speech-to-Text API enabled. Create a service account key and download the JSON file. To learn more, see Google Cloud authentication.
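
As a sketch, the following gcloud commands enable the API and create a service account key, assuming the gcloud CLI is installed; my-project and stt-agent are hypothetical project and service-account names:

```shell
# Enable the Speech-to-Text API in your project
gcloud services enable speech.googleapis.com --project my-project

# Create a service account and download a JSON key for it
gcloud iam service-accounts create stt-agent --project my-project
gcloud iam service-accounts keys create key.json \
  --iam-account stt-agent@my-project.iam.gserviceaccount.com
```

Point GOOGLE_APPLICATION_CREDENTIALS at the downloaded key.json file.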

Agent code

Create a file named agent.py with your agent code.

If you're using the Gemini Live API, use the following code:

from dotenv import load_dotenv

from livekit import agents, rtc
from livekit.agents import AgentServer, AgentSession, Agent, room_io
from livekit.plugins import (
    google,
    noise_cancellation,
    silero,
)

load_dotenv(".env.local")


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful voice AI assistant powered by Gemini.")


server = AgentServer()


@server.rtc_session()
async def my_agent(ctx: agents.JobContext):
    session = AgentSession(
        llm=google.realtime.RealtimeModel(
            voice="Puck",
        ),
        vad=silero.VAD.load(),
    )

    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_options=room_io.RoomOptions(
            audio_input=room_io.AudioInputOptions(
                # Use telephony-tuned noise cancellation for SIP participants
                noise_cancellation=lambda params: noise_cancellation.BVCTelephony()
                if params.participant.kind == rtc.ParticipantKind.PARTICIPANT_KIND_SIP
                else noise_cancellation.BVC(),
            ),
        ),
    )

    await session.generate_reply(
        instructions="Greet the user and offer your assistance."
    )


if __name__ == "__main__":
    agents.cli.run_app(server)

If you're using the STT-LLM-TTS pipeline, use the following code:

from dotenv import load_dotenv

from livekit import agents, rtc
from livekit.agents import AgentServer, AgentSession, Agent, room_io
from livekit.plugins import google, noise_cancellation, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

load_dotenv(".env.local")


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""You are a helpful voice AI assistant powered by Google.
You eagerly assist users with their questions by providing information from your extensive knowledge.
Your responses are concise, to the point, and without any complex formatting or punctuation including emojis, asterisks, or other symbols.
You are curious, friendly, and have a sense of humor.""",
        )


server = AgentServer()


@server.rtc_session()
async def my_agent(ctx: agents.JobContext):
    session = AgentSession(
        stt=google.STT(
            model="chirp",
        ),
        llm=google.LLM(
            model="gemini-2.5-flash",
        ),
        tts=google.TTS(
            gender="female",
            voice_name="en-US-Standard-H",
        ),
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
    )

    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_options=room_io.RoomOptions(
            audio_input=room_io.AudioInputOptions(
                # Use telephony-tuned noise cancellation for SIP participants
                noise_cancellation=lambda params: noise_cancellation.BVCTelephony()
                if params.participant.kind == rtc.ParticipantKind.PARTICIPANT_KIND_SIP
                else noise_cancellation.BVC(),
            ),
        ),
    )

    await session.generate_reply(
        instructions="Greet the user and offer your assistance.",
    )


if __name__ == "__main__":
    agents.cli.run_app(server)

Download model files

If you're using the turn-detector plugin, you first need to download the model files:

uv run agent.py download-files

Speak to your agent

Start your agent in console mode to run inside your terminal:

uv run agent.py console

Your agent speaks to you in the terminal, and you can speak to it as well.

Screenshot of the CLI console mode.

Connect to playground

Start your agent in dev mode to connect it to LiveKit and make it available from anywhere on the internet:

uv run agent.py dev

Use the Agents playground to speak with your agent and explore its full range of multimodal capabilities.

Deploy to LiveKit Cloud

Ensure you have linked your LiveKit Cloud project, then run the following command with the LiveKit CLI from the root of your project:

lk agent create

The CLI creates Dockerfile, .dockerignore, and livekit.toml files in your current directory, then registers your agent with your LiveKit Cloud project and deploys it.

After the deployment completes, you can access your agent in the playground, or continue to use the console mode as you build and test your agent locally.

Additional resources

The following links provide more information on each available Google component in LiveKit Agents.