Chat with a voice assistant built with LiveKit and Google's Multimodal Live API

Overview
LiveKit's Google integration provides support for Google AI services:
- Google plugin support for Gemini LLM and Google Cloud STT and TTS.
- Support for Google's Multimodal Live API through the `RealtimeModel` class.
The following sections provide a quick reference for integrating Google AI services with LiveKit. For the complete reference, see the links provided in each section.
Gemini LLM
LiveKit's Google plugin provides support for Gemini models across both Google AI and Vertex AI platforms. Use LiveKit's Google integration with the LiveKit Agents framework and create AI agents with advanced reasoning and contextual understanding.
google.LLM usage
Create a new instance of the Gemini LLM to use in a `VoicePipelineAgent`:

```python
from livekit.plugins import google

google_llm = google.LLM(
    model="gemini-2.0-flash-exp",
    temperature=0.8,
)
```
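The LLM can then be dropped into a voice pipeline alongside the STT and TTS plugins described below. A minimal sketch, assuming the Silero VAD plugin and a standard LiveKit Agents entrypoint that provides a job context `ctx`:

```python
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import google, silero

agent = VoicePipelineAgent(
    vad=silero.VAD.load(),  # voice activity detection
    stt=google.STT(),       # speech-to-text (see below)
    llm=google_llm,         # the Gemini LLM created above
    tts=google.TTS(),       # text-to-speech (see below)
)
agent.start(ctx.room)  # ctx is the JobContext passed to the entrypoint
```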
google.LLM parameters
This section describes some of the available parameters. For a complete reference of all available parameters, see the plugin reference.
Google application credentials must be provided using one of the following options:
- For Vertex AI, the `GOOGLE_APPLICATION_CREDENTIALS` environment variable must be set to the path of the service account key file. The Google Cloud project and location can be set via the `project` and `location` arguments or the environment variables `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION`. By default, the project is inferred from the service account key file and the location defaults to `us-central1`.
- For Google AI, set the `api_key` argument or the `GOOGLE_API_KEY` environment variable.
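As an illustration, the two credential paths might look like the following. The project ID is hypothetical, and the `vertexai` flag is an assumption based on the plugin's Vertex AI support:

```python
import os

from livekit.plugins import google

# Google AI: pass the key explicitly (or just set GOOGLE_API_KEY)
gemini_llm = google.LLM(api_key=os.environ["GOOGLE_API_KEY"])

# Vertex AI: assumes GOOGLE_APPLICATION_CREDENTIALS points to a
# service account key file; project and location are optional overrides
vertex_llm = google.LLM(
    vertexai=True,
    project="my-gcp-project",  # hypothetical project ID
    location="us-central1",
)
```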
- `model`: ID of the model to use. For a full list, see Gemini models.
- `api_key`: API key for Google Gemini.
- `project`: Google Cloud project to use (only if using Vertex AI).
- `temperature`: Controls the degree of randomness in token selection. A lower temperature results in more deterministic output. To learn more, see Model parameters.
- `max_output_tokens`: Maximum number of tokens that can be generated in the response. To learn more, see Model parameters.
Google Cloud STT and TTS
LiveKit's Google integration includes a Google plugin with STT and TTS support. Google Cloud STT supports over 125 languages and can use `chirp`, a foundational model with improved recognition and transcription for spoken languages and accents. Google Cloud TTS provides a wide voice selection and generates speech with humanlike intonation. Instances of Google STT and TTS can be used as part of the pipeline for an agent created using the `VoicePipelineAgent` class or as part of a standalone transcription service.
LiveKit's Google plugin is currently only available in Python.
google.STT usage
Use the `google.STT` class to create an instance of an STT:

```python
from livekit.plugins import google

google_stt = google.STT(
    model="chirp",
    spoken_punctuation=True,
)
```
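For standalone transcription, the STT instance can consume an audio track directly. A minimal sketch, assuming the standard LiveKit Agents streaming STT interface and an `rtc.Track` subscribed from a room:

```python
import asyncio

from livekit import rtc
from livekit.agents import stt


async def transcribe_track(track: rtc.Track) -> None:
    audio_stream = rtc.AudioStream(track)
    stt_stream = google_stt.stream()  # the STT instance created above

    async def _push_audio() -> None:
        # forward raw audio frames from the track into the STT stream
        async for event in audio_stream:
            stt_stream.push_frame(event.frame)

    push_task = asyncio.create_task(_push_audio())
    try:
        async for speech_event in stt_stream:
            if speech_event.type == stt.SpeechEventType.FINAL_TRANSCRIPT:
                print(speech_event.alternatives[0].text)
    finally:
        push_task.cancel()
```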
google.STT parameters
This section describes some of the available parameters. For a complete reference of all available parameters, see the plugin reference.
Google Cloud credentials must be provided by one of the following methods:
- Passed in the `credentials_info` dictionary.
- Saved in the `credentials_file` JSON file.
- Application Default Credentials. To learn more, see How Application Default Credentials works.
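For instance, explicit credentials could be passed either way (the key file name is hypothetical):

```python
import json

from livekit.plugins import google

# from a service account key file on disk
stt_from_file = google.STT(credentials_file="service_account.json")

# or from an already-loaded credentials dictionary
with open("service_account.json") as f:
    stt_from_info = google.STT(credentials_info=json.load(f))
```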
- `languages`: Specify input languages. For a full list of supported languages, see Speech-to-text supported languages.
- `spoken_punctuation`: Replace spoken punctuation with punctuation characters in text.
- `model`: Model to use for speech-to-text. To learn more, see Select a transcription model.
- `credentials_info`: Key-value pairs of authentication credential information.
- `credentials_file`: Name of the JSON file that contains authentication credentials for Google Cloud.
google.TTS usage
Use the `google.TTS` class to create an instance of a TTS:

```python
from livekit.plugins import google

google_tts = google.TTS(
    gender="female",
    voice_name="en-US-Standard-H",
)
```
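Beyond pipeline use, the TTS instance can synthesize speech directly. A minimal sketch, assuming the standard LiveKit Agents TTS interface in which `synthesize()` returns an async stream of audio chunks; `handle_frame` is a hypothetical consumer:

```python
async def speak(text: str) -> None:
    # each chunk carries a PCM audio frame that can be published
    # to a room or written to a file
    async for chunk in google_tts.synthesize(text):
        handle_frame(chunk.frame)  # hypothetical consumer
```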
google.TTS parameters
This section describes some of the available parameters. For a complete reference of all available parameters, see the plugin reference.
Google Cloud credentials must be provided by one of the following methods:
- Passed in the `credentials_info` dictionary.
- Saved in the `credentials_file` JSON file.
- Application Default Credentials. To learn more, see How Application Default Credentials works.
- `language`: Specify output language. For a full list of languages, see Supported voices and languages.
- `gender`: Voice gender. Valid values are `male`, `female`, and `neutral`.
- `voice_name`: Name of the voice to use for speech. For a full list of voices, see Supported voices and languages.
- `credentials_info`: Key-value pairs of authentication credential information.
- `credentials_file`: Name of the JSON file that contains authentication credentials for Google Cloud.
Multimodal Live API
LiveKit's Google plugin includes a `RealtimeModel` class that allows you to use Google's Multimodal Live API. The Multimodal Live API enables low-latency, two-way interactions that use text, audio, and video input, with audio and text output. Use LiveKit's Google integration with the Agents framework to create agents with natural, humanlike voice conversations.
RealtimeModel usage
Create a model using the Multimodal Live API for use in a `MultimodalAgent`:

```python
from livekit.plugins import google

model = google.beta.realtime.RealtimeModel(
    voice="Puck",
    temperature=0.8,
    instructions="You are a helpful assistant",
)
```
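A minimal sketch of wiring the model into an agent, assuming a standard LiveKit Agents entrypoint that provides a job context `ctx`:

```python
from livekit.agents.multimodal import MultimodalAgent

# hand the realtime model to a MultimodalAgent and attach it to the room
agent = MultimodalAgent(model=model)
agent.start(ctx.room)
```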
For a full agent example, see the Gemini example in the LiveKit Agents GitHub repository.
RealtimeModel parameters
This section describes some of the available parameters. For a complete reference of all available parameters, see the plugin reference.
- `instructions`: System instructions to better control the model's output and specify the tone and sentiment of responses. To learn more, see System instructions.
- `model`: Live API model to use.
- `api_key`: Google Gemini API key.
- `voice`: Name of the Multimodal Live voice. For a full list, see Voices.
- `modalities`: List of modalities to use, such as `["TEXT", "AUDIO"]`.
- `vertexai`: If set to `True`, use Vertex AI.
- `project`: Google Cloud project ID to use for the API (only if `vertexai=True`). By default, the project is inferred from the service account key file (set using the `GOOGLE_APPLICATION_CREDENTIALS` environment variable).
- `location`: Google Cloud location to use for the API (only if `vertexai=True`). By default, the project is inferred from the service account key file and the location defaults to `us-central1`.
- `temperature`: A measure of randomness of completions. A lower temperature is more deterministic. To learn more, see Temperature.
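For example, a Vertex AI configuration might look like the following (the project ID is hypothetical):

```python
from livekit.plugins import google

model = google.beta.realtime.RealtimeModel(
    vertexai=True,
    project="my-gcp-project",  # hypothetical project ID
    location="us-central1",
    modalities=["AUDIO"],      # audio-only responses
    temperature=0.6,
)
```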