Google AI and LiveKit | LiveKit Docs

Google AI ecosystem support

Google AI offers a range of models and services that integrate with LiveKit Agents in the following ways:

Gemini: A family of general-purpose, high-performance LLMs.
Google Cloud STT and TTS: Affordable, production-grade models for transcription and speech synthesis.
Gemini Multimodal Live API: A speech-to-speech realtime model with live video input.

LiveKit Agents supports Google AI through the Gemini API and Vertex AI.
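As a rough mental model, Gemini and Google Cloud STT and TTS can be chained into a cascaded STT-LLM-TTS pipeline, whereas the Multimodal Live API handles speech in and speech out with a single model. The following is a minimal sketch of the cascaded case only. The stage functions (fake_stt, fake_llm, fake_tts) and the Turn class are hypothetical stand-ins, not part of the LiveKit Agents API. In a real agent, the plugins and AgentSession handle this orchestration for you.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Turn:
    """One user turn flowing through a cascaded STT -> LLM -> TTS pipeline."""
    audio_in: bytes
    transcript: str = ""
    reply_text: str = ""
    audio_out: bytes = b""


# Hypothetical stand-ins for real components (for example Google Cloud STT,
# a Gemini LLM, and Google Cloud TTS). They only transform data so the flow is visible.
async def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the "audio" already contains the spoken text


async def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


async def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")  # pretend UTF-8 bytes are synthesized audio


async def run_pipeline(audio_in: bytes) -> Turn:
    """Run the three stages in sequence; each consumes the previous stage's output."""
    turn = Turn(audio_in=audio_in)
    turn.transcript = await fake_stt(turn.audio_in)
    turn.reply_text = await fake_llm(turn.transcript)
    turn.audio_out = await fake_tts(turn.reply_text)
    return turn


if __name__ == "__main__":
    result = asyncio.run(run_pipeline(b"hello"))
    print(result.reply_text)
```

In this simplified version each stage waits for the previous one to finish; production pipelines typically stream partial results between stages to keep latency low.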

Getting started

Use the Voice AI quickstart to build a voice AI app with Gemini. Select the STT-LLM-TTS pipeline model type, then add the following components.

Install the Google plugin:

pip install "livekit-agents[google]~=1.0rc"

Add your Google API key to your .env file:

GOOGLE_API_KEY=<your-google-api-key>
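Projects commonly load this file with the python-dotenv package. If you want to verify the key before starting the agent without adding that dependency, a minimal sketch follows. The helpers load_env_file and require_google_api_key are hypothetical, not part of LiveKit, and the parser deliberately handles only simple KEY=VALUE lines (no escaping, variable expansion, or multiline values).

```python
from __future__ import annotations

import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip whitespace and one layer of surrounding quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_google_api_key(path: str = ".env") -> str:
    """Copy .env values into os.environ (without overriding) and fail fast if the key is unusable."""
    for key, value in load_env_file(path).items():
        os.environ.setdefault(key, value)
    api_key = os.environ.get("GOOGLE_API_KEY", "")
    # Treat an empty value or the unedited <your-google-api-key> template as missing.
    if not api_key or api_key.startswith("<"):
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to your .env file")
    return api_key
```

Calling require_google_api_key() at the top of your agent script surfaces a missing key immediately instead of at the first model request.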

Use the Google LLM component to initialize your AgentSession:

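from livekit.agents import AgentSession
from livekit.plugins import google

# ...

# inside your entrypoint function
session = AgentSession(
    # google.LLM uses GOOGLE_API_KEY from the environment by default
    llm=google.LLM(
        model="gemini-2.0-flash",
    ),
    # ... stt, tts, vad, turn_detection, etc.
)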

LiveKit Agents overview

LiveKit Agents is an open-source framework for building realtime AI apps using WebRTC transport to end-user devices and WebSockets or HTTPS for backend services.

Agent workflows: Build complex voice AI apps with discrete stages and handoffs.
Telephony: Inbound and outbound calling using SIP trunks.
Frontend SDKs: Full-featured SDKs and UI components for JavaScript, Swift, Kotlin, Flutter, React Native, and Unity.
Python and Node.js: Build voice AI apps in Python or Node.js.
Dispatch and load balancing: Built-in support for request distribution and load balancing.
LiveKit Cloud: Fully-managed LiveKit server with global scale and low latency (you can also self-host).

What is WebRTC?

WebRTC offers significant advantages over alternatives such as WebSockets for building realtime applications.

Optimized for media: Purpose-built for audio and video with advanced codecs and compression algorithms.
Network resilient: Performs reliably even in challenging network conditions due to UDP, adaptive bitrate, and more.
Broad compatibility: Natively supported in all modern browsers.
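To make the UDP point concrete, consider head-of-line blocking. A transport that must deliver packets in order (as TCP, and therefore WebSockets, does) holds back every later packet while a lost one is retransmitted, whereas a UDP-based transport can hand later packets to the application immediately. The following toy model uses made-up timing parameters and a hypothetical delivery_times function; it illustrates only that one effect and is not how WebRTC's transport is implemented.

```python
from __future__ import annotations


def delivery_times(
    n: int,
    interval_ms: int,
    delay_ms: int,
    lost: int,
    retransmit_ms: int,
    in_order: bool,
) -> list[int | None]:
    """Return when each packet reaches the application (None means never).

    Toy model: packet i is sent at i * interval_ms and arrives delay_ms later.
    Packet `lost` is dropped once. If in_order is True (TCP-like), it is
    resent after an extra retransmit_ms and no later packet can be delivered
    before it. If in_order is False (UDP-like), it is simply skipped.
    """
    times: list[int | None] = []
    last_delivered = 0
    for i in range(n):
        arrival = i * interval_ms + delay_ms
        if i == lost:
            if not in_order:
                times.append(None)  # gap: the app moves on without this packet
                continue
            arrival += retransmit_ms  # wait for the retransmitted copy
        if in_order:
            # Head-of-line blocking: nothing is delivered before earlier packets.
            arrival = max(arrival, last_delivered)
            last_delivered = arrival
        times.append(arrival)
    return times


if __name__ == "__main__":
    params = dict(n=5, interval_ms=20, delay_ms=50, lost=1, retransmit_ms=200)
    print("in order:    ", delivery_times(**params, in_order=True))
    print("out of order:", delivery_times(**params, in_order=False))
```

With these parameters the in-order case yields [50, 270, 270, 270, 270], so every packet after the loss waits for the retransmission, while the out-of-order case yields [50, None, 90, 110, 130], so only packet 1 is missing. For live audio, a brief gap is usually less disruptive than a growing delay.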

LiveKit handles all of the complexity of running production-grade WebRTC infrastructure while extending support to mobile apps, backends, and telephony.

Google plugin documentation

The following links provide more information on each available Google component in LiveKit Agents.

Gemini LLM: LiveKit Agents docs for Google Gemini models.
Gemini Multimodal Live API: LiveKit Agents docs for Google Gemini Multimodal Live API.
Google Cloud STT: LiveKit Agents docs for Google Cloud STT.
Google Cloud TTS: LiveKit Agents docs for Google Cloud TTS.