Voice AI quickstart

Build a simple voice assistant with Python in less than 10 minutes.

Overview

This guide walks you through the setup of your very first voice assistant using LiveKit Agents for Python. In less than 10 minutes, you'll have a voice assistant that you can speak to in your terminal, browser, telephone, or native app.

Requirements

The following sections describe the minimum requirements to get started with LiveKit Agents.

Python

LiveKit Agents requires Python 3.9 or later.
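To confirm that your interpreter meets this requirement, you can run:

python3 --version   # should report 3.9 or later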

Looking for Node.js?

The Node.js beta is still in development and has not yet reached v1.0. See the v0.x documentation for Node.js reference and join the LiveKit Community Slack to be the first to know when the next release is available.

LiveKit server

You need a LiveKit server instance to transport realtime media between user and agent. The easiest way to get started is with a free LiveKit Cloud account. Create a project and use the API keys in the following steps. You may also self-host LiveKit if you prefer.

AI providers

LiveKit Agents integrates with most AI model providers and supports both high-performance STT-LLM-TTS voice pipelines, as well as lifelike multimodal models.

The rest of this guide assumes you use the following starter pack, which provides the best combination of value, features, and ease of setup.

Your agent strings together three specialized providers into a high-performance voice pipeline. You need accounts and API keys for each.

Diagram showing STT-LLM-TTS pipeline.
Component | Provider | Required Key     | Alternatives
STT       | Deepgram | DEEPGRAM_API_KEY | STT integrations
LLM       | OpenAI   | OPENAI_API_KEY   | LLM integrations
TTS       | Cartesia | CARTESIA_API_KEY | TTS integrations

Setup

Use the instructions in the following sections to set up your new project.

Packages

Noise cancellation

This example integrates LiveKit Cloud enhanced background voice/noise cancellation, powered by Krisp.

The noise cancellation plugin is currently unavailable on Windows.

If you're not using LiveKit Cloud, omit the plugin and the noise_cancellation parameter from the following code.
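In that case, the session start call reduces to a sketch like this (the full version appears in the Agent code section below):

await session.start(
    room=ctx.room,
    agent=Assistant(),
)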

Install the following packages to build a complete voice AI agent with your STT-LLM-TTS pipeline, noise cancellation, and turn detection:

pip install \
  "livekit-agents[deepgram,openai,cartesia,silero,turn-detector]~=1.0" \
  "livekit-plugins-noise-cancellation~=0.2" \
  "python-dotenv"

Environment variables

Create a file named .env and add your LiveKit credentials along with the necessary API keys for your AI providers.

DEEPGRAM_API_KEY=<Your Deepgram API Key>
OPENAI_API_KEY=<Your OpenAI API Key>
CARTESIA_API_KEY=<Your Cartesia API Key>
LIVEKIT_API_KEY=<your API Key>
LIVEKIT_API_SECRET=<your API Secret>
LIVEKIT_URL=<your LiveKit server URL>
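The agent code below loads this file with python-dotenv, which makes each key available through the process environment. As a quick sanity check, assuming .env sits in your working directory, you can run a snippet like:

from dotenv import load_dotenv
import os

load_dotenv()                      # reads .env into os.environ
print(os.environ["LIVEKIT_URL"])   # should print your LiveKit server URL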

Agent code

Create a file named main.py containing the following code for your first voice agent.

from dotenv import load_dotenv

from livekit import agents
from livekit.agents import AgentSession, Agent, RoomInputOptions
from livekit.plugins import (
    openai,
    cartesia,
    deepgram,
    noise_cancellation,
    silero,
)
from livekit.plugins.turn_detector.multilingual import MultilingualModel

load_dotenv()


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful voice AI assistant.")


async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()

    # Assemble the STT-LLM-TTS pipeline, plus VAD and turn detection.
    session = AgentSession(
        stt=deepgram.STT(model="nova-3", language="multi"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(),
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
    )

    # Start the session in the room, with LiveKit Cloud noise cancellation.
    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_input_options=RoomInputOptions(
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )

    # Have the agent speak first.
    await session.generate_reply(
        instructions="Greet the user and offer your assistance."
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))

Download model files

To use the turn-detector, silero, or noise-cancellation plugins, you first need to download the model files:

python main.py download-files

Speak to your agent

Start your agent in console mode to run inside your terminal:

python main.py console

Your agent speaks to you in the terminal, and you can speak to it as well.

Screenshot of the CLI console mode.

Connect to playground

Start your agent in dev mode to connect it to LiveKit and make it available from anywhere on the internet:

python main.py dev

Use the Agents playground to speak with your agent and explore its full range of multimodal capabilities.

Congratulations, your agent is up and running. Continue to use the playground or the console mode as you build and test your agent.

Agent CLI modes

In console mode, the agent runs locally and is available only within your terminal.

Run your agent in dev (development/debug) or start (production) mode to connect to LiveKit and join rooms.
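In summary, the three modes map to three commands:

python main.py console   # runs locally; audio in and out of your terminal
python main.py dev       # development mode; connects to LiveKit
python main.py start     # production mode; connects to LiveKit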

Next steps

Follow these guides to bring your voice AI app to life in the real world.

Recipes

A comprehensive collection of examples, guides, and recipes for LiveKit Agents.