Voice AI quickstart

Build a simple voice assistant with Python in less than 10 minutes.

Overview

This guide walks you through the setup of your very first voice assistant using LiveKit Agents for Python. In less than 10 minutes, you'll have a voice assistant that you can speak to in your terminal, browser, telephone, or native app.

Requirements

The following sections describe the minimum requirements to get started with LiveKit Agents.

Python

LiveKit Agents requires Python 3.9 or later.
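
You can confirm which version your shell resolves to before continuing:

python3 --version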

Looking for Node.js?

The Node.js beta is still in development and has not yet reached v1.0. See the v0.x documentation for Node.js reference and join the LiveKit Community Slack to be the first to know when the next release is available.

LiveKit server

You need a LiveKit server instance to transport realtime media between user and agent. The easiest way to get started is with a free LiveKit Cloud account. Create a project and use the API keys in the following steps. You may also self-host LiveKit if you prefer.
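
If you prefer to self-host, the open-source server has a development mode that runs locally with a default API key and secret. A minimal sketch, assuming you've installed the livekit-server binary:

livekit-server --dev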

AI providers

LiveKit Agents integrates with most AI model providers and supports both high-performance STT-LLM-TTS voice pipelines and lifelike multimodal models.

The rest of this guide assumes you use the following starter pack, which provides the best combination of value, features, and ease of setup.

Your agent strings together three specialized providers into a high-performance voice pipeline. You need accounts and API keys for each.


Component   Provider   Plugin                     Required Key
STT         Deepgram   livekit-plugins-deepgram   DEEPGRAM_API_KEY
LLM         OpenAI     livekit-plugins-openai     OPENAI_API_KEY
TTS         Cartesia   livekit-plugins-cartesia   CARTESIA_API_KEY

Setup

Use the instructions in the following sections to set up your new project.

Packages

In addition to LiveKit Agents and plugins for the pipeline type you've chosen, install plugins for noise cancellation, VAD, and turn detection to make your voice AI app best-in-class.

Noise cancellation

This example integrates LiveKit Cloud enhanced noise cancellation. If you're not using LiveKit Cloud, omit the plugin and the noise_cancellation parameter from the following code.

Release candidate

LiveKit Agents v1.0 is currently available as a release candidate. The following commands install the latest pre-release packages.

pip install \
  "livekit-agents[openai,silero,deepgram,cartesia,turn-detector]~=1.0rc" \
  "livekit-plugins-noise-cancellation~=0.2" \
  "python-dotenv"

Environment variables

Create a file named .env and add your LiveKit credentials along with the necessary API keys for your AI providers.

DEEPGRAM_API_KEY=<your Deepgram API key>
OPENAI_API_KEY=<your OpenAI API key>
CARTESIA_API_KEY=<your Cartesia API key>
LIVEKIT_API_KEY=<your LiveKit API key>
LIVEKIT_API_SECRET=<your LiveKit API secret>
LIVEKIT_URL=<your LiveKit server URL>
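
To verify the file is being picked up, you can run a quick sanity-check sketch using python-dotenv (installed in the previous step); the variable names match the file above:

import os

from dotenv import load_dotenv

load_dotenv()

# Print which of the expected keys are present in the environment
for key in ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET",
            "DEEPGRAM_API_KEY", "OPENAI_API_KEY", "CARTESIA_API_KEY"):
    print(f"{key}: {'set' if os.getenv(key) else 'MISSING'}")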

Agent code

Create a file named main.py containing the following code for your first voice agent.

from dotenv import load_dotenv

from livekit import agents
from livekit.agents.voice import AgentSession, Agent, room_io
from livekit.plugins import (
    openai,
    cartesia,
    deepgram,
    noise_cancellation,
    silero,
    turn_detector,
)

load_dotenv()


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful voice AI assistant.")


async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()

    # Assemble the STT-LLM-TTS pipeline, plus VAD and turn detection
    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o"),
        tts=cartesia.TTS(),
        vad=silero.VAD.load(),
        turn_detection=turn_detector.EOUModel(),
    )

    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_input_options=room_io.RoomInputOptions(
            # LiveKit Cloud enhanced noise cancellation; omit if self-hosting
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )

    # Instruct the agent to speak first
    await session.generate_reply()


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
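
Each pipeline component accepts provider-specific options if you want something other than the defaults. The sketch below is illustrative only: the model names and voice parameter are assumptions, so consult each plugin's reference for the options it actually supports. It also shows session.start() without noise cancellation, as described in the setup section, for deployments that don't use LiveKit Cloud.

# An illustrative variation on the session above. The specific parameter
# values (model names, voice ID) are assumptions; check each plugin's
# reference for the exact options.
session = AgentSession(
    stt=deepgram.STT(model="nova-2"),      # assumed Deepgram model option
    llm=openai.LLM(model="gpt-4o-mini"),   # a smaller OpenAI model
    tts=cartesia.TTS(voice="<voice ID>"),  # assumed Cartesia voice option
    vad=silero.VAD.load(),
    turn_detection=turn_detector.EOUModel(),
)

# Without LiveKit Cloud, start the session without enhanced noise cancellation:
await session.start(
    room=ctx.room,
    agent=Assistant(),
)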

Download model files

To use the silero and turn-detector plugins, you first need to download the model files. It's recommended to do this separately before the first run:

python main.py download-files

Speak to your agent

Start your agent in console mode to run inside your terminal:

python main.py console

Your agent speaks to you in the terminal, and you can speak to it as well.

Connect to playground

Start your agent in dev mode to connect it to LiveKit and make it available from anywhere on the internet:

python main.py dev

Use the Agents playground to speak with your agent and explore its full range of multimodal capabilities.

Congratulations, your agent is up and running. Continue to use the playground or the console mode as you build and test your agent.

Agent CLI modes

In console mode, the agent runs locally and is only available within your terminal.

Run your agent in dev (development / debug) or start (production) mode to connect to LiveKit and join rooms.
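
To summarize the three modes used in this guide:

python main.py console   # local only: audio in and out through your terminal
python main.py dev       # connects to your LiveKit server for development
python main.py start     # production mode for deployed agents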

Next steps

Follow these guides to bring your voice AI app to life in the real world.
