AI Voice Assistant Quickstart

Build an AI-powered voice assistant that engages in realtime conversations using LiveKit, Python, and Next.js.

This quickstart walks you through building a conversational AI application using Python and Next.js. It uses LiveKit's Agents Framework and React Components Library to create an AI-powered voice assistant that can engage in realtime conversations with users. By the end, you'll have a basic voice assistant application that you can run and interact with.

note:

If you're interested in using the OpenAI Realtime API, see the Speech-to-speech quickstart.

Prerequisites

  - A LiveKit Cloud account
  - An OpenAI API key and a Deepgram API key (or API keys for the alternative providers you plan to use)
  - Python 3
  - Node.js and pnpm (for the frontend)

note:

By default, the example agent uses Deepgram for STT and OpenAI for TTS and LLM. However, you aren't required to use these providers.

Steps

The following steps take you through the process of creating a voice assistant using the LiveKit CLI and some minimal templates.

Set up a LiveKit account and install the CLI

  1. Create an account or sign in to your LiveKit Cloud account.

  2. Install the LiveKit CLI and authenticate using lk cloud auth.
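
    For example, on macOS you can install the CLI with Homebrew and then authenticate (install commands for other platforms are listed in the LiveKit CLI documentation):

    brew install livekit-cli
    lk cloud auth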

Bootstrap an agent from template

  1. Clone the starter template for a simple Python voice agent:

    lk app create --template voice-pipeline-agent-python
  2. Enter your OpenAI API Key and Deepgram API Key when prompted. If you aren't using Deepgram and OpenAI, see Customizing plugins.

  3. Install dependencies and start your agent:

    cd <agent_dir>
    python3 -m venv venv
    source venv/bin/activate
    python3 -m pip install -r requirements.txt
    python3 agent.py dev

    You can edit the agent.py file to customize the system prompt and other aspects of your agent.
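
    For example, the template sets the assistant's system prompt through an initial chat context near the top of agent.py. A minimal sketch of that section (the exact prompt text in the template differs):

    from livekit.agents import llm

    # The system prompt is the first entry in the agent's chat context.
    initial_ctx = llm.ChatContext().append(
        role="system",
        text=(
            "You are a voice assistant created by LiveKit. "  # replace with your own instructions
            "Keep your responses short and conversational."
        ),
    )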

Bootstrap a frontend from template

  1. Clone the Voice Assistant Frontend Next.js app starter template using the CLI:

    lk app create --template voice-assistant-frontend
  2. Install dependencies and start your frontend application:

    cd <frontend_dir>
    pnpm install
    pnpm dev
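
    Because you authenticated with lk cloud auth, the CLI typically fills in your LiveKit credentials for you when it creates the app. If you need to set them by hand, the frontend expects values along these lines in .env.local (placeholders shown):

    LIVEKIT_URL=wss://<your_project>.livekit.cloud
    LIVEKIT_API_KEY=<api_key>
    LIVEKIT_API_SECRET=<api_secret>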

Launch your app and talk to your agent

  1. Visit your locally running application (by default, http://localhost:3000).
  2. Select Connect and start a conversation with your agent.

Customizing plugins

You can change the VAD, STT, TTS, and LLM plugins your agent uses by editing the agent.py file. By default, the voice assistant is configured to use Silero for VAD, Deepgram for STT, and OpenAI for TTS and LLM (using the gpt-4o-mini model):

    assistant = VoiceAssistant(
        vad=silero.VAD.load(),
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=openai.TTS(),
        chat_ctx=initial_ctx,
    )

You can modify your agent to use different providers. For example, to switch the TTS plugin to Cartesia, follow these steps:

  1. Edit agent.py and update the plugin imports to include cartesia:

    from livekit.plugins import cartesia, deepgram, openai, silero
  2. Update the tts plugin for your assistant in agent.py:

    tts=cartesia.TTS(),
  3. Update the .env.local file to include your Cartesia API key by adding a CARTESIA_API_KEY environment variable:

    CARTESIA_API_KEY="<cartesia_api_key>"
  4. Install the plugin locally:

    pip install livekit-plugins-cartesia
  5. Start your agent:

    python3 agent.py dev
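
With these changes in place, the VoiceAssistant constructor in agent.py looks like this:

    assistant = VoiceAssistant(
        vad=silero.VAD.load(),
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(),
        chat_ctx=initial_ctx,
    )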

Next steps