Building voice agents

In-depth guide to voice AI with LiveKit Agents.

Overview

Building a great voice AI app requires careful orchestration of multiple components. In addition, the voice AI end-user experience is particularly sensitive to latency and responsiveness. For these reasons, LiveKit Agents includes a dedicated set of abstractions to make building your own custom voice AI app simple, while giving you full control over the underlying code.

Agent sessions

The AgentSession is the main orchestrator for your voice AI app. The session is responsible for collecting user input, managing the voice pipeline, invoking the LLM, and sending the output back to the user.

Each session requires at least one Agent to orchestrate. The agent defines the core AI logic of your app, such as its instructions and tools. The framework also supports custom workflows that orchestrate handoff and delegation between multiple agents.

The following example shows how to begin a simple single-agent session:

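from livekit.agents import JobContext
from livekit.agents.voice import AgentSession, Agent, room_io
from livekit.plugins import openai, cartesia, deepgram, noise_cancellation, silero, turn_detector


# Assumes this code runs inside a job entrypoint, where ctx is the JobContext.
async def entrypoint(ctx: JobContext):
    # Configure the voice pipeline components for this session.
    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(),
        tts=cartesia.TTS(),
        vad=silero.VAD.load(),
        turn_detection=turn_detector.EOUModel(),
    )

    # Start the session in the job's room with a single agent.
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a helpful voice AI assistant."),
        room_input_options=room_io.RoomInputOptions(
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )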

Voice AI providers

You can choose among many providers for each component of the voice pipeline to suit your needs. The framework supports both a high-performance STT-LLM-TTS pipeline and lifelike multimodal models. In either case, the framework automatically handles interruptions, transcription forwarding, turn detection, and more.

You may add these components to the AgentSession, where they act as global defaults within the app, or to each individual Agent if needed.
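The relationship between session-level defaults and agent-level components can be pictured with a small standalone sketch. This is a conceptual model only, not LiveKit's implementation: SessionDefaults, AgentOverrides, and resolve_components are hypothetical names, the provider values are plain strings rather than real plugin objects, and the sketch assumes that a component set on an agent takes precedence over the session default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionDefaults:
    """Hypothetical stand-in for components configured on the session."""
    stt: str
    llm: str
    tts: str


@dataclass
class AgentOverrides:
    """Hypothetical stand-in for components configured on one agent."""
    stt: Optional[str] = None
    llm: Optional[str] = None
    tts: Optional[str] = None


def resolve_components(session: SessionDefaults, agent: AgentOverrides) -> dict:
    """Use the agent's component when set, otherwise the session default."""
    resolved = {}
    for name in ("stt", "llm", "tts"):
        override = getattr(agent, name)
        resolved[name] = override if override is not None else getattr(session, name)
    return resolved


# Placeholder provider names; real apps would pass plugin instances instead.
session = SessionDefaults(stt="deepgram", llm="openai", tts="cartesia")
greeter = AgentOverrides()                  # relies on every session default
narrator = AgentOverrides(tts="other-tts")  # overrides only the TTS
```

In this model, resolve_components(session, greeter) yields the session's three components unchanged, while resolve_components(session, narrator) differs only in its tts entry.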

Capabilities

The following guides, along with others in this section, cover the core capabilities of the AgentSession and how to use them in your app.