Overview
Building a great voice AI app requires careful orchestration of multiple components. In addition, the voice AI end-user experience is particularly sensitive to latency and responsiveness. For these reasons, LiveKit Agents includes a dedicated set of abstractions to make building your own custom voice AI app simple, while giving you full control over the underlying code.
Agent sessions
The AgentSession is the main orchestrator for your voice AI app. The session is responsible for collecting user input, managing the voice pipeline, invoking the LLM, and sending the output back to the user.
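As a rough mental model of these responsibilities, the sketch below chains three stand-in callables in the order a session might process a single turn. `ToySession` and `handle_turn` are hypothetical names used only for illustration. They are not part of LiveKit Agents, and a real session works on streaming audio and handles far more than this.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToySession:
    """Illustrative stand-in for a voice session (NOT the LiveKit AgentSession)."""

    stt: Callable[[bytes], str]  # speech-to-text: audio in, transcript out
    llm: Callable[[str], str]    # language model: transcript in, reply out
    tts: Callable[[str], bytes]  # text-to-speech: reply in, audio out
    history: List[str] = field(default_factory=list)

    def handle_turn(self, audio_in: bytes) -> bytes:
        text = self.stt(audio_in)                   # 1. collect user input
        self.history.append(f"user: {text}")
        reply = self.llm(text)                      # 2. invoke the LLM
        self.history.append(f"assistant: {reply}")
        return self.tts(reply)                      # 3. produce output for the user


# Trivial stubs so the sketch runs without any audio or network access.
session = ToySession(
    stt=lambda audio: audio.decode("utf-8"),
    llm=lambda text: f"You said: {text}",
    tts=lambda text: text.encode("utf-8"),
)
audio_out = session.handle_turn(b"hello")
```

Each stage is just a callable here, which mirrors the idea that the session coordinates the components rather than implementing them itself.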
Each session requires at least one Agent to orchestrate. The agent defines the core AI logic of your app, such as its instructions and tools. The framework also supports custom workflows that orchestrate handoff and delegation between multiple agents.
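To make the idea of handoff between agents more concrete, here is a minimal sketch in which the active agent changes when the user's message contains a trigger keyword. `ToyAgent`, `ToyWorkflow`, and the keyword-based routing are assumptions made for illustration. They do not reflect how LiveKit Agents implements workflows.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyAgent:
    """Hypothetical agent: a name, its instructions, and its handoff rules."""

    name: str
    instructions: str
    # Maps a keyword in the user's message to the name of the agent to hand off to.
    handoffs: Dict[str, str] = field(default_factory=dict)


class ToyWorkflow:
    """Tracks which agent is currently active and applies handoff rules."""

    def __init__(self, agents: List[ToyAgent], start: str) -> None:
        self.agents = {agent.name: agent for agent in agents}
        self.current = self.agents[start]

    def route(self, user_text: str) -> ToyAgent:
        lowered = user_text.lower()
        for keyword, target in self.current.handoffs.items():
            if keyword in lowered:
                self.current = self.agents[target]  # hand off to the target agent
                break
        return self.current


workflow = ToyWorkflow(
    agents=[
        ToyAgent("greeter", "Greet the caller.", handoffs={"billing": "billing"}),
        ToyAgent("billing", "Answer billing questions."),
    ],
    start="greeter",
)
```

Calling `workflow.route(...)` with each user message returns the agent that should handle it, so the rest of the app only ever talks to the currently active agent.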
The following example shows how to begin a simple single-agent session:
from livekit.agents.voice import AgentSession, Agent, room_io
from livekit.plugins import openai, cartesia, deepgram, noise_cancellation, silero, turn_detector

# Configure the voice pipeline components for the session.
session = AgentSession(
    stt=deepgram.STT(),
    llm=openai.LLM(),
    tts=cartesia.TTS(),
    vad=silero.VAD.load(),
    turn_detection=turn_detector.EOUModel(),
)

# `ctx` is assumed to be the job context available where this code runs.
await session.start(
    room=ctx.room,
    agent=Agent(instructions="You are a helpful voice AI assistant."),
    room_input_options=room_io.RoomInputOptions(
        noise_cancellation=noise_cancellation.BVC(),
    ),
)
Voice AI providers
You can choose among many providers for each component of the voice pipeline to suit your needs. The framework supports both a high-performance STT-LLM-TTS pipeline and lifelike multimodal models. In either case, the framework automatically handles interruptions, transcription forwarding, turn detection, and more.
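Interruption handling is easier to picture with a small example. The sketch below uses plain `asyncio` to cancel an in-progress "playback" task once the user is considered to have started talking. The `speak` and `run_turn` functions and the chunk-count trigger are hypothetical stand-ins. The framework does this for you, and its actual mechanism is not shown here.

```python
import asyncio
from typing import List


async def speak(chunks: List[str], played: List[str]) -> None:
    """Pretend to play TTS audio one chunk at a time."""
    for chunk in chunks:
        played.append(chunk)
        await asyncio.sleep(0)  # yield to the event loop, standing in for playback time


async def run_turn(chunks: List[str], interrupt_after: int) -> List[str]:
    """Start speaking, then cancel playback once the user 'interrupts'."""
    played: List[str] = []
    playback = asyncio.create_task(speak(chunks, played))

    # Stand-in for voice activity detection: treat the user as having started
    # talking once `interrupt_after` chunks have been played.
    while len(played) < interrupt_after and not playback.done():
        await asyncio.sleep(0)

    playback.cancel()  # stop the agent's speech (no effect if it already finished)
    try:
        await playback
    except asyncio.CancelledError:
        pass  # expected when playback was cut short
    return played
```

For example, `asyncio.run(run_turn(["Hel", "lo ", "the", "re"], 2))` stops before all four chunks are played, while a large `interrupt_after` lets the whole reply finish.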
You may add these components to the AgentSession, where they act as global defaults within the app, or to each individual Agent if needed.
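The relationship between session-wide defaults and per-agent settings can be pictured as a simple fallback rule: use the agent's component if it sets one, otherwise use the session's. The sketch below models that rule with plain strings in place of real plugin objects. `Components` and `resolve` are hypothetical names, and the precedence shown is an assumption inferred from the description above rather than a statement of the framework's internals.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Components:
    """Hypothetical bundle of pipeline components, identified by label only."""

    stt: Optional[str] = None
    llm: Optional[str] = None
    tts: Optional[str] = None


def resolve(session_defaults: Components, agent_overrides: Components) -> Components:
    """Prefer the agent's component when set, else fall back to the session default."""
    merged = {}
    for f in fields(Components):
        override = getattr(agent_overrides, f.name)
        default = getattr(session_defaults, f.name)
        merged[f.name] = override if override is not None else default
    return Components(**merged)


session_defaults = Components(stt="session-stt", llm="session-llm", tts="session-tts")
agent_overrides = Components(tts="agent-tts")  # this agent only customizes its voice
effective = resolve(session_defaults, agent_overrides)
```

Here the agent keeps the session's STT and LLM and replaces only the TTS, which is the typical reason to set a component on an individual agent.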
TTS: Text-to-speech plugins
STT: Speech-to-text plugins
LLM: Language model plugins
Multimodal: Realtime multimodal APIs
Capabilities
The following guides, in addition to others in this section, cover the core capabilities of the AgentSession and how to leverage them in your app.