Agents Overview

LiveKit Agents is an end-to-end framework for building realtime, multimodal AI agents that interact with end users through voice, video, and data channels. This framework allows you to build an agent using Python.

What you can do with agents

The LiveKit Agents framework is designed to give you flexibility when building server-side, programmable participants. You can create multiple frontend clients that connect to the same backend agent.

The following lists some examples of agent use cases:

  • Voice assistant: An agent that has conversations with users.
  • Call center: Answer incoming calls and respond to user questions using both AI and human agents.
  • Transcription: Realtime voice-to-text transcription.
  • Object detection/recognition: Identify objects over realtime video.
  • AI-driven avatars: Generated avatars using prompts.
  • Translation: Realtime translation.
  • Video manipulation: Realtime video filters and transforms.

What is a voice assistant

The most common application of LiveKit Agents are AI voice assistants that use LLMs to have natural conversations with users. You can create a voice assistant to interact with users in a LiveKit room, respond to customer calls, or make outbound calls. For example, a voice assistant can answer your calls and respond to general customer inquiries.

To learn more, see AI Voice Agents.

How LiveKit agents work

Diagram showing a high-level view of how agents work.

The LiveKit Agents framework enables you to create a realtime AI agent using prebuilt plugins that cover common tasks like converting speech to text, text to speech, and running inference on a generative AI model.

When you start running your agent code, it registers itself with a LiveKit server (either self hosted or LiveKit Cloud and runs as a background "worker" process. The worker waits on standby for users to connect. Once a end-user session is initiated (that is, a room is created for the user), an available worker dispatches an agent to the room.

Users connect to a LiveKit room using a frontend application. Each user is a participant in the room and the agent is an AI participant. How the agent interacts with end-user participants depends on the custom code you write.

How to create an agent

To create an agent using the framework, you’ll need to write a Python application (your agent) and a frontend for your users:

  • Write the application code for your agent. The configuration, functions, and plugin options are all part of your agent code. You can use plugins included in the framework for LLM, STT, TTS, VAD, and utilities for working with text, or write your own custom plugins. Define the entrypoint function that executes when a connection is made. You can also define optional functions to preprocess connections and set connection thresholds or permissions for the worker process.

    To learn more, see Working with plugins and Worker options.

  • Create a frontend client for users to connect to your agent in a LiveKit room. For development and testing, you can use the Agents Playground.

Agents framework features

  • LiveKit audio/video transport: Use the same LiveKit API primitives to transport voice and video from the client device to your application server in realtime.
  • Abstractions over common tasks: Tasks such as speech-to-text, text-to-speech, and using LLMs are simplified so you can focus on your core application logic.
  • Extensive and extensible plugins: Prebuilt integrations with OpenAI, DeepGram, Google, ElevenLabs, and more. You can create a plugin to integrate any other provider.
  • End-to-end dev experience: Compatible with LiveKit server and LiveKit Cloud. Develop locally and deploy to production without changing a single line of code.
  • Orchestration and scaling: Built-in worker service for agent orchestration and load balancing. To scale, just add more servers.
  • Open Source: Like the rest of LiveKit, the Agents framework is Apache 2.0.
  • Edge optimized: When using LiveKit Cloud, your agents transmit voice and video over LiveKit's global edge network, ensuring minimal latency for users worldwide.