Agents Overview

What is LiveKit Agents?

LiveKit Agents is a framework for building programmable, multimodal AI agents that orchestrate LLMs and other AI models to accomplish tasks. This framework allows you to build agents using Python or Node.js.

Unlike traditional HTTP servers, agents operate as stateful, long-running processes. They connect to the LiveKit network via WebRTC, enabling low-latency, realtime media and data exchange with frontend applications.

Diagram showing a high-level view of how agents work.

The Agents framework overcomes several key limitations of traditional architectures:

  • Multimodal: Agents can exchange voice, video, and text with users.

  • Simpler frontend: Frontend applications use LiveKit’s SDKs to handle the complexities of WebRTC transport, media device management, and audio/video encoding and decoding.

  • Low-latency: The LiveKit Cloud global mesh network connects each user to their nearest edge server, minimizing transport latency.

  • Centralized business logic: Keeping business logic in the agent process, rather than in each client, lets one agent serve clients across platforms, including telephony integrations.

  • Stateful: End-user interactions are inherently stateful. Rather than synchronizing client-side state through request/response cycles, the agent keeps session state in its own long-running process for the duration of the interaction.
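
The following toy sketch contrasts the two approaches in plain Python. It does not use the LiveKit SDK; `stateless_reply` and `ConversationSession` are hypothetical names chosen only to illustrate where the state lives.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def stateless_reply(history: list[str], user_text: str) -> tuple[str, list[str]]:
    """Hypothetical request/response handler: the client must send the full
    history on every call and store the updated history it gets back."""
    new_history = history + [user_text]
    reply = f"turn {len(new_history)}: heard '{user_text}'"
    return reply, new_history


@dataclass
class ConversationSession:
    """Hypothetical stateful session: the long-running agent process keeps
    the history itself, so each message carries only the new input."""

    history: list[str] = field(default_factory=list)

    def reply(self, user_text: str) -> str:
        self.history.append(user_text)
        return f"turn {len(self.history)}: heard '{user_text}'"


# Stateless: the client threads the state through every call.
history: list[str] = []
r1, history = stateless_reply(history, "hello")
r2, history = stateless_reply(history, "book a table")

# Stateful: the session object owns the state between turns.
session = ConversationSession()
s1 = session.reply("hello")
s2 = session.reply("book a table")

print(r2)  # turn 2: heard 'book a table'
print(s2)  # turn 2: heard 'book a table'
```

Both produce the same reply, but in the stateful version the client never has to carry or resend the conversation history.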

What you can do with agents

The LiveKit Agents framework is designed to give you flexibility when building server-side, programmable participants. You can create multiple frontends that all connect to the same backend agent.

Some great use cases for agents include:

  • AI voice agents: An agent that has natural voice conversations with users.
  • Call center: Answer incoming calls or make outbound calls with AI agents.
  • Transcription: Realtime voice-to-text transcription.
  • Object detection/recognition: Identify objects in realtime video.
  • AI-driven avatars: Avatars generated from prompts.
  • Translation: Realtime translation.
  • Video manipulation: Realtime video filters and transforms.

How agents connect to LiveKit

When you start running your agent code, it registers itself with a LiveKit server (either self-hosted or LiveKit Cloud) and runs as a background "worker" process. The worker waits on standby for users to connect. Once an end-user session is initiated (that is, a room is created for the user), an available worker dispatches an agent to the room.

Users connect to a LiveKit room using a frontend application. Each user is a participant in the room and the agent is an AI participant. How the agent interacts with end-user participants depends on the custom code you write.
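
The register, standby, and dispatch flow described above can be modeled as a small simulation. This is a toy sketch, not the LiveKit server or SDK: `Worker`, `ToyServer`, the `max_jobs` capacity, and the least-loaded selection rule are assumptions made for illustration; the text above only says that an available worker is dispatched.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Worker:
    """A toy agent worker that has registered and waits on standby."""

    worker_id: str
    max_jobs: int  # hypothetical capacity, not a LiveKit setting
    rooms: list[str] = field(default_factory=list)

    @property
    def load(self) -> float:
        return len(self.rooms) / self.max_jobs

    def is_available(self) -> bool:
        return len(self.rooms) < self.max_jobs


class ToyServer:
    """Hypothetical stand-in for a LiveKit server (self-hosted or Cloud)."""

    def __init__(self) -> None:
        self.workers: dict[str, Worker] = {}
        self.participants: dict[str, list[str]] = {}

    def register(self, worker: Worker) -> None:
        # Step 1: on startup, the worker registers and goes on standby.
        self.workers[worker.worker_id] = worker

    def create_room(self, room: str, user: str) -> str | None:
        # Step 2: a user session starts, so a room is created with the user in it.
        self.participants[room] = [user]
        # Step 3: choose an available worker (least-loaded here, an assumption)
        # and dispatch an agent into the room as another participant.
        available = [w for w in self.workers.values() if w.is_available()]
        if not available:
            return None  # no capacity left; adding workers would fix this
        worker = min(available, key=lambda w: w.load)
        worker.rooms.append(room)
        self.participants[room].append(f"agent@{worker.worker_id}")
        return worker.worker_id


server = ToyServer()
server.register(Worker("w1", max_jobs=2))
server.register(Worker("w2", max_jobs=2))
print(server.create_room("room-a", "alice"))  # w1
print(server.create_room("room-b", "bob"))    # w2
print(server.participants["room-a"])          # ['alice', 'agent@w1']
```

In this model the agent is simply one more participant in the room, which matches how the text describes end users and the agent sharing a LiveKit room.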

How to create an agent

To create an agent using the framework, you’ll need to write a Python or Node.js application (your agent) and a frontend for your users:

  • Write the application code for your agent. The configuration, functions, and plugin options are all part of your agent code. You can use plugins included in the framework for LLM, STT, TTS, VAD, and utilities for working with text, or write your own custom plugins. Define the entrypoint function that executes when a connection is made. You can also define optional functions to preprocess connections and set connection thresholds or permissions for the worker process.

    To learn more, see Working with plugins and Worker options.

  • Create a frontend for users to connect to your agent in a LiveKit room. For development and testing, you can use the Agents Playground.
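
As a rough picture of how the agent-code pieces fit together (an entrypoint, an optional preprocessing step, and optional limits on which jobs a worker accepts), here is a framework-free sketch. `AgentApp`, `JobRequest`, `prewarm`, `accept`, and `load_threshold` are hypothetical names, not the LiveKit Agents API; see Worker options for the real parameters.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Optional


@dataclass
class JobRequest:
    """Hypothetical description of a room the agent is asked to join."""

    room: str
    metadata: dict[str, Any]


@dataclass
class AgentApp:
    """Hypothetical bundle of the agent code pieces described above."""

    entrypoint: Callable[[JobRequest, dict], Awaitable[str]]
    prewarm: Optional[Callable[[], dict]] = None  # optional per-process setup
    accept: Optional[Callable[[JobRequest], bool]] = None  # optional job filter
    load_threshold: float = 0.75  # hypothetical: refuse jobs at or above this load
    shared: dict = field(init=False, default_factory=dict)

    def __post_init__(self) -> None:
        # Preprocessing runs once per process, before any job arrives.
        if self.prewarm is not None:
            self.shared = self.prewarm()

    async def handle(self, job: JobRequest, current_load: float) -> Optional[str]:
        if current_load >= self.load_threshold:
            return None  # too busy: leave the job for another worker
        if self.accept is not None and not self.accept(job):
            return None  # rejected by the filter (e.g. permissions)
        return await self.entrypoint(job, self.shared)


def prewarm() -> dict:
    return {"model": "vad-model"}  # stand-in for loading a model once


def accept(job: JobRequest) -> bool:
    return job.metadata.get("tier") == "support"


async def entrypoint(job: JobRequest, shared: dict) -> str:
    # Runs when the agent joins a room; real agent code would connect and talk.
    return f"joined {job.room} using {shared['model']}"


app = AgentApp(entrypoint=entrypoint, prewarm=prewarm, accept=accept)
job = JobRequest("room-a", {"tier": "support"})
print(asyncio.run(app.handle(job, current_load=0.2)))  # joined room-a using vad-model
```

Keeping the expensive setup in a one-time preprocessing step means each new session starts quickly, while the filter and threshold let a busy or unsuitable worker leave a job for another one.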

Agents framework features

  • LiveKit audio/video transport: Use the same LiveKit API primitives to transport voice and video from your frontend to your application server in realtime.
  • Abstractions over common tasks: Tasks such as speech-to-text, text-to-speech, and using LLMs are simplified so you can focus on your core application logic.
  • Extensive and extensible plugins: Prebuilt integrations with OpenAI, Deepgram, Google, ElevenLabs, and more. You can create a plugin to integrate any other provider.
  • End-to-end dev experience: Compatible with LiveKit server and LiveKit Cloud. Develop locally and deploy to production without changing a single line of code.
  • Orchestration and scaling: Built-in worker service for agent orchestration and load balancing. To scale, just add more servers.
  • Open source: Like the rest of LiveKit, the Agents framework is licensed under Apache 2.0.
  • Edge optimized: When using LiveKit Cloud, your agents transmit voice and video over LiveKit's global edge network, ensuring minimal latency for users worldwide.
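
The plugin model can be pictured as application logic written against small provider interfaces, so that swapping one provider for another does not touch the core logic. In this sketch `SpeechToText`, `TextToSpeech`, `EchoSTT`, and `UpperTTS` are hypothetical stand-ins, not the framework's actual plugin base classes, and no real provider is called.

```python
from __future__ import annotations

from typing import Protocol


class SpeechToText(Protocol):
    """Hypothetical interface an STT plugin would implement."""

    def transcribe(self, audio: bytes) -> str: ...


class TextToSpeech(Protocol):
    """Hypothetical interface a TTS plugin would implement."""

    def synthesize(self, text: str) -> bytes: ...


class EchoSTT:
    """Toy custom 'provider': treats the audio bytes as UTF-8 text."""

    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UpperTTS:
    """Toy custom 'provider': returns the text upper-cased as bytes."""

    def synthesize(self, text: str) -> bytes:
        return text.upper().encode("utf-8")


def voice_turn(audio: bytes, stt: SpeechToText, tts: TextToSpeech) -> bytes:
    # Core application logic sees only the interfaces, not the providers.
    text = stt.transcribe(audio)
    reply = f"you said: {text}"
    return tts.synthesize(reply)


print(voice_turn(b"hello", EchoSTT(), UpperTTS()))  # b'YOU SAID: HELLO'
```

Because `voice_turn` depends only on the two interfaces, integrating a new provider means writing one class with the matching method rather than changing the application code.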