Agents Overview

LiveKit Agents is an end-to-end framework for building real-time, multimodal AI "agents" that interact with end-users through voice, video, and data channels. The framework lets you build agents in Python.

Diagram showing a high-level view of how Agents work.

Features

  • LiveKit audio/video transport: Use the same LiveKit API primitives to transport voice and video from the client device to your application server in real-time.
  • Abstractions over common tasks: Tasks such as speech-to-text, text-to-speech, and working with LLMs are simplified so you can focus on your core application logic.
  • Extensive and extensible plugins: Prebuilt integrations with OpenAI, Deepgram, Google, ElevenLabs, and more; you can also create a plugin to integrate any other provider (see the sketch after this list).
  • End-to-end dev experience: Compatible with LiveKit server and LiveKit Cloud. Develop locally and deploy to production without changing a single line of code.
  • Orchestration and scaling: Built-in worker service for agent orchestration and load balancing. To scale, just add more workers.
  • Open Source: Like the rest of LiveKit, Agents is Apache 2.0.
  • Edge optimized: When using LiveKit Cloud, your agents leverage LiveKit's global edge network. Agents spin up close to end-users, reducing latency and leaving more time for inference.
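
To illustrate the plugin abstraction, here is a rough sketch of how provider integrations are instantiated behind common interfaces. It assumes the livekit-agents Python package with its deepgram, openai, and elevenlabs plugins installed; exact class names, constructor options, and how the objects are wired into your agent may differ between framework versions.

```python
# A rough sketch of the plugin abstraction (assumes the livekit-agents
# Python package with its deepgram, openai, and elevenlabs plugins
# installed; class names and options may differ between versions).
from livekit.plugins import deepgram, elevenlabs, openai

# Each provider is wrapped behind a common STT/LLM/TTS interface, so
# swapping vendors is a configuration change rather than a rewrite of
# your audio-handling code. Provider credentials are typically read
# from environment variables.
stt = deepgram.STT()     # speech-to-text
llm = openai.LLM()       # large language model
tts = elevenlabs.TTS()   # text-to-speech
```

These objects would then be wired into the framework's voice pipeline, or into your own processing loop, inside the agent's entrypoint.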

Use cases

Agents is designed to give you flexibility when building server-side, programmable participants. You can use it to create a wide variety of applications, including:

  • Voice and video chat with an LLM
  • Real-time voice-to-text transcription
  • Object detection/recognition over real-time video
  • Generative AI-driven avatars
  • Contact center or helpdesk solutions mixing AI and human agents
  • Real-time translation
  • Real-time video filters and transforms

Agent lifecycle

  1. Worker registration: When your agent program runs, it first connects to LiveKit server and registers itself as a "worker" over a persistent WebSocket connection. Once registered, the worker stands by, waiting for "job" requests.
  2. Agent invocation: When a room is created, LiveKit server notifies registered workers about the job one by one. The first worker to accept the job instantiates your agent and has it join the room. A worker can manage multiple agent instances simultaneously.
  3. Application logic: This is where your application takes over. Your agent can use most LiveKit client features via the Python SDK, and can leverage the plugin ecosystem to process or synthesize audio and video data.
  4. Room close: When the last participant (excluding your agent) leaves the room, your agent instance disconnects from the room as well.
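
The sketch below maps these lifecycle steps onto a minimal worker program. It assumes the livekit-agents Python package; names such as WorkerOptions and entrypoint_fnc reflect that package and may differ between versions, and LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET are expected in the environment.

```python
# A minimal worker sketch (assumes the livekit-agents Python package;
# exact option names may differ between versions).
from livekit.agents import JobContext, WorkerOptions, cli


async def entrypoint(ctx: JobContext):
    # Step 3: this worker accepted a job, so the agent connects to the
    # room and your application logic takes over.
    await ctx.connect()
    # ... subscribe to tracks, run STT/LLM/TTS plugins, publish audio, etc.


if __name__ == "__main__":
    # Steps 1-2: register this process as a worker over a persistent
    # WebSocket connection to LiveKit server, then wait for job requests.
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```

Step 4 is handled by the framework: once the last non-agent participant leaves, the job ends and the agent's room connection is torn down.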

Diagram describing the functionality of Agents.