Agents Overview

LiveKit Agents is an end-to-end framework for building realtime, multimodal AI "agents" that interact with end-users through voice, video, and data channels. This framework allows you to build an agent using Python.

Diagram showing a high-level view of how Agents work.


  • LiveKit audio/video transport: Use the same LiveKit API primitives to transport voice and video from the client device to your application server in realtime.
  • Abstractions over common tasks: Tasks such as speech-to-text, text-to-speech, and using LLMs are simplified so you can focus on your core application logic.
  • Extensive and extensible plugins: Prebuilt integrations with OpenAI, Deepgram, Google, ElevenLabs, and more. You can create a plugin to integrate any other provider.
  • End-to-end dev experience: Compatible with LiveKit server and LiveKit Cloud. Develop locally and deploy to production without changing a single line of code.
  • Orchestration and scaling: Built-in worker service for agent orchestration and load balancing. To scale, just add more servers.
  • Open source: Like the rest of LiveKit, the Agents framework is licensed under Apache 2.0.
  • Edge optimized: When using LiveKit Cloud, your agents transmit voice and video over LiveKit's global edge network, ensuring minimal latency for users worldwide.
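
To make the "abstractions" and "plugins" points above more concrete, here is a minimal sketch of the general idea: application logic is written against small provider-neutral interfaces, and each provider is a swappable implementation. Every name in it (SpeechToText, LanguageModel, TextToSpeech, FakeSTT, EchoLLM, FakeTTS, handle_turn) is hypothetical and chosen for illustration only. It is not the LiveKit Agents API, and the stand-in providers do no real speech or language processing.

```python
from typing import Protocol


# Hypothetical provider-neutral interfaces (NOT the LiveKit Agents API).
class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def reply(self, prompt: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


# Stand-in "plugins" so the sketch runs without any external service.
class FakeSTT:
    """Pretends the audio payload is UTF-8 text."""

    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class EchoLLM:
    """Repeats the prompt back instead of calling a real model."""

    def reply(self, prompt: str) -> str:
        return f"You said: {prompt}"


class FakeTTS:
    """Encodes the text as bytes instead of synthesizing speech."""

    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


def handle_turn(
    audio: bytes,
    stt: SpeechToText,
    llm: LanguageModel,
    tts: TextToSpeech,
) -> bytes:
    """Run one conversational turn: audio in, audio out."""
    text = stt.transcribe(audio)
    answer = llm.reply(text)
    return tts.synthesize(answer)


if __name__ == "__main__":
    print(handle_turn(b"hello", FakeSTT(), EchoLLM(), FakeTTS()))
```

Because handle_turn depends only on the three interfaces, swapping one provider for another means passing a different object; the application logic itself stays unchanged. A real plugin would wrap a provider's SDK behind the same kind of interface.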

Use cases

The Agents framework gives you flexibility when building server-side, programmable participants. You can use it to create a wide variety of applications, including:

  • Voice assistants with function calling and interruption support
  • Realtime voice-to-text transcription
  • Object detection/recognition over realtime video
  • Generative AI-driven avatars
  • Contact center or helpdesk solutions mixing AI and human agents
  • Realtime translation
  • Realtime video filters and transforms