LiveKit Agents

Realtime framework for production-grade multimodal and voice AI agents.

Introduction

The Agents framework allows you to add a Python or Node.js program to any LiveKit room as a full realtime participant. The SDK includes a complete set of tools and abstractions that make it easy to feed realtime media and data through an AI pipeline that works with any provider, and to publish realtime results back to the room.

If you want to get your hands on the code right away, follow this quickstart guide. It takes just a few minutes to build your first voice agent.

Voice AI quickstart

Build a simple voice assistant with Python in less than 10 minutes.

Use cases

Some applications for agents include:

  • Multimodal assistant: Talk, text, or screen share with an AI assistant.
  • Telehealth: Bring AI into realtime telemedicine consultations, with or without humans in the loop.
  • Call center: Deploy AI to the front lines of customer service with inbound and outbound call support.
  • Realtime translation: Translate conversations in realtime.
  • NPCs: Add lifelike NPCs backed by language models instead of static scripts.
  • Robotics: Put your robot's brain in the cloud, giving it access to the most powerful models.


Framework overview

Diagram showing framework overview.

Your agent code operates as a stateful, realtime bridge between powerful AI models and your users. While AI models typically run in data centers with reliable connectivity, users often connect from mobile networks with varying quality.

WebRTC ensures smooth communication between agents and users, even over unstable connections. LiveKit WebRTC is used between the frontend and the agent, while the agent communicates with your backend using HTTP and WebSockets. This setup provides the benefits of WebRTC without its typical complexity.

The agents SDK includes components for handling the core challenges of realtime voice AI, such as streaming audio through an STT-LLM-TTS pipeline, reliable turn detection, handling interruptions, and LLM orchestration. It supports plugins for most major AI providers, with more continually added. The framework is fully open source and supported by an active community.

Other framework features include:

  • Voice, video, and text: Build agents that can process realtime input and produce output in any modality.
  • Tool use: Define tools that are compatible with any LLM, and even forward tool calls to your frontend.
  • Multi-agent handoff: Break down complex workflows into simpler tasks.
  • Extensive integrations: Integrate with nearly every major AI provider for LLMs, STT, TTS, and more.
  • State-of-the-art turn detection: Use the custom turn detection model for lifelike conversation flow.
  • Made for developers: Build your agents in code, not configuration.
  • Production ready: Includes built-in worker orchestration, load balancing, and Kubernetes compatibility.
  • Open source: The framework and entire LiveKit ecosystem are open source under the Apache 2.0 license.

How agents connect to LiveKit

Diagram showing a high-level view of how agents work.

When your agent code starts, it first registers with a LiveKit server (either self-hosted or LiveKit Cloud) to run as a "worker" process. The worker waits until it receives a dispatch request. To fulfill the request, the worker boots a "job" subprocess that joins the room. By default, workers are dispatched to each new room created in your LiveKit project. To learn more about workers, see the Worker lifecycle guide.

After your agent and user join a room, the agent and your frontend app can communicate using LiveKit WebRTC. This enables fast, reliable realtime communication even under poor network conditions. LiveKit also includes full support for telephony, so the user can join the call from a phone instead of a frontend app.

To learn more about how LiveKit works overall, see the Intro to LiveKit guide.

Getting started

Follow these guides to learn more and get started with LiveKit Agents.

Voice AI quickstart

Build a simple voice assistant with Python in less than 10 minutes.
