Agents Overview

LiveKit Agents is an end-to-end framework for building real-time, multimodal AI "agents" that interact with end-users through voice, video, and data channels. The framework lets you build agents in Python.

Diagram showing a high-level view of how Agents work.

Features

  • LiveKit audio/video transport: Use the same LiveKit API primitives to transport voice and video from the client device to your application server in real-time.
  • Abstractions over common tasks: Tasks such as speech-to-text, text-to-speech, and working with LLMs are simplified so you can focus on your core application logic.
  • Extensive and extensible plugins: Prebuilt integrations with OpenAI, Deepgram, Google, ElevenLabs, and more; you can also create a plugin to integrate any other provider (see the sketch after this list).
  • End-to-end dev experience: Compatible with LiveKit server and LiveKit Cloud. Develop locally and deploy to production without changing a single line of code.
  • Orchestration and scaling: Built-in worker service for agent orchestration and load balancing. To scale, just add more workers.
  • Open Source: Like the rest of LiveKit, Agents is Apache 2.0.
  • Edge optimized: When using LiveKit Cloud, your agents leverage LiveKit's global edge network. Agents spin up close to end-users, reducing latency and leaving more time for inference.
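
To illustrate the plugin abstraction, here is a rough sketch of how provider integrations are instantiated behind common interfaces. It assumes the livekit-agents Python package with its deepgram, openai, and elevenlabs plugins installed; exact class names, constructor options, and how the objects are wired into your agent may differ between framework versions.

```python
# A rough sketch of the plugin abstraction (assumes the livekit-agents
# Python package with its deepgram, openai, and elevenlabs plugins
# installed; class names and options may differ between versions).
from livekit.plugins import deepgram, elevenlabs, openai

# Each provider is wrapped behind a common STT/LLM/TTS interface, so
# swapping vendors is a configuration change rather than a rewrite of
# your audio-handling code. Provider credentials are typically read
# from environment variables.
stt = deepgram.STT()     # speech-to-text
llm = openai.LLM()       # large language model
tts = elevenlabs.TTS()   # text-to-speech
```

These objects would then be wired into the framework's voice pipeline, or into your own processing loop, inside the agent's entrypoint.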

Use cases

Agents is designed to give you flexibility when building server-side, programmable participants. You can use it to create a wide variety of applications, including:

  • Voice and video chat with an LLM
  • Real-time voice-to-text transcription
  • Object detection/recognition over real-time video
  • Generative AI-driven avatars
  • Contact center or helpdesk solutions mixing AI and human agents
  • Real-time translation
  • Real-time video filters and transforms

Agent lifecycle

  1. Worker registration: When your agent program runs, it first connects to LiveKit server and registers itself as a "worker" over a persistent WebSocket connection. Once registered, the worker stands by, waiting for "job" requests.
  2. Agent invocation: When a room is created, LiveKit server notifies registered workers about the job one by one. The first worker to accept the job instantiates your agent and has it join the room. A worker can manage multiple agent instances simultaneously.
  3. Application logic: This is where your application takes over. Your agent can use most LiveKit client features via the Python SDK, and can leverage the plugin ecosystem to process or synthesize audio and video data.
  4. Room close: When the last participant (excluding your agent) leaves the room, your agent instance disconnects from the room as well.
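
The sketch below maps these lifecycle steps onto a minimal worker program. It assumes the livekit-agents Python package; names such as WorkerOptions and entrypoint_fnc reflect that package and may differ between versions, and LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET are expected in the environment.

```python
# A minimal worker sketch (assumes the livekit-agents Python package;
# exact option names may differ between versions).
from livekit.agents import JobContext, WorkerOptions, cli


async def entrypoint(ctx: JobContext):
    # Step 3: this worker accepted a job, so the agent connects to the
    # room and your application logic takes over.
    await ctx.connect()
    # ... subscribe to tracks, run STT/LLM/TTS plugins, publish audio, etc.


if __name__ == "__main__":
    # Steps 1-2: register this process as a worker over a persistent
    # WebSocket connection to LiveKit server, then wait for job requests.
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```

Step 4 is handled by the framework: once the last non-agent participant leaves, the job ends and the agent's room connection is torn down.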

Diagram describing the functionality of Agents.