Overview
Virtual avatars add lifelike video output to your voice AI agents. You can integrate a variety of providers with LiveKit Agents in just a few lines of code.
Available providers
The following providers are available. Choose a provider from this list for a step-by-step guide:
| Provider | Custom Avatars | Direct Image Upload | Available in |
| --- | --- | --- | --- |
|  | ✓ | — | Python |
|  | ✓ | — | Python |
|  | ✓ | — | Python |
|  | ✓ | ✓ | Python |
|  | ✓ | — | Python |
|  | ✓ | — | Python |
Have another provider in mind? LiveKit is open source and welcomes new plugin contributions.
How it works
The virtual avatar integration works automatically with the AgentSession class. The plugin adds a separate participant, the avatar worker, to the room. The agent session sends its audio output to the avatar worker instead of publishing it to the room directly; the avatar worker uses that audio to publish synchronized audio and video tracks to the room for the end user.
To add a virtual avatar:
- Install the selected plugin and set up its API keys.
- Create an `AgentSession`, as in the voice AI quickstart.
- Create an `AvatarSession` and configure it as necessary.
- Start the avatar session, passing in the `AgentSession` instance.
- Start the `AgentSession` with audio output disabled (the audio is sent to the avatar session instead).
Sample code
Here is an example using Hedra Realtime Avatars:
```python
from livekit import agents
from livekit.agents import AgentSession, RoomOutputOptions
from livekit.plugins import hedra


async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        # ... stt, llm, tts, etc.
    )

    avatar = hedra.AvatarSession(
        avatar_id="...",  # ID of the Hedra avatar to use
    )

    # Start the avatar and wait for it to join
    await avatar.start(session, room=ctx.room)

    # Start your agent session with the user
    await session.start(
        # ... room, agent, room_input_options, etc.
    )
```
Avatar workers
To minimize latency, the avatar provider joins the LiveKit room directly as a secondary participant to publish synchronized audio and video to the room. In your frontend app, you must distinguish between the agent — your Python program running the AgentSession
— and the avatar worker.
You can identify an avatar worker as a participant of kind `agent` with the `lk.publish_on_behalf` attribute set to the agent's identity. Check for these values in your frontend code to associate the worker's audio and video tracks with the agent.
```typescript
const participants = Array.from(room.remoteParticipants.values());

const agent = participants.find(
  (p) => p.kind === Kind.Agent && p.attributes['lk.publish_on_behalf'] === undefined
);
const avatarWorker = participants.find(
  (p) => p.kind === Kind.Agent && p.attributes['lk.publish_on_behalf'] === agent.identity
);
```
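Once you've identified the avatar worker, you can handle its tracks like those of any other remote participant. The following sketch (a hypothetical `renderAvatar` helper, assuming the avatar worker's identity was resolved as shown above) uses the `livekit-client` `RoomEvent.TrackSubscribed` event and `track.attach()` to render the worker's media as it arrives:

```typescript
import { Room, RoomEvent, Track } from 'livekit-client';

// Hypothetical helper: render the avatar worker's audio and video as its tracks arrive.
// `avatarWorkerIdentity` is the identity found via the lk.publish_on_behalf check above.
function renderAvatar(room: Room, avatarWorkerIdentity: string) {
  room.on(RoomEvent.TrackSubscribed, (track, _publication, participant) => {
    if (participant.identity !== avatarWorkerIdentity) return;

    if (track.kind === Track.Kind.Video || track.kind === Track.Kind.Audio) {
      // attach() returns an <audio> or <video> element playing the track
      document.body.appendChild(track.attach());
    }
  });
}
```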
In React apps, use the `useVoiceAssistant` hook to get the correct audio and video tracks automatically:
```typescript
const {
  agent,       // the agent participant
  audioTrack,  // the worker's audio track
  videoTrack,  // the worker's video track
} = useVoiceAssistant();
```
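To render the avatar, a minimal sketch (assuming the `VideoTrack` and `AudioTrack` components from `@livekit/components-react`; the `AvatarView` component name is illustrative) might look like this:

```tsx
import { useVoiceAssistant, VideoTrack, AudioTrack } from '@livekit/components-react';

// Illustrative component: render the avatar worker's video and audio when available.
function AvatarView() {
  const { videoTrack, audioTrack } = useVoiceAssistant();

  return (
    <div className="avatar-view">
      {videoTrack && <VideoTrack trackRef={videoTrack} />}
      {audioTrack && <AudioTrack trackRef={audioTrack} />}
    </div>
  );
}
```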
Frontend starter apps
The following frontend starter apps include out-of-the-box support for virtual avatars.
- SwiftUI Voice Agent
- Next.js Voice Agent
- Flutter Voice Agent
- React Native Voice Agent
- Android Voice Agent
- Agents Playground: a virtual workbench to test your multimodal AI agent.