Overview
Virtual avatars add lifelike video output to your voice AI agents. You can integrate a variety of providers with LiveKit Agents in just a few lines of code.
Available providers
The following providers are available. Choose a provider from this list for a step-by-step guide:
| Provider | Custom Avatars | Direct Image Upload | Available in |
| --- | --- | --- | --- |
|  | ✓ | — | Python |
|  | ✓ | — | Python |
|  | ✓ | — | Python |
|  | ✓ | ✓ | Python |
|  | ✓ | — | Python |
|  | ✓ | — | Python |
Have another provider in mind? LiveKit is open source and welcomes new plugin contributions.
How it works
The virtual avatar integration works automatically with the AgentSession class. The plugin adds a separate participant, the avatar worker, to the room. The agent session sends its audio output to the avatar worker instead of publishing it to the room directly; the avatar worker uses that audio to publish synchronized audio and video tracks to the room for the end user.
To add a virtual avatar:
- Install the selected plugin and set up its API keys.
- Create an `AgentSession`, as in the voice AI quickstart.
- Create an `AvatarSession` and configure it as necessary.
- Start the avatar session, passing in the `AgentSession` instance.
- Start the `AgentSession` with audio output disabled (the audio is sent to the avatar session instead).
Sample code
Here is an example using Hedra Realtime Avatars:
```python
from livekit import agents
from livekit.agents import AgentSession, RoomOutputOptions
from livekit.plugins import hedra


async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        # ... stt, llm, tts, etc.
    )

    avatar = hedra.AvatarSession(
        avatar_id="...",  # ID of the Hedra avatar to use
    )

    # Start the avatar and wait for it to join
    await avatar.start(session, room=ctx.room)

    # Start your agent session with the user
    await session.start(
        # ... room, agent, room_input_options, etc.
    )
```
Avatar workers
To minimize latency, the avatar provider joins the LiveKit room directly as a secondary participant to publish synchronized audio and video to the room. In your frontend app, you must distinguish between the agent — your Python program running the AgentSession
— and the avatar worker.
You can identify an avatar worker as a participant of kind `agent` with the `lk.publish_on_behalf` attribute set to the agent's identity. Check for these values in your frontend code to associate the worker's audio and video tracks with the agent.
```typescript
const participants = Array.from(room.remoteParticipants.values());

const agent = participants.find(
  (p) => p.kind === Kind.Agent && p.attributes['lk.publish_on_behalf'] === undefined
);
const avatarWorker = participants.find(
  (p) => p.kind === Kind.Agent && p.attributes['lk.publish_on_behalf'] === agent.identity
);
```
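Once you've identified the avatar worker, you can handle its tracks like those of any other remote participant. The following sketch (a hypothetical `renderAvatar` helper, assuming the avatar worker's identity was resolved as shown above) uses the `livekit-client` `RoomEvent.TrackSubscribed` event and `track.attach()` to render the worker's media as it arrives:

```typescript
import { Room, RoomEvent, Track } from 'livekit-client';

// Hypothetical helper: render the avatar worker's audio and video as its tracks arrive.
// `avatarWorkerIdentity` is the identity found via the lk.publish_on_behalf check above.
function renderAvatar(room: Room, avatarWorkerIdentity: string) {
  room.on(RoomEvent.TrackSubscribed, (track, _publication, participant) => {
    if (participant.identity !== avatarWorkerIdentity) return;

    if (track.kind === Track.Kind.Video || track.kind === Track.Kind.Audio) {
      // attach() returns an <audio> or <video> element playing the track
      document.body.appendChild(track.attach());
    }
  });
}
```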
In React apps, use the `useVoiceAssistant` hook to get the correct audio and video tracks automatically:
```typescript
const {
  agent,       // the agent participant
  audioTrack,  // the worker's audio track
  videoTrack,  // the worker's video track
} = useVoiceAssistant();
```
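To render the avatar, a minimal sketch (assuming the `VideoTrack` and `AudioTrack` components from `@livekit/components-react`; the `AvatarView` component name is illustrative) might look like this:

```tsx
import { useVoiceAssistant, VideoTrack, AudioTrack } from '@livekit/components-react';

// Illustrative component: render the avatar worker's video and audio when available.
function AvatarView() {
  const { videoTrack, audioTrack } = useVoiceAssistant();

  return (
    <div className="avatar-view">
      {videoTrack && <VideoTrack trackRef={videoTrack} />}
      {audioTrack && <AudioTrack trackRef={audioTrack} />}
    </div>
  );
}
```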
Frontend starter apps
The following frontend starter apps include out-of-the-box support for virtual avatars.
- SwiftUI Voice Agent
- Next.js Voice Agent
- Flutter Voice Agent
- React Native Voice Agent
- Android Voice Agent
- Agents Playground: a virtual workbench to test your multimodal AI agent.