Anatomy of an Agent

An in-depth guide to understanding the building blocks and interface of Agents.

LiveKit Agents is a framework for building real-time, programmable participants that run on servers and connect to LiveKit WebRTC sessions. Agents can use AI models and interact with users through text, voice, or video. This guide explains the core concepts and components that make up Agents. For a step-by-step guide to building an Agent without a deep dive into its inner workings, see the quickstart guide.

Diagram describing the functionality of the Agents.

Agent lifecycle

The framework turns your program into an "agent" that can join a LiveKit room and interact with other participants. Here's a high-level overview of the lifecycle:

  1. Worker registration: When you run myagent.py start, your program connects to LiveKit server and registers itself as a "worker" via a persistent WebSocket connection. Once registered, the worker is on standby, waiting for rooms (sessions with end-users) to be created. It automatically exchanges availability and capacity information with LiveKit server, so that incoming requests can be load balanced correctly.
  2. Agent dispatch: When an end-user connects to a room, LiveKit server selects an available worker and sends it information about that session. The first worker to accept that request will instantiate your program and join the room. A worker can host multiple instances of your agent simultaneously, running each in its own process for isolation.
  3. Your program: This is where you take over. Your program can use most LiveKit client features via our Python SDK. Agents can also leverage the plugin ecosystem to process or synthesize voice and video data.
  4. Room close: When the last participant (excluding your agent) leaves the room, the room will close, disconnecting all connected agents.

The worker

In stateful computing, a worker is similar to a web server process in traditional web architecture. The worker acts as the program's main loop, responsible for deploying and monitoring instances of the Agent on child processes. It can handle many Agent instances with minimal overhead, effectively utilizing machine resources.

When a worker is under high load, it stops accepting new sessions and notifies LiveKit server.
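
The load check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the framework's implementation: normalized_cpu_load, can_accept_job, and current_load are hypothetical helpers, and the 0.8 threshold used in the usage note is an arbitrary example value.

```python
import os


def normalized_cpu_load(load_average: float, cpu_count: int) -> float:
    """Scale a load average by the number of cores (hypothetical helper)."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_average / cpu_count


def can_accept_job(load: float, threshold: float) -> bool:
    """Accept new jobs only while the load does not exceed the threshold."""
    return load <= threshold


def current_load() -> float:
    """Sample the 1-minute load average (os.getloadavg is Unix-only)."""
    one_minute, _, _ = os.getloadavg()
    return normalized_cpu_load(one_minute, os.cpu_count() or 1)
```

For example, can_accept_job(current_load(), threshold=0.8) would stop returning True once the per-core load average rises above 0.8. A real load function could just as well report memory pressure or the number of active sessions.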

As you deploy updates to your program, the worker will gracefully drain existing sessions before shutting down, ensuring no sessions are interrupted mid-call.
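
The draining behavior can be pictured with a small asyncio sketch. DrainingWorker is a toy stand-in written for this example and is not how the LiveKit worker is implemented; it only illustrates the idea of refusing new sessions while letting running ones finish.

```python
import asyncio


class DrainingWorker:
    """Toy stand-in for a worker (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._sessions = set()
        self._draining = False

    def start_session(self, coro) -> bool:
        """Start a session unless the worker is draining."""
        if self._draining:
            coro.close()  # avoid a "coroutine was never awaited" warning
            return False
        task = asyncio.ensure_future(coro)
        self._sessions.add(task)
        task.add_done_callback(self._sessions.discard)
        return True

    async def drain(self) -> None:
        """Refuse new sessions, then wait for the running ones to finish."""
        self._draining = True
        if self._sessions:
            await asyncio.gather(*self._sessions)


async def demo():
    finished = []

    async def session(name: str, seconds: float) -> None:
        await asyncio.sleep(seconds)
        finished.append(name)

    worker = DrainingWorker()
    worker.start_session(session("call-1", 0.02))
    worker.start_session(session("call-2", 0.01))
    drain_task = asyncio.ensure_future(worker.drain())
    await asyncio.sleep(0)  # give drain() a chance to set the draining flag
    late_accepted = worker.start_session(session("call-3", 0.01))
    await drain_task
    return finished, late_accepted
```

Running asyncio.run(demo()) lets both in-flight sessions complete, while the session requested after draining began is turned away.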

A worker is configured through the WorkerOptions class:

opts = WorkerOptions(
    # code to run on a new user connection to the server
    request_fnc,
    # a function that reports the current system load, whether CPU or RAM, etc.
    load_fnc,
    # the maximum value of load_fnc, above which new processes will not spawn
    load_threshold,
    # whether the agent can subscribe to tracks, publish data, update metadata, etc.
    permissions,
    # the type of worker to create, either JT_ROOM or JT_PUBLISHER
    worker_type=JobType.JT_ROOM,
)

Note:

While it is possible to supply API keys and secrets to the worker directly through WorkerOptions, for security reasons it is recommended to set them as environment variables, which the worker then reads. The full list of worker options, with descriptions and default values, can be found in the source code.
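
As a sketch of the environment-variable approach, the helper below checks that credentials are present before the worker starts. The variable names LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET are assumed here and should be confirmed against the source code for your version; missing_env_vars is a hypothetical helper, not part of the framework.

```python
import os

# Assumed variable names; verify them against your version of the framework.
REQUIRED_ENV_VARS = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")


def missing_env_vars(env=None) -> list:
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]
```

Calling missing_env_vars() at startup and exiting with a clear message when the list is non-empty tends to be easier to debug than a failed connection later on.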

The request_fnc function runs each time the agent is dispatched to a session. Inside request_fnc, you can accept or reject the incoming JobRequest. The function passed to req.accept, called the entrypoint function, receives information about the current job environment through the JobContext class:

async def entrypoint(job: JobContext):
    pass  # more on this in the next section


async def request_fnc(req: JobRequest):
    await req.accept(
        entrypoint,
        # which tracks to subscribe to, defaults to SUBSCRIBE_ALL
        auto_subscribe=AutoSubscribe.SUBSCRIBE_ALL,
        # the agent's name (Participant.name), defaults to ""
        name="agent",
        # the agent's identity (Participant.identity), defaults to "agent-<jobid>"
        identity="identity",
    )
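
To illustrate choosing between accepting and rejecting a request, here is a self-contained sketch that runs without the framework. FakeJobRequest is a stand-in written for this example, and its room_name attribute, its reject method, and the "support-" naming policy are all assumptions rather than the documented JobRequest interface.

```python
import asyncio


class FakeJobRequest:
    """Stand-in for JobRequest (hypothetical); it only records the decision."""

    def __init__(self, room_name: str) -> None:
        self.room_name = room_name  # assumed attribute, for illustration
        self.decision = None

    async def accept(self, entrypoint, **options) -> None:
        self.decision = ("accepted", options)

    async def reject(self) -> None:
        self.decision = ("rejected", {})


async def entrypoint(job) -> None:
    pass  # the agent's logic would go here


async def request_fnc(req) -> None:
    # Hypothetical policy: only serve rooms whose name starts with "support-".
    if req.room_name.startswith("support-"):
        await req.accept(entrypoint, name="agent")
    else:
        await req.reject()
```

Keeping the decision logic in request_fnc and the agent's behavior in the entrypoint function makes each piece easy to test in isolation, as the stand-in request object shows.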

Finally, to spin up a worker with the configuration defined using WorkerOptions, call cli.run_app:

if __name__ == "__main__":
    cli.run_app(opts)

The Agents worker CLI provides two subcommands: start and dev. start outputs raw JSON data to stdout and is recommended for production. dev is recommended for development: it outputs human-friendly colored logs and supports hot reloading.