Deployment and scaling

Deploying stateful workers presents several challenges such as determining how the realtime system notifies workers, selecting an appropriate job queue, and managing autoscaling.

To ease the deployment of your agents, we have integrated worker orchestration as a core component of the framework. This orchestration system is built directly into LiveKit server, and works whether you are self-hosting or using LiveKit Cloud.

From a high-level perspective, there's very little difference between running your agents locally and in a production environment. The primary distinction is the number of workers you choose to deploy.

[Diagram: multiple workers, each running multiple instances of your agent]


We recommend packaging your agent as a Docker container. Here is an example Dockerfile to build your agent.
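A minimal sketch, assuming your agent's entrypoint lives in `agent.py` and its dependencies are listed in `requirements.txt` (both filenames are illustrative):

```dockerfile
# Slim Python base image; pin the version your agent is tested against
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first so this layer is cached across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the agent source
COPY . .

# Start the worker process
CMD ["python", "agent.py", "start"]
```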

You can run your workers on either virtual machines (VMs) or containers. The Agents framework is compatible with any platform that supports Python applications, such as Kubernetes.

The primary configuration required consists of the following environment variables:
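These are the standard LiveKit credentials a worker uses to register with your LiveKit server; the values below are placeholders:

```shell
# URL of your LiveKit server (or your LiveKit Cloud project)
export LIVEKIT_URL=wss://<your-project>.livekit.cloud
# API key and secret used to authenticate the worker
export LIVEKIT_API_KEY=<your API key>
export LIVEKIT_API_SECRET=<your API secret>
```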


Load balancing

Workers do not need an external load balancer. They rely on a job distribution system embedded within LiveKit server. This system ensures that when a job becomes available (for example, when a new room is created), it is dispatched to only one worker at a time. Should a worker fail to accept the job within a predetermined timeout period, the job is routed to another available worker.
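The dispatch behavior described above can be sketched as follows. This is an illustrative simulation only, not LiveKit's actual implementation; all names (`dispatch_job`, `Worker`, `ACCEPT_TIMEOUT`) are hypothetical:

```python
# Hypothetical sketch of one-at-a-time job dispatch with timeout fallback.
ACCEPT_TIMEOUT = 2.0  # seconds a worker has to accept before re-routing


class Worker:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.jobs = []

    def try_accept(self, job, timeout):
        # A real worker would respond over its connection to LiveKit server
        # within `timeout`; here an unhealthy worker simply never accepts.
        if not self.healthy:
            return False
        self.jobs.append(job)
        return True


def dispatch_job(job, workers, accept_timeout=ACCEPT_TIMEOUT):
    """Offer the job to one worker at a time until one accepts it."""
    for worker in workers:
        if worker.try_accept(job, timeout=accept_timeout):
            return worker  # exactly one worker ends up owning the job
    return None  # no worker accepted; the job remains unassigned


workers = [Worker("w1", healthy=False), Worker("w2")]
owner = dispatch_job("room-123", workers)
print(owner.name)  # the job falls through to the first worker that accepts
```

The key property is that a job is never offered to two workers at once; only after the timeout (or a rejection) does it move to the next candidate.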

In the case of LiveKit Cloud, the system prioritizes available workers at the "edge" or geographically closest to the end-user.