Quotas and limits | LiveKit Documentation

Overview

LiveKit Cloud applies quotas to each project — limits on concurrency, request rates, session durations, and other per-project operations. When a quota is reached, new operations of that kind fail until conditions allow them again (closing an existing session, waiting for the rate limit window to reset, or deleting an existing item). Most features are also metered: you're billed per unit of consumption, with an included monthly allowance on each plan. Higher-tier plans receive higher limits and larger allowances.

Limits serve different purposes across the platform. Concurrency and rate limits keep the platform stable under load. Time and size limits cap individual sessions, recordings, and uploads so a single workload can't consume the platform indefinitely. Per-plan limits on features like custom voices and observability retention reflect what's included with each plan tier.

For projects on the free Build plan, the included allowance acts as a hard cap — after you exceed it, new requests fail rather than incurring overage charges. Free projects also share allowances and limits across all of a user's free projects; creating additional projects doesn't increase the total. Included allowances reset on the first day of each calendar month and don't roll over.

You can view the current limits on your project at any time in the LiveKit Cloud dashboard by navigating to Settings and selecting the Project tab. Refer to the latest pricing page for current numbers on each plan. Enterprise customers can negotiate an Enterprise plan with significantly higher limits in exchange for an annual commitment.

Workspace quotas (Enterprise)

On the Enterprise plan, quotas are managed at the workspace level.

WebRTC transport

LiveKit Cloud transports realtime media between participants using WebRTC. The Ingress and Egress services let you push external streams into a room or record and forward streams out. The following limits apply to these services.

Concurrency limits

The following table shows the default concurrency limits on the Build plan.

Type	Definition	Free limit
Participant	Total number of connected agents and end-users across all rooms.	100 participants
Ingress request	An active session of the Ingress service transcoding an incoming stream.	2 requests
Egress request	An active session of the Egress service recording a composite stream or single track.	2 requests
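
If your backend starts ingress or egress sessions on demand, a local gate can keep it from requesting more concurrent sessions than the project allows. The following is a minimal sketch using only the Python standard library. `ConcurrencyGate` is a hypothetical helper, not part of any LiveKit SDK, and it only limits operations started by a single process, whereas the quota is enforced per project.

```python
import asyncio


class ConcurrencyGate:
    """Keep the number of in-flight operations at or below a fixed limit."""

    def __init__(self, limit: int):
        # Create the gate inside a running event loop (for example, in main()).
        self._sem = asyncio.Semaphore(limit)

    async def run(self, coro_fn, *args):
        # Wait for a free slot, run the operation, then release the slot.
        async with self._sem:
            return await coro_fn(*args)
```

For example, a gate created with `ConcurrencyGate(2)` would match the Build plan limit of 2 egress requests, with each call wrapped as `await gate.run(start_recording, room_name)`, where `start_recording` stands in for your own coroutine.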

Media subscription limits

Each active participant can only subscribe to a limited number of individual media tracks at once. The following table shows the default limits for all plan types.

Track type	Limit
Video	100
Audio	100

For high-volume video use cases, consider using pagination and selective subscriptions to keep the number of subscriptions within these limits.
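
One way to apply pagination is to subscribe only to the tracks on the page currently shown and to unsubscribe from the rest. The sketch below is plain Python with no SDK calls. `page_of_tracks` and `subscription_changes` are hypothetical helpers, and the actual subscribe and unsubscribe calls depend on the client SDK you use.

```python
from typing import Sequence

VIDEO_SUBSCRIPTION_LIMIT = 100  # default per-participant limit from the table above


def page_of_tracks(track_ids: Sequence[str], page: int, page_size: int) -> list[str]:
    """Return the track IDs visible on a zero-indexed page."""
    if not 1 <= page_size <= VIDEO_SUBSCRIPTION_LIMIT:
        raise ValueError("page_size must be between 1 and the subscription limit")
    if page < 0:
        raise ValueError("page must be zero or greater")
    start = page * page_size
    return list(track_ids[start:start + page_size])


def subscription_changes(current: set[str], wanted: Sequence[str]) -> tuple[set[str], set[str]]:
    """Return (to_subscribe, to_unsubscribe) when moving to a new page."""
    wanted_set = set(wanted)
    return wanted_set - current, current - wanted_set
```

Because the page size is capped at the limit, the number of active video subscriptions stays within it as long as the unsubscribe set is applied on each page change.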

Egress time limits

The LiveKit Cloud Egress service has time limits, which vary based on the output type. The following table shows the default limits for all plan types.

Egress output	Time limit
File output (MP4, OGG, WebM)	3 hours
HLS segments	12 hours
HLS/RTMP streaming	12 hours
Raw single stream (track)	12 hours

When these time limits are reached, any active egress ends with a LIMIT_REACHED status. The recorded data, however, is still sent to your configured output destinations.

You can listen for this status change using the egress_ended webhook.
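
A webhook receiver might single out egresses that ended because of a time limit along the following lines. This is a sketch only: the payload shape (an `event` field plus an `egressInfo` object carrying `egressId` and `status`) is an assumption to verify against the webhook reference, `limit_reached_egress_id` is a hypothetical helper, and signature verification of the incoming request is omitted.

```python
import json
from typing import Optional


def limit_reached_egress_id(raw_body: str) -> Optional[str]:
    """Return the egress ID if this webhook reports an egress that hit its time limit."""
    payload = json.loads(raw_body)
    if payload.get("event") != "egress_ended":
        return None
    info = payload.get("egressInfo") or {}  # assumed field name
    status = str(info.get("status", ""))
    # Accept "LIMIT_REACHED" as well as a prefixed form such as "EGRESS_LIMIT_REACHED".
    if status.endswith("LIMIT_REACHED"):
        return info.get("egressId")  # assumed field name
    return None
```

A non-empty return value could then trigger follow-up work, such as starting a new egress to continue the recording.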

LiveKit Inference

LiveKit Inference serves STT, TTS, and LLM models. STT and TTS run over persistent WebSocket connections, while LLM is exposed as a stateless HTTP API. Each model type has its own kind of limit.

STT and TTS concurrency limits

STT and TTS connections to LiveKit Inference each have their own concurrency limit. The following table shows the defaults on the Build plan.

Type	Definition	Free limit
LiveKit Inference STT	Active STT connections to LiveKit Inference models.	5 connections
LiveKit Inference TTS	Active TTS connections to LiveKit Inference models.	5 connections

LLM rate limits

Because applications vary in their request rate and token usage, LLM usage has two rate limits instead of a single concurrency cap: requests per minute (RPM) and tokens per minute (TPM). Both are enforced in a sliding 60-second window — if either is reached, new requests fail. The goal is to support the same effective concurrency as STT and TTS.

The following table shows the default rate limits on the Build plan. For rate limits on paid plans, refer to the latest pricing.

Limit type	Definition	Free limit
LLM requests	Individual requests to a LiveKit Inference LLM model, including tool responses and preemptive generations.	100 requests per minute
LLM tokens	Input and output tokens used in requests to a LiveKit Inference LLM model, including tool responses and preemptive generations.	600,000 tokens per minute
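
To avoid sending requests that would be rejected, a client can track its own usage over the same 60-second window. The following is a minimal client-side sketch, not LiveKit's server-side implementation. `SlidingWindowBudget` is a hypothetical class, token counts are estimates supplied by the caller, and the exact moment at which the server expires old requests is an assumption.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class SlidingWindowBudget:
    """Client-side tracker for requests and tokens in a sliding window."""

    def __init__(self, max_requests: int = 100, max_tokens: int = 600_000,
                 window_s: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.window_s = window_s
        self._clock = clock
        self._events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)
        self._tokens_in_window = 0

    def _evict(self, now: float) -> None:
        # Drop requests that have aged out of the window.
        while self._events and now - self._events[0][0] >= self.window_s:
            _, tokens = self._events.popleft()
            self._tokens_in_window -= tokens

    def try_acquire(self, estimated_tokens: int) -> bool:
        """Record the request and return True only if it fits both budgets."""
        now = self._clock()
        self._evict(now)
        if len(self._events) + 1 > self.max_requests:
            return False
        if self._tokens_in_window + estimated_tokens > self.max_tokens:
            return False
        self._events.append((now, estimated_tokens))
        self._tokens_in_window += estimated_tokens
        return True
```

The defaults mirror the Build plan values in the table above. When `try_acquire` returns False, the caller can wait briefly and try again rather than sending a request that is likely to fail.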

Custom voice limits

Custom voices availability and limits vary by plan. The following table shows which operations are available on each plan and the maximum number of voice clones per project.

Operation	Build (free)	Ship	Scale	Enterprise
View voices	Yes	Yes	Yes	Yes
Delete voice	Yes	Yes	Yes	Yes
Create clone	No	Yes (limit 20)	Yes (limit 50)	Yes (limit 50)
Re-clone to provider	No	Yes	Yes	Yes
Use voice clone (TTS)	No	Yes	Yes	Yes

Note

Voice clone limits are per project. Each clone counts toward the limit regardless of how many providers it is cloned to. When the limit is reached, you must delete an existing clone before creating a new one.

View and delete operations are available on the Build plan so that users who downgrade from a paid plan can still manage their existing voices.

Usage of voice clones for TTS synthesis is billed at standard LiveKit Inference TTS rates, the same as any other voice.

Agent deployment

Agents deployed to LiveKit Cloud are subject to concurrency limits, a build-size limit on each deployment, free-tier allowances for the adaptive interruption handling and audio turn detection models, and cold-start delays on the Build plan.

Agent session concurrency

An agent session is an actively connected agent running on LiveKit Cloud. Build plan projects can run up to 5 agent sessions concurrently.

Build context size

The build context uploaded during lk agent deploy has a maximum size of 1 GB. Use .dockerignore or .gitignore to exclude unnecessary files. See Builds and Dockerfiles for more information.
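
Before deploying, you can estimate the size of a project directory with a short script. The sketch below uses only the Python standard library. It does not read .dockerignore or .gitignore, so the result is an upper bound on the uploaded size, and it assumes that 1 GB means 2^30 bytes, which the limit above does not specify.

```python
import os

MAX_BUILD_CONTEXT_BYTES = 1024 ** 3  # assumption: 1 GB interpreted as 2**30 bytes


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_build_context(root: str) -> bool:
    """Return True if the directory is within the assumed build context limit."""
    return directory_size_bytes(root) <= MAX_BUILD_CONTEXT_BYTES
```

If `fits_build_context(".")` returns False, large directories such as virtual environments, caches, or datasets are good candidates to exclude.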

Adaptive interruption handling

Usage of the adaptive interruption handling model is free for all agents deployed to LiveKit Cloud. For local development and testing, every plan includes 40,000 free requests per month. Each 100 ms of overlapping speech audio is counted as one request.
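
As a rough conversion, the request allowance can be expressed as minutes of overlapping speech. The helper below is illustrative arithmetic only, and `overlap_minutes_covered` is a hypothetical name.

```python
REQUEST_AUDIO_MS = 100            # each request covers 100 ms of overlapping speech
FREE_REQUESTS_PER_MONTH = 40_000  # local development allowance stated above


def overlap_minutes_covered(requests: int, ms_per_request: int = REQUEST_AUDIO_MS) -> float:
    """Convert a request count into minutes of overlapping speech audio."""
    return requests * ms_per_request / 60_000
```

With these figures, 40,000 requests correspond to 4,000 seconds, or about 66.7 minutes, of overlapping speech per month.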

Audio turn detection

Usage of the full turn detector model (v1) is free for all agents deployed to LiveKit Cloud. For local development and testing, every plan includes 7,500 free requests per month. When the allowance is exhausted, the session falls back to the local v1-mini model automatically.

Agent cold starts

Projects on the Build plan might have their deployed agents shut down after all active sessions end. The agent automatically starts again when a new session begins. This can cause a delay of 10 to 20 seconds before the agent joins the room.

Agent observability

Agents continuously stream observability events while connected to a session. Audio recordings are collected locally and uploaded after the session ends.

Event and audio rate limits

The following table shows the limits placed on the volume of observability events and recordings produced across all sessions, per minute.

Limit type	Definition	Free limit
Agent observability events	Individual transcripts, observations, and logs streamed to LiveKit Cloud.	1,000 events per minute
Agent audio recordings	Audio session recordings collected locally and uploaded to LiveKit Cloud.	5 minutes of audio per minute

Retention window

In addition to the rate limits above, all agent observability data is subject to a 30-day retention window. See the agent observability guide for more information.

API rate limits

All projects have a Server API rate limit of 1,000 requests per minute. This applies to requests to services such as RoomService or EgressService, not to SDK methods like joining a room or sending data packets. Requests to LiveKit Inference have their own rate limits.
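
Backend code that calls the Server API in bursts can retry rejected calls with exponential backoff. This page does not describe the error returned when the limit is reached, so the sketch below assumes your own wrapper raises a hypothetical `RateLimitedError` in that case. `call_with_backoff` is likewise a hypothetical helper, not an SDK function.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical: raised by your own wrapper when a call is rejected for rate limiting."""


def call_with_backoff(fn: Callable[[], T], max_attempts: int = 5, base_delay_s: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on RateLimitedError with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitedError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            delay = base_delay_s * (2 ** attempt)
            # Jitter spreads out retries from clients that were limited together.
            sleep(delay + random.uniform(0, delay))
    raise ValueError("max_attempts must be at least 1")
```

The jitter is a design choice: without it, many workers that were limited at the same moment would all retry at the same moment.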

Requesting limit increases

Customers on the Scale plan can request an increase for specific limits in their project settings.

Metered resources

Most features of LiveKit Cloud are metered — you're billed by the unit of resource you consume. Every plan ships with an included monthly allowance for each metered resource. On paid plans, usage beyond the included allowance is billed incrementally at the plan's published rate. On the free Build plan, the included allowance is a hard cap and new requests fail after it's exceeded.

The following table defines each metered resource and shows the included allowance on the free Build plan.

Resource	Definition	Free allowance
Agent session minutes	Active time that an agent deployed to LiveKit Cloud is connected to a WebRTC or Telephony session.	1,000 minutes
Agent observability events	Individual transcripts, observations, and logs in agent observability.	100,000 events
Agent audio recordings	Audio session recordings for agent observability.	1,000 minutes
LiveKit Inference	Aggregated usage for all LiveKit Inference models, at current pricing.	$2.50
US local number rental	Monthly rental for a LiveKit Phone Number.	1 number
US local inbound minutes	Inbound minutes to a US local number.	50 minutes
US toll-free number rental	Monthly rental for a toll-free LiveKit Phone Number.	0 numbers
US toll-free inbound minutes	Inbound minutes to a US toll-free number.	0 minutes
Third-party SIP minutes	Time that a single caller is connected to LiveKit Cloud via a third-party SIP trunk.	1,000 minutes
WebRTC participant minutes	Time that a single user is connected to LiveKit Cloud via a LiveKit SDK.	5,000 minutes
Downstream data transfer GB	The total data transferred out of LiveKit Cloud during a session, including media tracks and data packets.	50 GB
Transcode minutes	Time spent transcoding an incoming stream with the Ingress service or a composite stream with the Egress service.	60 minutes
Track egress minutes	Time spent transcoding a single track with the Egress service.	60 minutes
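
The difference between a hard cap and incremental overage billing can be sketched as follows. The function names and the example rate are hypothetical. Actual per-unit rates and paid-plan allowances are published on the pricing page and are not reproduced here.

```python
def billable_overage(used: float, included: float, hard_cap: bool) -> float:
    """Units billed beyond the included allowance (always 0 on a hard-capped plan)."""
    if hard_cap:
        return 0.0
    return max(0.0, used - included)


def overage_cost(used: float, included: float, rate_per_unit: float, hard_cap: bool) -> float:
    """Cost of usage beyond the allowance at a caller-supplied, hypothetical rate."""
    return billable_overage(used, included, hard_cap) * rate_per_unit


def can_consume(used: float, included: float, hard_cap: bool) -> bool:
    """On a hard-capped plan, new requests fail once the allowance is used up."""
    return not hard_cap or used < included
```

For example, on the Build plan `can_consume(1000, 1000, hard_cap=True)` is False for agent session minutes, while on a paid plan the same usage would continue and be billed as overage.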

Inference credits

The monthly included allowance for LiveKit Inference is expressed in credits, measured in USD. These credits can be used for any combination of supported models. Unused credits do not roll over to the next month.

Enterprise plans

Enterprise plans can be configured with custom limits well above the published Build, Ship, and Scale numbers. They come with an annual commitment so that LiveKit can provision the necessary capacity in advance. Contact the sales team with your project details.