Overview
LiveKit Cloud applies quotas to each project — limits on concurrency, request rates, session durations, and other per-project operations. When a quota is reached, new operations of that kind fail until conditions allow them again (closing an existing session, waiting for the rate limit window to reset, or deleting an existing item). Most features are also metered: you're billed per unit of consumption, with an included monthly allowance on each plan. Higher-tier plans receive higher limits and larger allowances.
Limits serve different purposes across the platform. Concurrency and rate limits keep the platform stable under load. Time and size limits cap individual sessions, recordings, and uploads so a single workload can't consume the platform indefinitely. Per-plan limits on features like custom voices and observability retention reflect what's included with each plan tier.
For projects on the free Build plan, the included allowance acts as a hard cap — after you exceed it, new requests fail rather than incurring overage charges. Free projects also share allowances and limits across all of a user's free projects; creating additional projects doesn't increase the total. Included allowances reset on the first day of each calendar month and don't roll over.
You can view the current limits on your project at any time in the LiveKit Cloud dashboard by navigating to Settings and selecting the Project tab. Refer to the latest pricing page for current numbers on each plan. Enterprise customers can negotiate an Enterprise plan with significantly higher limits in exchange for an annual commitment.
WebRTC transport
LiveKit Cloud transports realtime media between participants using WebRTC. The Ingress and Egress services let you push external streams into a room or record and forward streams out. The following limits apply to these services.
Concurrency limits
The following table shows the default concurrency limits on the Build plan.
| Type | Definition | Free limit |
|---|---|---|
| Participant | Total number of connected agents and end-users across all rooms. | 100 participants |
| Ingress request | An active session of the Ingress service transcoding an incoming stream. | 2 requests |
| Egress request | An active session of the Egress service recording a composite stream or single track. | 2 requests |
Media subscription limits
Each active participant can only subscribe to a limited number of individual media tracks at once. The following table shows the default limits for all plan types.
| Track type | Limit |
|---|---|
| Video | 100 |
| Audio | 100 |
For high volume video use cases, consider using pagination and selective subscriptions to keep the number of subscriptions within these limits.
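As a rough illustration of pagination with selective subscriptions, the sketch below works out which participants' video to subscribe to for the visible page and what to change when the page turns. The helper names (`visible_page`, `subscription_changes`) are hypothetical and no LiveKit SDK call is shown; applying the result to real tracks is left to whichever SDK you use.

```python
from typing import Sequence

# Default per-participant video subscription limit from the table above.
MAX_VIDEO_SUBSCRIPTIONS = 100


def visible_page(identities: Sequence[str], page: int, page_size: int) -> list[str]:
    """Return the participant identities shown on one page of the layout."""
    if page < 0:
        raise ValueError("page must be non-negative")
    if not 0 < page_size <= MAX_VIDEO_SUBSCRIPTIONS:
        raise ValueError("page_size must be between 1 and the subscription limit")
    start = page * page_size
    return list(identities[start:start + page_size])


def subscription_changes(current: set[str], wanted: Sequence[str]) -> tuple[set[str], set[str]]:
    """Return (to_subscribe, to_unsubscribe) when moving to a new page."""
    wanted_set = set(wanted)
    return wanted_set - current, current - wanted_set
```

Unsubscribing from the previous page before subscribing to the next one keeps the total at or below the page size during the transition.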
Egress time limits
The LiveKit Cloud Egress service has time limits, which vary based on the output type. The following table shows the default limits for all plan types.
| Egress output | Time limit |
|---|---|
| File output (MP4, OGG, WebM) | 3 hours |
| HLS segments | 12 hours |
| HLS/RTMP streaming | 12 hours |
| Raw single stream (track) | 12 hours |
When these time limits are reached, any active egress ends with a LIMIT_REACHED status. The recorded data, however, is still sent to your configured output destinations.
You can listen for this status change using the egress_ended webhook.
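A minimal sketch of checking a webhook body for this condition follows. The JSON field names (`event`, `egressInfo`, `status`) and the possibility of a prefixed status value are assumptions not confirmed by this page, and signature verification of the webhook is deliberately left out; treat it as a starting point, not the official handler.

```python
import json


def egress_hit_time_limit(raw_body: str) -> bool:
    """Return True if a webhook body reports an egress that ended at its time limit.

    Assumed payload shape (hypothetical, verify against the webhook reference):
    {"event": "egress_ended", "egressInfo": {"egressId": "...", "status": "..."}}
    """
    payload = json.loads(raw_body)
    if payload.get("event") != "egress_ended":
        return False
    info = payload.get("egressInfo") or {}
    status = str(info.get("status", ""))
    # Accept a bare "LIMIT_REACHED" as well as a prefixed form of the same status.
    return status.endswith("LIMIT_REACHED")
```

A handler could use the result to start a follow-up egress or to notify an operator that a recording was cut at the limit.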
LiveKit Inference
LiveKit Inference serves STT, TTS, and LLM models. STT and TTS run over persistent WebSocket connections, while LLM is exposed as a stateless HTTP API. Each model type has its own kind of limit.
STT and TTS concurrency limits
STT and TTS connections to LiveKit Inference each have their own concurrency limit. The following table shows the defaults on the Build plan.
| Type | Definition | Free limit |
|---|---|---|
| LiveKit Inference STT | Active STT connections to LiveKit Inference models. | 5 connections |
| LiveKit Inference TTS | Active TTS connections to LiveKit Inference models. | 5 connections |
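If you manage connections yourself, one way to stay under a concurrency limit is to gate them client-side. The sketch below uses an `asyncio.Semaphore`; `ConnectionGate` is a hypothetical helper, and the actual connection code is omitted because it depends on your client.

```python
import asyncio

STT_CONNECTION_LIMIT = 5  # Build plan default from the table above


class ConnectionGate:
    """Async context manager that admits at most `limit` holders at a time."""

    def __init__(self, limit: int = STT_CONNECTION_LIMIT):
        self._sem = asyncio.Semaphore(limit)
        self.active = 0  # holders currently inside the gate
        self.peak = 0    # highest value `active` has reached

    async def __aenter__(self):
        await self._sem.acquire()  # waits while the gate is full
        self.active += 1
        self.peak = max(self.peak, self.active)
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.active -= 1
        self._sem.release()
```

Create the gate inside the running event loop and wrap each connection in `async with gate:`, so that excess callers wait for a slot instead of failing against the quota.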
LLM rate limits
Because applications vary in their request rate and token usage, LLM usage has two rate limits instead of a single concurrency cap: requests per minute (RPM) and tokens per minute (TPM). Both are enforced in a sliding 60-second window — if either is reached, new requests fail. The goal is to support the same effective concurrency as STT and TTS.
The following table shows the default rate limits on the Build plan. For rate limits on paid plans, refer to the latest pricing page.
| Limit type | Definition | Free limit |
|---|---|---|
| LLM requests | Individual requests to a LiveKit Inference LLM model, including tool responses and preemptive generations. | 100 requests per minute |
| LLM tokens | Input and output tokens used in requests to a LiveKit Inference LLM model, including tool responses and preemptive generations. | 600,000 tokens per minute |
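To make the two-limit sliding window concrete, here is a client-side sketch that refuses a request when either the request count or the token count over the last 60 seconds would exceed its limit. It only approximates server behavior: `SlidingWindowLimiter` is a hypothetical class, token counts are caller-supplied estimates (output tokens are not known in advance), and the exact server-side accounting is not described on this page.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Tracks admitted requests and tokens over a sliding time window."""

    def __init__(self, max_requests=100, max_tokens=600_000,
                 window_s=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.window_s = window_s
        self.clock = clock
        self._events = deque()  # (timestamp, tokens) per admitted request
        self._tokens = 0        # running token total inside the window

    def _expire(self, now):
        # Drop entries that have aged out of the window.
        while self._events and now - self._events[0][0] >= self.window_s:
            _, tokens = self._events.popleft()
            self._tokens -= tokens

    def try_acquire(self, estimated_tokens):
        """Record the request and return True if both limits allow it."""
        now = self.clock()
        self._expire(now)
        if len(self._events) >= self.max_requests:
            return False
        if self._tokens + estimated_tokens > self.max_tokens:
            return False
        self._events.append((now, estimated_tokens))
        self._tokens += estimated_tokens
        return True
```

Calling `try_acquire` before each LLM request lets an application delay or shed work locally rather than discovering the limit through failed requests.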
Custom voice limits
Custom voice availability and limits vary by plan. The following table shows which operations are available on each plan and the maximum number of voice clones per project.
| Operation | Build (free) | Ship | Scale | Enterprise |
|---|---|---|---|---|
| View voices | Yes | Yes | Yes | Yes |
| Delete voice | Yes | Yes | Yes | Yes |
| Create clone | No | Yes (limit 20) | Yes (limit 50) | Yes (limit 50) |
| Re-clone to provider | No | Yes | Yes | Yes |
| Use voice clone (TTS) | No | Yes | Yes | Yes |
Voice clone limits are per project. Each clone counts toward the limit regardless of how many providers it is cloned to. When the limit is reached, you must delete an existing clone before creating a new one.
View and delete operations are available on the Build plan so that users who downgrade from a paid plan can still manage their existing voices.
Usage of voice clones for TTS synthesis is billed at standard LiveKit Inference TTS rates, the same as any other voice.
Agent deployment
Agents deployed to LiveKit Cloud are subject to concurrency limits, a build-size limit on each deployment, a free-tier allowance for the adaptive interruption handling model, and cold-start delays on the Build plan.
Agent session concurrency
An agent session is an actively connected agent running on LiveKit Cloud. Build plan projects can run up to 5 agent sessions concurrently.
Build context size
The build context uploaded during lk agent deploy has a maximum size of 1 GB. Use .dockerignore or .gitignore to exclude unnecessary files. See Builds and Dockerfiles for more information.
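For a quick local estimate before deploying, the sketch below sums the file sizes under a directory and compares the total with the 1 GB limit. It is only an upper bound under stated assumptions: it does not apply `.dockerignore` or `.gitignore` rules, it treats 1 GB as 10^9 bytes (the exact definition is not given here), and both function names are hypothetical.

```python
from pathlib import Path

# Assumption: 1 GB is interpreted as 10**9 bytes.
BUILD_CONTEXT_LIMIT_BYTES = 10**9


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and not path.is_symlink():
            total += path.stat().st_size
    return total


def fits_build_context(root: str, limit: int = BUILD_CONTEXT_LIMIT_BYTES) -> bool:
    """Return True if the unfiltered directory size is within the limit."""
    return directory_size_bytes(root) <= limit
```

If the unfiltered total is already under the limit, the filtered build context will be too; if it is over, the largest subdirectories are the first candidates for an ignore rule.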
Adaptive interruption handling
Usage of the adaptive interruption handling model is free for all agents deployed to LiveKit Cloud. For local development and testing, every plan includes 40,000 free requests per month. Each 100 ms of overlapping speech audio is counted as one request.
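As a worked example of the request accounting, the sketch below converts overlapping speech into requests and the monthly allowance back into time. Rounding partial 100 ms windows up is an assumption; the page does not say how partial windows are counted.

```python
REQUEST_WINDOW_MS = 100           # each 100 ms of overlapping speech is one request
FREE_REQUESTS_PER_MONTH = 40_000  # included for local development and testing


def requests_for_overlap(overlap_ms: int) -> int:
    """Requests counted for a span of overlapping speech (assumes rounding up)."""
    return -(-overlap_ms // REQUEST_WINDOW_MS)  # integer ceiling division


def free_overlap_minutes() -> float:
    """Minutes of overlapping speech covered by the monthly free requests."""
    return FREE_REQUESTS_PER_MONTH * REQUEST_WINDOW_MS / 1000 / 60
```

Under these numbers, 1.5 seconds of overlap counts as 15 requests, and 40,000 requests cover 4,000 seconds of overlapping speech, or roughly 66.7 minutes.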
Agent cold starts
Projects on the Build plan might have their deployed agents shut down after all active sessions end. The agent automatically starts again when a new session begins, which can add a delay of 10 to 20 seconds before the agent joins the room.
Agent observability
Agents continuously stream observability events while connected to a session. Audio recordings are collected locally and uploaded after the session ends.
Event and audio rate limits
The following table shows the limits placed on the volume of observability events and recordings produced across all sessions, per minute.
| Limit type | Definition | Free limit |
|---|---|---|
| Agent observability events | Individual transcripts, observations, and logs streamed to LiveKit Cloud. | 1,000 events per minute |
| Agent audio recordings | Audio session recordings collected locally and uploaded to LiveKit Cloud. | 5 minutes of audio per minute |
Retention window
In addition to the rate limits above, all agent observability data is subject to a 30-day retention window. See the agent observability guide for more information.
API rate limits
All projects have a Server API rate limit of 1,000 requests per minute. This limit applies to Server API requests, such as calls to RoomService or EgressService, and not to SDK methods like joining a room or sending data packets. Requests to LiveKit Inference have their own rate limits.
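One common way for a server to cope with a per-minute limit is to retry rejected calls with exponential backoff and jitter. The sketch below is generic: `RateLimitedError` is a hypothetical exception standing in for whatever error your client surfaces, since this page does not specify how a rate-limited request is reported.

```python
import random
import time


class RateLimitedError(Exception):
    """Hypothetical error raised by the caller's client when a request is rate limited."""


def call_with_backoff(fn, max_attempts=5, base_delay_s=1.0, max_delay_s=60.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on RateLimitedError with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitedError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(delay * rng())  # full jitter spreads retries across the delay
```

Capping the delay at 60 seconds matches the one-minute window, so a retry after the maximum wait falls into a fresh window.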
Requesting limit increases
Customers on the Scale plan can request an increase for specific limits in their project settings.
Metered resources
Most features of LiveKit Cloud are metered — you're billed by the unit of resource you consume. Every plan ships with an included monthly allowance for each metered resource. On paid plans, usage beyond the included allowance is billed incrementally at the plan's published rate. On the free Build plan, the included allowance is a hard cap and new requests fail after it's exceeded.
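The difference between the two billing behaviors can be sketched with simple arithmetic. The function names and the example rate below are placeholders for illustration only, not published prices.

```python
def remaining_allowance(usage: float, allowance: float) -> float:
    """Units of the included monthly allowance still unused."""
    return max(0.0, allowance - usage)


def overage_charge(usage: float, allowance: float, rate_per_unit: float) -> float:
    """Charge on a paid plan for usage beyond the included allowance."""
    return max(0.0, usage - allowance) * rate_per_unit


# Hypothetical example: 1,200 units used against a 1,000-unit allowance
# at a placeholder rate of 0.01 per unit.
example_charge = overage_charge(1200, 1000, 0.01)
```

On a paid plan the 200 extra units in this example would be billed at the plan's rate. On the Build plan there is no overage charge; once `remaining_allowance` reaches zero, new requests fail until the allowance resets.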
The following table defines each metered resource and shows the included allowance on the free Build plan.
| Resource | Definition | Free allowance |
|---|---|---|
| Agent session minutes | Active time that an agent deployed to LiveKit Cloud is connected to a WebRTC or Telephony session. | 1,000 minutes |
| Agent observability events | Individual transcripts, observations, and logs in agent observability. | 100,000 events |
| Agent audio recordings | Audio session recordings for agent observability. | 1,000 minutes |
| LiveKit Inference | Aggregated usage for all LiveKit Inference models, at current pricing. | $2.50 |
| US local number rental | Monthly rental for a LiveKit Phone Number. | 1 number |
| US local inbound minutes | Inbound minutes to a US local number. | 50 minutes |
| US toll-free number rental | Monthly rental for a toll-free LiveKit Phone Number. | 0 numbers |
| US toll-free inbound minutes | Inbound minutes to a US toll-free number. | 0 minutes |
| Third-party SIP minutes | Time that a single caller is connected to LiveKit Cloud via a third-party SIP trunk. | 1,000 minutes |
| WebRTC participant minutes | Time that a single user is connected to LiveKit Cloud via a LiveKit SDK. | 5,000 minutes |
| Downstream data transfer GB | The total data transferred out of LiveKit Cloud during a session, including media tracks and data packets. | 50 GB |
| Transcode minutes | Time spent transcoding an incoming stream with the Ingress service or a composite stream with the Egress service. | 60 minutes |
| Track egress minutes | Time spent transcoding a single track with the Egress service. | 60 minutes |
The monthly included allowance for LiveKit Inference is expressed in credits, measured in USD. These credits can be used for any combination of supported models. Unused credits do not roll over to the next month.
Enterprise plans
Enterprise plans can be configured with custom limits well above the published Build, Ship, and Scale numbers. They come with an annual commitment so that LiveKit can provision the necessary capacity in advance. Contact the sales team with your project details.