Overview
The custom voices feature lets you create a voice clone from a short audio clip. Upload or record a sample, and LiveKit clones it to all supported TTS providers on your plan. You can then use the clone in your agent sessions with any of those providers.
Custom voices are available on paid LiveKit Cloud plans. You create and manage voice clones in the LiveKit Cloud dashboard. Once created, use them in your agent code with LiveKit Inference. No separate provider API keys are required.
Supported providers
Voice clones are automatically created on the following TTS providers:
| Provider | Noise removal | Plan required |
|---|---|---|
| Cartesia | Yes (audio enhancement) | Ship or higher |
| Inworld | Yes | Ship or higher |
When you create a voice clone, LiveKit clones it to all providers your plan supports. You can then use the voice with any TTS model from those providers. For complete limits by plan, see Quotas and limits.
Supported languages
The following languages are supported for voice clone input audio. Select the language that matches the speech in your audio sample:
| Code | Language | Code | Language |
|---|---|---|---|
| en | English | hi | Hindi |
| fr | French | it | Italian |
| de | German | ko | Korean |
| es | Spanish | nl | Dutch |
| pt | Portuguese | pl | Polish |
| zh | Chinese | ru | Russian |
| ja | Japanese | ar | Arabic |
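If you keep language metadata alongside your audio samples, a lookup built from the table above can catch an unsupported language before you reach the dashboard. This is a minimal sketch: `SUPPORTED_LANGUAGES` and `check_language` are hypothetical names for illustration, not part of any LiveKit SDK.

```python
# Language codes accepted for voice clone input audio, copied from the table above.
# Hypothetical helper for your own tooling; not a LiveKit API.
SUPPORTED_LANGUAGES: dict[str, str] = {
    "en": "English",
    "fr": "French",
    "de": "German",
    "es": "Spanish",
    "pt": "Portuguese",
    "zh": "Chinese",
    "ja": "Japanese",
    "hi": "Hindi",
    "it": "Italian",
    "ko": "Korean",
    "nl": "Dutch",
    "pl": "Polish",
    "ru": "Russian",
    "ar": "Arabic",
}


def check_language(code: str) -> str:
    """Return the language name for `code`, or raise ValueError if unsupported."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported voice clone language: {code!r}")
    return SUPPORTED_LANGUAGES[normalized]
```

For example, `check_language("ja")` returns `"Japanese"`, while a code missing from the table raises an error instead of producing a mislabeled clone.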
Create a voice clone
Create voice clones from the LiveKit Cloud dashboard:
- Open your project in the dashboard.
- Navigate to Voices > Custom voices in the sidebar.
- Click Create voice clone.
- Choose Upload file or Record audio:
  - Upload: Drag and drop or browse for an audio file. Supported formats: MP3, WAV, OGG, or WEBM. Maximum file size: 4 MB.
  - Record: Click Start recording and speak clearly for about 10 seconds. A sample script is provided in the dialog.
- Optionally trim the audio using the waveform trimmer.
- Enter a voice name and select the language spoken in the audio.
- Optionally enable Remove background noise if your audio has ambient noise. This may slightly affect voice quality.
- Click Upload and clone voice.
- Review the consent items, check I provide consent to the above items, and click Continue.
The voice is cloned to all supported providers in parallel. Processing typically takes under a minute. Once ready, the clone appears in your voices list with its status and a unique voice ID (for example, v_RT5PsNhXvMaB).
For tips on getting the best results, see Audio requirements.
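The Upload option has two hard limits: the file format and the 4 MB size cap. A local check like the one below can catch both before you open the dialog. It's a minimal sketch with a hypothetical `upload_problems` helper; it inspects only the file extension and size, not the audio content, and it assumes "4 MB" means 4,000,000 bytes (the stricter reading).

```python
from pathlib import Path

# Formats accepted by the Upload option, per the dashboard steps.
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".ogg", ".webm"}

# Assumption: "4 MB" is read strictly as 4,000,000 bytes. If the dashboard uses
# 4 * 1024 * 1024 instead, this check is merely a little conservative.
MAX_BYTES = 4_000_000


def upload_problems(path: Path) -> list[str]:
    """Return reasons the dashboard would likely reject `path`; empty means OK.

    Only the file extension and size are checked; the audio itself is not decoded.
    """
    problems: list[str] = []
    suffix = path.suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format {suffix or '(no extension)'}")
    size = path.stat().st_size
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes; limit is {MAX_BYTES}")
    return problems
```

Calling `upload_problems(Path("sample.wav"))` on a small WAV file returns an empty list; an oversized or misnamed file returns one message per problem.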
Use a voice clone
Once a voice clone is ready, use its voice ID (the v_* identifier) in your agent session, just like any other voice. LiveKit Inference automatically routes the request to the correct provider.
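In Agent Builder, open the Models & Voice tab, set the voice type to Custom, and pick your cloned voice and TTS model. In agent code, pass the voice ID to inference.TTS along with a model from a provider the voice was cloned to: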
```python
from livekit.agents import AgentSession, inference

session = AgentSession(
    tts=inference.TTS(
        model="cartesia/sonic-3",
        voice="v_RT5PsNhXvMaB",
    ),
    # ... llm, stt, etc.
)
```

```typescript
import { AgentSession, inference } from '@livekit/agents';

const session = new AgentSession({
  tts: new inference.TTS({
    model: "cartesia/sonic-3",
    voice: "v_RT5PsNhXvMaB",
  }),
  // ... llm, stt, etc.
});
```
You can also use the string descriptor shortcut:
```python
from livekit.agents import AgentSession

session = AgentSession(
    tts="cartesia/sonic-3:v_RT5PsNhXvMaB",
    # ... llm, stt, etc.
)
```

```typescript
import { AgentSession } from '@livekit/agents';

const session = new AgentSession({
  tts: "cartesia/sonic-3:v_RT5PsNhXvMaB",
  // ... llm, stt, etc.
});
```
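If you assemble the descriptor from configuration (for example, a model name and a voice ID kept in separate settings), small helpers keep the format consistent. This is a sketch under the assumption, taken from the example above, that the descriptor is the model name and the voice ID joined by a single colon, and that neither part contains a colon itself. `make_tts_descriptor` and `split_tts_descriptor` are hypothetical, not LiveKit functions.

```python
from typing import Optional, Tuple


def make_tts_descriptor(model: str, voice_id: str) -> str:
    """Build a "provider/model:voice_id" string, e.g. "cartesia/sonic-3:v_RT5PsNhXvMaB"."""
    # Assumption: a colon is only ever used as the model/voice separator.
    if ":" in model or ":" in voice_id:
        raise ValueError("model and voice ID must not contain ':'")
    return f"{model}:{voice_id}"


def split_tts_descriptor(descriptor: str) -> Tuple[str, Optional[str]]:
    """Split a descriptor into (model, voice_id); voice_id is None when absent."""
    model, sep, voice_id = descriptor.rpartition(":")
    if not sep:
        # No colon: the descriptor names only a model.
        return descriptor, None
    return model, voice_id
```

The string returned by `make_tts_descriptor` is what you would pass as `tts=` in the shortcut form.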
Using different TTS models
A voice clone works with any TTS model from a provider it was cloned to. For example, if your voice was cloned to both Cartesia and Inworld, you can switch between models:
```python
from livekit.agents import inference

# Use with Cartesia
tts_cartesia = inference.TTS(
    model="cartesia/sonic-3",
    voice="v_RT5PsNhXvMaB",
)

# Use with Inworld
tts_inworld = inference.TTS(
    model="inworld/inworld-tts-1.5-max",
    voice="v_RT5PsNhXvMaB",
)
```

```typescript
import { inference } from '@livekit/agents';

// Use with Cartesia
const ttsCartesia = new inference.TTS({
  model: "cartesia/sonic-3",
  voice: "v_RT5PsNhXvMaB",
});

// Use with Inworld
const ttsInworld = new inference.TTS({
  model: "inworld/inworld-tts-1.5-max",
  voice: "v_RT5PsNhXvMaB",
});
```
LiveKit Inference automatically resolves the voice ID to the correct provider-specific voice for the model you selected.
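You don't implement this resolution yourself, but before deploying you may want to confirm that a clone can serve the model you configured. The sketch below mirrors the rule stated above (a clone works with any model from a provider it was cloned to). It assumes the provider is the prefix before "/" in the model name, as in "cartesia/sonic-3", and that you copy the set of ready providers from the dashboard's per-provider status; `provider_of` and `can_use` are hypothetical helpers.

```python
def provider_of(model: str) -> str:
    """Return the provider prefix of a model name, e.g. "cartesia" for "cartesia/sonic-3"."""
    provider, sep, _ = model.partition("/")
    if not sep or not provider:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    return provider


def can_use(model: str, ready_providers: set[str]) -> bool:
    """True if the clone is ready on the provider that serves `model`."""
    return provider_of(model) in ready_providers
```

For a clone that is ready on both Cartesia and Inworld, `can_use` returns True for either of the models shown above.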
Automatic fallback
Because each voice is cloned to every provider your plan supports, LiveKit Inference automatically falls back to another provider if the primary one is unavailable. No configuration is required. If the provider for the model you selected fails, the session continues on another provider where the clone is ready. Every provider's clone is created from the same audio sample, so the voice stays recognizable after a fallback. Each provider's model has its own characteristics, however, so the output isn't identical.
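To reason about which rendition of the voice a caller might hear, it can help to model the behavior described above. This is a toy model, not LiveKit's implementation: the order in which fallback candidates are tried is an assumption (LiveKit doesn't document it here), and `pick_provider` is a hypothetical name.

```python
from typing import Optional


def pick_provider(
    preferred: str, ready: list[str], unavailable: set[str]
) -> Optional[str]:
    """Toy model of fallback: choose which provider serves a synthesis request.

    preferred:   provider of the model you selected, e.g. "cartesia"
    ready:       providers where the clone is ready, in fallback order (assumed)
    unavailable: providers that are currently failing
    """
    # Normal case: the selected model's provider is up and has the clone.
    if preferred in ready and preferred not in unavailable:
        return preferred
    # Fallback: the first other provider that is up and has the clone.
    # Assumption: this also covers a clone that isn't ready on `preferred`.
    for provider in ready:
        if provider != preferred and provider not in unavailable:
            return provider
    # Nothing usable is left, so the request can't be served.
    return None
```

With a clone ready on both providers, a Cartesia outage yields `"inworld"`; if the clone exists only on the failing provider, the model returns `None`.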
Manage voice clones
You can preview, re-clone, and delete voice clones from the voice detail page in the dashboard. Voice clones are scoped to a single project and can't be shared across projects.
Preview a clone
On the voice detail page, select a TTS model and enter custom text to hear how the clone sounds. Each provider's model produces a slightly different rendition of the same voice, so use the preview to pick the one you prefer.
Voice status
Each voice clone has an overall status and a per-provider status:
| Status | Description |
|---|---|
| Active | Voice is ready and available for use. |
| Processing | Voice is being cloned. This typically takes under a minute. |
| Partial | Voice is ready on some providers but failed on others. The voice is still usable with the providers where it succeeded. |
| Failed | Cloning failed on all providers. |
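The dashboard computes the overall status for you. If you track clones in your own tooling, the sketch below spells out how the overall values in the table relate to per-provider results. The per-provider labels "ready", "processing", and "failed" are hypothetical, and treating any still-running provider as an overall Processing status is an assumption the table doesn't state.

```python
def overall_status(per_provider: dict[str, str]) -> str:
    """Summarize per-provider results ("ready", "processing", "failed")
    as one of the overall statuses in the table above."""
    statuses = list(per_provider.values())
    if not statuses:
        raise ValueError("no provider statuses")
    # Assumption: any provider still cloning keeps the whole voice in Processing.
    if "processing" in statuses:
        return "Processing"
    ready = statuses.count("ready")
    if ready == len(statuses):
        return "Active"
    if ready == 0:
        return "Failed"
    # Ready on some providers, failed on others.
    return "Partial"
```

For example, a clone that succeeded on Cartesia but failed on Inworld maps to Partial and remains usable with Cartesia models.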
Re-clone a voice
If a voice failed to clone on a specific provider, or if new providers become available, you can re-clone the voice. On the voice detail page in the dashboard, open the provider menu and select Re-clone voice with provider.
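The two triggers for re-cloning can be checked mechanically: a provider where cloning failed, or a provider your plan supports that has no status yet because it became available after the voice was created. A minimal sketch, assuming you record per-provider statuses with the hypothetical labels "ready" and "failed"; `providers_to_reclone` is not a LiveKit function, and the re-clone itself still happens in the dashboard.

```python
def providers_to_reclone(plan_providers: set[str], per_provider: dict[str, str]) -> list[str]:
    """List providers worth re-cloning on, sorted by name.

    A provider qualifies if cloning failed there, or if it has no status
    because it became available after the voice was created.
    """
    return sorted(
        provider
        for provider in plan_providers
        if per_provider.get(provider) in (None, "failed")
    )
```

For a voice that failed on Inworld, `providers_to_reclone({"cartesia", "inworld"}, {"cartesia": "ready", "inworld": "failed"})` returns `["inworld"]`.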
Delete a clone
To delete a voice clone, open the voice detail page in the dashboard and click Delete voice clone. This permanently removes the voice from all TTS providers and cannot be undone.
Audio requirements
For the best results when creating a voice clone:
- Duration: About 10 seconds of speech. The audio trimmer in the dashboard lets you adjust the selection.
- Quality: Use a clear recording with minimal background noise. A quiet room or headset microphone works well.
- Content: Speak naturally with your normal pace and intonation. Avoid whispering or exaggerated expression.
- Format: MP3, WAV, OGG, or WEBM. Maximum file size: 4 MB.
The Remove background noise option can help with noisy recordings, but may slightly alter the voice characteristics. For the best results, start with a clean recording.
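To check sample length before uploading, you can read the duration of a WAV file with Python's standard `wave` module. This sketch covers WAV only (MP3, OGG, and WEBM need a separate decoder), and the 3-second tolerance around the "about 10 seconds" guideline is an arbitrary assumption; `wav_duration_seconds` and `duration_hint` are hypothetical helpers.

```python
import wave

TARGET_SECONDS = 10.0  # guideline from the requirements above ("about 10 seconds")


def wav_duration_seconds(path: str) -> float:
    """Return the length of a WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def duration_hint(path: str, tolerance: float = 3.0) -> str:
    """Compare a WAV file's length with the target; the tolerance is an assumption."""
    seconds = wav_duration_seconds(path)
    if seconds < TARGET_SECONDS - tolerance:
        return "too short"
    if seconds > TARGET_SECONDS + tolerance:
        return "longer than needed; consider trimming"
    return "ok"
```

A longer recording isn't necessarily a problem, since the dashboard's trimmer lets you select the best 10 seconds after upload.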
Audio retention
LiveKit stores your audio sample so the voice can be re-cloned to new providers and models as they're added. Recordings are deleted 12 months after the voice clone was last used, or immediately if you delete the clone.
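To estimate when an idle clone's stored recording will be removed, add 12 months to the date it was last used. This sketch assumes "12 months" means the same calendar date one year later, with February 29 mapped to February 28; LiveKit doesn't document the exact boundary, so treat the result as approximate. `retention_deadline` is a hypothetical helper.

```python
from datetime import date


def retention_deadline(last_used: date) -> date:
    """Approximate date the recording is deleted: 12 months after last use."""
    try:
        return last_used.replace(year=last_used.year + 1)
    except ValueError:
        # February 29 has no counterpart in a non-leap year.
        return last_used.replace(year=last_used.year + 1, day=28)
```

Deleting the clone removes the recording immediately, so this estimate applies only to clones you keep.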
Billing and limits
Creating a voice clone is free. Synthesis is billed at the standard LiveKit Inference TTS rate, the same as any other voice. The number of voice clones you can create and the available providers depend on your plan. For details on limits by plan, see Quotas and limits.