# Custom voices

> Create voice clones from audio samples and use them with supported TTS providers in LiveKit Inference.

## Overview

The custom voices feature lets you create a voice clone from a short audio clip. Upload or record a sample, and LiveKit clones it to all [supported TTS providers](#supported-providers) on your plan. You can then use the clone in your agent sessions with any of those providers.

Custom voices are available on paid LiveKit Cloud plans. You create and manage voice clones in the [LiveKit Cloud dashboard](https://cloud.livekit.io). Once created, use them in your agent code with [LiveKit Inference](https://docs.livekit.io/agents/models/inference.md). No separate provider API keys are required.

### Supported providers

Voice clones are automatically created on the following TTS providers:

| Provider | Noise removal | Plan required |
| --- | --- | --- |
| [Cartesia](https://docs.livekit.io/agents/models/tts/cartesia.md) | Yes (audio enhancement) | Ship or higher |
| [Inworld](https://docs.livekit.io/agents/models/tts/inworld.md) | Yes | Ship or higher |

When you create a voice clone, LiveKit clones it to all providers your plan supports. You can then use the voice with any TTS model from those providers. For complete limits by plan, see [Quotas and limits](https://docs.livekit.io/deploy/admin/quotas-and-limits.md#custom-voice-limits).

### Supported languages

The following languages are supported for voice clone input audio. Select the language that matches the speech in your audio sample:

| Code | Language | Code | Language |
| --- | --- | --- | --- |
| `en` | English | `hi` | Hindi |
| `fr` | French | `it` | Italian |
| `de` | German | `ko` | Korean |
| `es` | Spanish | `nl` | Dutch |
| `pt` | Portuguese | `pl` | Polish |
| `zh` | Chinese | `ru` | Russian |
| `ja` | Japanese | `ar` | Arabic |
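If you script any preparation around the dashboard (for example, labeling a batch of samples before uploading them), checking the language code up front can save a failed attempt. The following is a minimal Python sketch: the `SUPPORTED_LANGUAGES` mapping is copied from the table above, and the `check_language` helper is hypothetical, not part of any LiveKit SDK.

```python
# Language codes accepted for voice clone input audio, copied from the
# "Supported languages" table. This helper is hypothetical, not a LiveKit API.
SUPPORTED_LANGUAGES = {
    "en": "English",
    "fr": "French",
    "de": "German",
    "es": "Spanish",
    "pt": "Portuguese",
    "zh": "Chinese",
    "ja": "Japanese",
    "hi": "Hindi",
    "it": "Italian",
    "ko": "Korean",
    "nl": "Dutch",
    "pl": "Polish",
    "ru": "Russian",
    "ar": "Arabic",
}


def check_language(code: str) -> str:
    """Return the language name for a supported code, or raise ValueError."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language {code!r}; expected one of: {supported}")
    return SUPPORTED_LANGUAGES[normalized]


print(check_language("ja"))  # Japanese
```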

## Create a voice clone

Create voice clones from the [LiveKit Cloud dashboard](https://cloud.livekit.io):

1. Open your project in the dashboard.
2. Navigate to **Voices** > **Custom voices** in the sidebar.
3. Click **Create voice clone**.
4. Choose **Upload file** or **Record audio**:
   - **Upload**: Drag and drop or browse for an audio file. Supported formats: MP3, WAV, OGG, or WEBM. Maximum file size: 4 MB.
   - **Record**: Click **Start recording** and speak clearly for about 10 seconds. A sample script is provided in the dialog.
5. Optionally trim the audio using the waveform trimmer.
6. Enter a **voice name** and select the **language** spoken in the audio.
7. Optionally enable **Remove background noise** if your audio has ambient noise. This may slightly affect voice quality.
8. Click **Upload and clone voice**.
9. Review the consent items, check **I provide consent to the above items**, and click **Continue**.

The voice is cloned to all supported providers in parallel. Processing typically takes under a minute. Once ready, the clone appears in your voices list with its status and a unique voice ID (for example, `v_RT5PsNhXvMaB`).

> 💡 **Tip**
> 
> For tips on getting the best results, see [Audio requirements](#audio-requirements).

## Use a voice clone

Once a voice clone is ready, use its voice ID (the `v_*` identifier) in your agent session, just like any other voice. LiveKit Inference automatically routes the request to the correct provider.

In [Agent Builder](https://livekit.com/products/agent-builder), open the **Models & Voice** tab, set voice type to **Custom**, and pick your cloned voice and TTS model.

**Python**:

```python
from livekit.agents import AgentSession, inference

session = AgentSession(
    tts=inference.TTS(
        model="cartesia/sonic-3",
        voice="v_RT5PsNhXvMaB",
    ),
    # ... llm, stt, etc.
)

```

**Node.js**:

```typescript
import { AgentSession, inference } from '@livekit/agents';

const session = new AgentSession({
    tts: new inference.TTS({
        model: "cartesia/sonic-3",
        voice: "v_RT5PsNhXvMaB",
    }),
    // ... llm, stt, etc.
});

```

You can also use the string descriptor shortcut:

**Python**:

```python
from livekit.agents import AgentSession

session = AgentSession(
    tts="cartesia/sonic-3:v_RT5PsNhXvMaB",
    # ... llm, stt, etc.
)

```

**Node.js**:

```typescript
import { AgentSession } from '@livekit/agents';

const session = new AgentSession({
    tts: "cartesia/sonic-3:v_RT5PsNhXvMaB",
    // ... llm, stt, etc.
});

```
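The string descriptor packs the model and voice into one value of the form `provider/model:voice_id`. If you assemble descriptors from configuration such as environment variables, a small helper can build and sanity-check them. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and the `v_` prefix check reflects only the custom voice ID format described on this page, not a validation that LiveKit performs.

```python
from typing import NamedTuple


class TTSDescriptor(NamedTuple):
    provider: str  # e.g. "cartesia"
    model: str     # e.g. "sonic-3"
    voice: str     # e.g. "v_RT5PsNhXvMaB"


def build_descriptor(model: str, voice: str) -> str:
    """Join a "provider/model" string and a voice ID into "provider/model:voice"."""
    if "/" not in model:
        raise ValueError(f"model must look like 'provider/model', got {model!r}")
    return f"{model}:{voice}"


def parse_descriptor(descriptor: str) -> TTSDescriptor:
    """Split "provider/model:voice" into its three parts (hypothetical helper)."""
    # Split on the last colon so only the voice ID is taken from the end.
    model_part, sep, voice = descriptor.rpartition(":")
    if not sep or not voice:
        raise ValueError(f"missing voice ID in {descriptor!r}")
    provider, slash, model = model_part.partition("/")
    if not slash or not provider or not model:
        raise ValueError(f"missing provider or model in {descriptor!r}")
    return TTSDescriptor(provider, model, voice)


def is_custom_voice(voice: str) -> bool:
    """Custom voice clone IDs described on this page use the "v_" prefix."""
    return voice.startswith("v_")


desc = build_descriptor("cartesia/sonic-3", "v_RT5PsNhXvMaB")
print(desc)                    # cartesia/sonic-3:v_RT5PsNhXvMaB
print(parse_descriptor(desc))  # TTSDescriptor(provider='cartesia', model='sonic-3', voice='v_RT5PsNhXvMaB')
```

The string returned by `build_descriptor` is the same shape you pass as `tts=` to `AgentSession` in the descriptor example above.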

### Using different TTS models

A voice clone works with any TTS model from a provider it was cloned to. For example, if your voice was cloned to both Cartesia and Inworld, you can switch between models:

**Python**:

```python
from livekit.agents import inference

# Use with Cartesia
tts_cartesia = inference.TTS(
    model="cartesia/sonic-3",
    voice="v_RT5PsNhXvMaB",
)

# Use with Inworld
tts_inworld = inference.TTS(
    model="inworld/inworld-tts-1.5-max",
    voice="v_RT5PsNhXvMaB",
)

```

**Node.js**:

```typescript
import { inference } from '@livekit/agents';

// Use with Cartesia
const ttsCartesia = new inference.TTS({
    model: "cartesia/sonic-3",
    voice: "v_RT5PsNhXvMaB",
});

// Use with Inworld
const ttsInworld = new inference.TTS({
    model: "inworld/inworld-tts-1.5-max",
    voice: "v_RT5PsNhXvMaB",
});

```

LiveKit Inference automatically resolves the voice ID to the correct provider-specific voice for the model you selected.

### Automatic fallback

Because each voice is cloned to every provider your plan supports, LiveKit Inference automatically falls back to another provider if the primary one is unavailable. No configuration is required: if the provider for the model you selected fails, the session continues on another provider where the clone is ready. Every provider's clone is created from the same audio sample, so the voice stays recognizable across providers. Each provider's model has its own characteristics, however, so the output isn't identical.
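Conceptually, fallback means trying the provider of the selected model first, then any other provider on which the clone is ready. The sketch below models that idea in plain Python so the behavior is easy to reason about. It is an illustration only: LiveKit Inference performs fallback for you with no code on your part, and the `providers_to_try` function, the `"ready"` status string, and the ordering rule are assumptions, not LiveKit's actual routing logic.

```python
from typing import Mapping, Sequence


def providers_to_try(
    preferred: str,
    clone_status: Mapping[str, str],
    healthy: Sequence[str],
) -> list[str]:
    """Return providers in the order a request could be attempted.

    Hypothetical model of fallback: start with the provider of the selected
    model, then any other provider where the clone is "ready". Providers
    where the clone isn't ready, or that are unavailable (not in `healthy`),
    are skipped.
    """
    ready = [p for p, status in clone_status.items() if status == "ready"]
    ordered = [preferred] + [p for p in ready if p != preferred]
    return [p for p in ordered if p in ready and p in healthy]


status = {"cartesia": "ready", "inworld": "ready"}
print(providers_to_try("cartesia", status, healthy=["cartesia", "inworld"]))  # ['cartesia', 'inworld']
print(providers_to_try("cartesia", status, healthy=["inworld"]))              # ['inworld']
```

In the second call, Cartesia is unavailable, so the request would continue on Inworld, matching the behavior described above.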

## Manage voice clones

You can preview, re-clone, and delete voice clones from the voice detail page in the dashboard. Voice clones are scoped to a single project and can't be shared across projects.

### Preview a clone

On the voice detail page, select a TTS model and enter custom text to hear how the clone sounds. Each provider's model produces a slightly different rendition of the same voice, so use the preview to pick the one you prefer.

### Voice status

Each voice clone has an overall status and per-provider status:

| Status | Description |
| --- | --- |
| **Active** | Voice is ready and available for use. |
| **Processing** | Voice is being cloned. This typically takes under a minute. |
| **Partial** | Voice is ready on some providers but failed on others. The voice is still usable with the providers where it succeeded. |
| **Failed** | Cloning failed on all providers. |
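The overall status follows from the per-provider statuses. The Python sketch below shows one way to derive it, which can be handy if you track clone readiness in your own tooling. It assumes per-provider values of `"processing"`, `"ready"`, and `"failed"`, and that any provider still processing makes the overall status **Processing**; both are assumptions, and `overall_status` is a hypothetical helper rather than a LiveKit API.

```python
from typing import Mapping


def overall_status(per_provider: Mapping[str, str]) -> str:
    """Derive the overall clone status from per-provider statuses.

    Hypothetical helper. Assumed per-provider values: "processing", "ready", "failed".
    """
    statuses = set(per_provider.values())
    if not statuses:
        raise ValueError("no providers")
    unknown = statuses - {"processing", "ready", "failed"}
    if unknown:
        raise ValueError(f"unknown status values: {sorted(unknown)}")
    if "processing" in statuses:
        return "Processing"  # still being cloned on at least one provider
    if statuses == {"ready"}:
        return "Active"      # ready on every provider
    if statuses == {"failed"}:
        return "Failed"      # failed on every provider
    return "Partial"         # ready on some providers, failed on others


print(overall_status({"cartesia": "ready", "inworld": "failed"}))  # Partial
```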

### Re-clone a voice

If a voice failed to clone on a specific provider, or if new providers become available, you can re-clone the voice. On the voice detail page in the dashboard, open the provider menu and select **Re-clone voice with provider**.

### Delete a clone

To delete a voice clone, open the voice detail page in the dashboard and click **Delete voice clone**. This permanently removes the voice from all TTS providers and cannot be undone.

## Audio requirements

For the best results when creating a voice clone:

- **Duration**: About 10 seconds of speech. The audio trimmer in the dashboard lets you adjust the selection.
- **Quality**: Use a clear recording with minimal background noise. A quiet room or headset microphone works well.
- **Content**: Speak naturally with your normal pace and intonation. Avoid whispering or exaggerated expression.
- **Format**: MP3, WAV, OGG, or WEBM. Maximum file size: 4 MB.
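To catch the most common upload problems before opening the dashboard, you can check a sample's extension and size locally. This is a minimal Python sketch based on the format list and 4 MB limit above. It inspects only the file extension and size, not the actual encoding or speech duration, and it reads 4 MB as 4,000,000 bytes (the stricter interpretation), which is an assumption. The `check_sample` helper is hypothetical.

```python
from pathlib import Path

# Formats and size limit from the audio requirements above. Reading "4 MB" as
# 4,000,000 bytes (the stricter interpretation) is an assumption.
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".ogg", ".webm"}
MAX_BYTES = 4_000_000


def check_sample(path: str) -> list[str]:
    """Return problems found with an audio sample; an empty list means none found.

    Hypothetical pre-upload check. It looks only at the file extension and
    size, not at the audio encoding, speech duration, or background noise.
    """
    problems: list[str] = []
    sample = Path(path)
    if sample.suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append(
            f"unsupported format {sample.suffix or '(none)'}; use MP3, WAV, OGG, or WEBM"
        )
    size = sample.stat().st_size
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes; the maximum is {MAX_BYTES}")
    return problems
```

For example, `check_sample("intro.wav")` returns an empty list for a 1 MB WAV file, while a 5 MB FLAC file would produce two problems.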

> ℹ️ **Note**
> 
> The **Remove background noise** option can help with noisy recordings, but may slightly alter the voice characteristics. For the best results, start with a clean recording.

### Audio retention

LiveKit stores your audio sample so the voice can be re-cloned to new providers and models as they're added. Recordings are deleted 12 months after the voice clone was last used, or immediately if you delete the clone.

## Billing and limits

Creating a voice clone is free. Synthesis is billed at the standard [LiveKit Inference TTS rate](https://livekit.com/pricing/inference#tts), the same as any other voice. The number of voice clones you can create and the available providers depend on your plan. For details on limits by plan, see [Quotas and limits](https://docs.livekit.io/deploy/admin/quotas-and-limits.md#custom-voice-limits).

## Additional resources

- **[TTS models overview](https://docs.livekit.io/agents/models/tts.md)**: Browse all available TTS models and voices.

- **[Audio customization](https://docs.livekit.io/agents/multimodality/audio/customization.md)**: Customize pronunciation, caching, and speech volume.

- **[Quotas and limits](https://docs.livekit.io/deploy/admin/quotas-and-limits.md#custom-voice-limits)**: Voice clone quotas and provider availability by plan.

---

This document was rendered at 2026-06-07T11:36:31.645Z.
For the latest version of this document, see [https://docs.livekit.io/agents/models/tts/custom-voices.md](https://docs.livekit.io/agents/models/tts/custom-voices.md).

To explore all LiveKit documentation, see [llms.txt](https://docs.livekit.io/llms.txt).