# Cartesia TTS

> How to use Cartesia TTS with LiveKit Agents.

- **[Use in Agent Builder](https://cloud.livekit.io/projects/p_/agents/builder/new?tts=cartesia%2Fsonic-3)**: Create a new agent in your browser using cartesia/sonic-3

## Overview

Cartesia text-to-speech is available in LiveKit Agents through [LiveKit Inference](https://docs.livekit.io/agents/models/inference.md) and the [Cartesia plugin](#plugin). With LiveKit Inference, your agent runs on LiveKit's infrastructure to minimize latency. No separate provider API key is required, and usage and rate limits are managed through LiveKit Cloud. Use the plugin instead if you want to manage your own billing and rate limits. Pricing for LiveKit Inference is available on the [pricing page](https://livekit.com/pricing/inference#tts).

## LiveKit Inference

Use [LiveKit Inference](https://docs.livekit.io/agents/models/inference.md) to access Cartesia TTS without a separate Cartesia API key.

| Model ID | Languages |
| -------- | --------- |
| `cartesia/sonic` | `en`, `fr`, `de`, `es`, `pt`, `zh`, `ja`, `hi`, `it`, `ko`, `nl`, `pl`, `ru`, `sv`, `tr` |
| `cartesia/sonic-2` | `en`, `fr`, `de`, `es`, `pt`, `zh`, `ja`, `ko` |
| `cartesia/sonic-3` | `en`, `de`, `es`, `fr`, `ja`, `pt`, `zh`, `hi`, `ko`, `it`, `nl`, `pl`, `ru`, `sv`, `tr`, `tl`, `bg`, `ro`, `ar`, `cs`, `el`, `fi`, `hr`, `ms`, `sk`, `da`, `ta`, `uk`, `hu`, `no`, `vi`, `bn`, `th`, `he`, `ka`, `id`, `te`, `gu`, `kn`, `ml`, `mr`, `pa` |
| `cartesia/sonic-3-latest` | `en`, `de`, `es`, `fr`, `ja`, `pt`, `zh`, `hi`, `ko`, `it`, `nl`, `pl`, `ru`, `sv`, `tr`, `tl`, `bg`, `ro`, `ar`, `cs`, `el`, `fi`, `hr`, `ms`, `sk`, `da`, `ta`, `uk`, `hu`, `no`, `vi`, `bn`, `th`, `he`, `ka`, `id`, `te`, `gu`, `kn`, `ml`, `mr`, `pa` |
| `cartesia/sonic-latest` | `en`, `de`, `es`, `ja`, `pt`, `zh`, `hi`, `ko`, `nl`, `pl`, `ru`, `sv`, `tr`, `tl`, `bg`, `ro`, `ar`, `cs`, `el`, `fi`, `hr`, `ms`, `sk`, `da`, `ta`, `uk`, `hu`, `no`, `vi`, `bn`, `th`, `he`, `ka`, `id`, `te`, `gu`, `kn`, `ml`, `mr`, `pa` |
| `cartesia/sonic-3.5` | `en`, `de`, `es`, `ja`, `pt`, `zh`, `hi`, `ko`, `nl`, `pl`, `ru`, `sv`, `tr`, `tl`, `bg`, `ro`, `ar`, `cs`, `el`, `fi`, `hr`, `ms`, `sk`, `da`, `ta`, `uk`, `hu`, `no`, `vi`, `bn`, `th`, `he`, `ka`, `id`, `te`, `gu`, `kn`, `ml`, `mr`, `pa` |
| `cartesia/sonic-3.5-2026-05-04` | `en`, `de`, `es`, `ja`, `pt`, `zh`, `hi`, `ko`, `nl`, `pl`, `ru`, `sv`, `tr`, `tl`, `bg`, `ro`, `ar`, `cs`, `el`, `fi`, `hr`, `ms`, `sk`, `da`, `ta`, `uk`, `hu`, `no`, `vi`, `bn`, `th`, `he`, `ka`, `id`, `te`, `gu`, `kn`, `ml`, `mr`, `pa` |
| `cartesia/sonic-3-2025-10-27` | `en`, `de`, `es`, `fr`, `ja`, `pt`, `zh`, `hi`, `ko`, `it`, `nl`, `pl`, `ru`, `sv`, `tr`, `tl`, `bg`, `ro`, `ar`, `cs`, `el`, `fi`, `hr`, `ms`, `sk`, `da`, `ta`, `uk`, `hu`, `no`, `vi`, `bn`, `th`, `he`, `ka`, `id`, `te`, `gu`, `kn`, `ml`, `mr`, `pa` |
| `cartesia/sonic-3-2026-01-12` | `en`, `de`, `es`, `fr`, `ja`, `pt`, `zh`, `hi`, `ko`, `it`, `nl`, `pl`, `ru`, `sv`, `tr`, `tl`, `bg`, `ro`, `ar`, `cs`, `el`, `fi`, `hr`, `ms`, `sk`, `da`, `ta`, `uk`, `hu`, `no`, `vi`, `bn`, `th`, `he`, `ka`, `id`, `te`, `gu`, `kn`, `ml`, `mr`, `pa` |
| `cartesia/sonic-turbo` | `en`, `fr`, `de`, `es`, `pt`, `zh`, `ja`, `hi`, `ko` |
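
Because language coverage differs by model, it can help to check a language code before you create a session. The following is a minimal sketch, not part of LiveKit Agents: `SUPPORTED_LANGUAGES` and `supports_language` are hypothetical names, and the mapping copies only two rows from the table above.

```python
# Hypothetical helper: not part of the LiveKit Agents API.
# The mapping copies two rows from the table above; extend it as needed.
SUPPORTED_LANGUAGES: dict[str, frozenset[str]] = {
    "cartesia/sonic-2": frozenset(
        {"en", "fr", "de", "es", "pt", "zh", "ja", "ko"}
    ),
    "cartesia/sonic-turbo": frozenset(
        {"en", "fr", "de", "es", "pt", "zh", "ja", "hi", "ko"}
    ),
}


def supports_language(model: str, language: str) -> bool:
    """Return True if `language` is listed for `model` in the local table."""
    if model not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unknown model ID: {model!r}")
    return language in SUPPORTED_LANGUAGES[model]


print(supports_language("cartesia/sonic-2", "ko"))      # True
print(supports_language("cartesia/sonic-2", "hi"))      # False
print(supports_language("cartesia/sonic-turbo", "hi"))  # True
```

A check like this fails fast in your own code instead of relying on an error from the provider at synthesis time.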

### Usage

To use Cartesia, use the `TTS` class from the `inference` module:

**Python**:

```python
from livekit.agents import AgentSession, inference

session = AgentSession(
    tts=inference.TTS(
        model="cartesia/sonic-3", 
        voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc", 
        language="en",
        extra_kwargs={
            "speed": 1.5,
            "volume": 1.2,
            "emotion": "excited"
        }
    ),
    # ... llm, stt, vad, turn_handling, etc.
)

```

---

**Node.js**:

```typescript
import { AgentSession, inference } from '@livekit/agents';

const session = new AgentSession({
    tts: new inference.TTS({
        model: "cartesia/sonic-3", 
        voice: "9626c31c-bec5-4cca-baa8-f8ba9e84c8bc", 
        language: "en",
        modelOptions: {
            speed: 1.5,
            volume: 1.2,
            emotion: "excited"
        }
    }),
    // ... llm, stt, vad, turnHandling, etc.
});

```

### Parameters

- **`model`** _(string)_: The model ID from the [models list](#inference).

- **`voice`** _(string)_: See [voices](#voices) for guidance on selecting a voice.

- **`language`** _(LanguageCode)_ (optional): [Language code](https://docs.livekit.io/agents/models/tts.md#language-codes) for the input text. If not set, the model default applies.

- **`extra_kwargs`** _(dict)_ (optional): Additional parameters to pass to the Cartesia TTS API. See [model parameters](#model-parameters) for supported fields.

In Node.js this parameter is called `modelOptions`.

#### Model parameters

Pass the following parameters inside `extra_kwargs` (Python) or `modelOptions` (Node.js):

| Parameter | Type | Default | Notes |
| --------- | ---- | ------- | ----- |
| emotion | `str` |  | Emotion control string. See [Emotion Controls](https://docs.cartesia.ai/build-with-cartesia/sonic-3/volume-speed-emotion#emotion-controls-beta) for supported values. |
| speed | `"slow" \| "normal" \| "fast" \| float` |  | Speed of speech. Either a preset string or a numeric multiplier. See [Speed and Volume Controls](https://docs.cartesia.ai/build-with-cartesia/sonic-3/volume-speed-emotion#speed-and-volume-controls) for more information. |
| volume | `float` |  | Volume of speech. See [Speed and Volume Controls](https://docs.cartesia.ai/build-with-cartesia/sonic-3/volume-speed-emotion#speed-and-volume-controls) for more information. |
| duration | `float` |  | Target duration in seconds for the generated audio. |
| max_buffer_delay_ms | `int` |  | Maximum buffer delay in milliseconds before flushing a chunk. |
| add_timestamps | `bool` |  | Whether to include word-level timestamps in the response. |
| add_phoneme_timestamps | `bool` |  | Whether to include phoneme-level timestamps in the response. |
| use_normalized_timestamps | `bool` |  | Whether to return timestamps in normalized form. |
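
To catch typos in option names before they reach the API, you could validate the dictionary locally. This is a minimal sketch: `build_model_options` is a hypothetical helper, the field names and types are copied from the table above, and the validation is an assumption of this example rather than something LiveKit performs.

```python
from typing import Any

# Field names and accepted Python types, copied from the table above.
_FIELD_TYPES: dict[str, tuple[type, ...]] = {
    "emotion": (str,),
    "speed": (str, int, float),
    "volume": (int, float),
    "duration": (int, float),
    "max_buffer_delay_ms": (int,),
    "add_timestamps": (bool,),
    "add_phoneme_timestamps": (bool,),
    "use_normalized_timestamps": (bool,),
}
_SPEED_PRESETS = {"slow", "normal", "fast"}


def build_model_options(**options: Any) -> dict[str, Any]:
    """Hypothetical helper: validate options, then return them as a dict."""
    for name, value in options.items():
        if name not in _FIELD_TYPES:
            raise ValueError(f"unsupported model parameter: {name!r}")
        allowed = _FIELD_TYPES[name]
        # bool is a subclass of int, so reject it for non-bool fields.
        if isinstance(value, bool) and bool not in allowed:
            raise TypeError(f"{name!r} does not accept a bool")
        if not isinstance(value, allowed):
            raise TypeError(f"{name!r} has unexpected type {type(value).__name__}")
    speed = options.get("speed")
    if isinstance(speed, str) and speed not in _SPEED_PRESETS:
        raise ValueError(f"unknown speed preset: {speed!r}")
    return dict(options)


extra_kwargs = build_model_options(speed=1.5, volume=1.2, emotion="excited")
print(extra_kwargs)
```

The resulting dictionary can be passed as `extra_kwargs` in Python; the same keys apply to `modelOptions` in Node.js.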

### Voices

LiveKit Inference supports all of the default "Cartesia Voices" available in the Cartesia API. You can explore the available voices in the [Cartesia voice library](https://play.cartesia.ai/voices) (free account required), and use the voice by copying its ID into your LiveKit agent session.

> ℹ️ **Custom & community voices**
> 
> Pre-existing custom Cartesia voices are not available through LiveKit Inference. To use these, create your own Cartesia account and use the [Cartesia plugin](https://docs.livekit.io/agents/models/tts/cartesia.md#plugin). To create a new custom voice through LiveKit, see [Custom voices](https://docs.livekit.io/agents/models/tts/custom-voices.md).

The following is a small sample of the Cartesia voices available in LiveKit Inference.

| Provider | Name | Description | Language | ID |
| -------- | ---- | ----------- | -------- | -------- |
| Cartesia | Blake | Energetic American adult male | `en-US` | `cartesia/sonic-3:a167e0f3-df7e-4d52-a9c3-f949145efdab` |
| Cartesia | Daniela | Calm and trusting Mexican female | `es-MX` | `cartesia/sonic-3:5c5ad5e7-1020-476b-8b91-fdcbe9cc313c` |
| Cartesia | Jacqueline | Confident, young American adult female | `en-US` | `cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc` |
| Cartesia | Robyn | Neutral, mature Australian female | `en-AU` | `cartesia/sonic-3:f31cc6a7-c1e8-4764-980c-60a361443dd1` |

### String descriptors

As a shortcut, you can also pass a descriptor with the [model ID](#inference) and voice directly to the `tts` argument in your `AgentSession`:

**Python**:

```python
from livekit.agents import AgentSession

session = AgentSession(
    tts="cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
    # ... llm, stt, vad, turn_handling, etc.
)

```

---

**Node.js**:

```typescript
import { AgentSession } from '@livekit/agents';

const session = new AgentSession({
    tts: "cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
    // ... llm, stt, vad, turnHandling, etc.
});

```
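
If you store the model and voice separately (for example, in per-tenant configuration), a small helper can assemble and split descriptors. This sketch assumes the `model:voice` shape shown in the examples above; `make_descriptor` and `split_descriptor` are hypothetical names, not LiveKit APIs.

```python
def make_descriptor(model: str, voice: str) -> str:
    """Join a model ID and a voice ID into a `model:voice` descriptor."""
    if not model or not voice:
        raise ValueError("model and voice must both be non-empty")
    return f"{model}:{voice}"


def split_descriptor(descriptor: str) -> tuple[str, str]:
    """Split a `model:voice` descriptor at the first colon."""
    model, sep, voice = descriptor.partition(":")
    if not sep or not model or not voice:
        raise ValueError(f"not a model:voice descriptor: {descriptor!r}")
    return model, voice


descriptor = make_descriptor(
    "cartesia/sonic-3", "9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"
)
print(descriptor)
print(split_descriptor(descriptor))
```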

## Plugin

LiveKit's plugin support for Cartesia lets you connect directly to Cartesia's TTS API with your own API key.

Available in:
- [x] Node.js
- [x] Python

### Installation

Install the plugin for your language:

**Python**:

```shell
uv add "livekit-agents[cartesia]~=1.5"

```

---

**Node.js**:

```shell
pnpm add @livekit/agents-plugin-cartesia@1.x

```

### Authentication

The Cartesia plugin requires a [Cartesia API key](https://play.cartesia.ai/keys).

Set `CARTESIA_API_KEY` in your `.env` file.
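
A startup check can surface a missing key before the first synthesis request. This is a minimal sketch that assumes your `.env` file has already been loaded into the process environment; `require_cartesia_api_key` is a hypothetical helper, not part of the plugin.

```python
import os


def require_cartesia_api_key() -> str:
    """Return CARTESIA_API_KEY, or raise if it is unset or blank."""
    key = os.environ.get("CARTESIA_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CARTESIA_API_KEY is not set; add it to your .env file "
            "or export it in your shell"
        )
    return key
```

Call `require_cartesia_api_key()` once at startup, before you construct the agent session.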

### Usage

Use Cartesia TTS within an `AgentSession` or as a standalone speech generator. For example, you can use this TTS in the [Voice AI quickstart](https://docs.livekit.io/agents/start/voice-ai.md).

**Python**:

```python
from livekit.agents import AgentSession
from livekit.plugins import cartesia

session = AgentSession(
    tts=cartesia.TTS(
        model="sonic-3",
        voice="f786b574-daa5-4673-aa0c-cbe3e8534c02",
    ),
    # ... llm, stt, etc.
)

```

---

**Node.js**:

```typescript
import { voice } from '@livekit/agents';
import * as cartesia from '@livekit/agents-plugin-cartesia';

const session = new voice.AgentSession({
    // The TTS constructor takes a single options object.
    tts: new cartesia.TTS({
        model: "sonic-3",
        voice: "f786b574-daa5-4673-aa0c-cbe3e8534c02",
    }),
    // ... llm, stt, etc.
});

```

### Parameters

This section describes some of the available parameters. See the plugin reference links in the [Additional resources](#additional-resources) section for a complete list of all available parameters.

- **`model`** _(string)_ (optional) - Default: `sonic-3`: ID of the model to use for generation. See [supported models](https://docs.cartesia.ai/build-with-cartesia/models/tts).

- **`voice`** _(string | list[float])_ (optional) - Default: `f786b574-daa5-4673-aa0c-cbe3e8534c02`: ID of the voice to use for generation, or an embedding array. See [official documentation](https://docs.cartesia.ai/api-reference/tts/tts#send.Generation%20Request.voice).

- **`language`** _(LanguageCode)_ (optional) - Default: `en`: [Language code](https://docs.livekit.io/agents/models/tts.md#language-codes) for the input text. For the languages each model supports, see [supported models](https://docs.cartesia.ai/build-with-cartesia/models/tts).

- **`emotion`** _(string)_ (optional): See [Emotion Controls](https://docs.cartesia.ai/build-with-cartesia/sonic-3/volume-speed-emotion#emotion-controls-beta) for Sonic 3 for supported values.

- **`speed`** _(float)_ (optional) - Default: `1`: Speed of the speech, where 1.0 is the default speed. See [Speed and Volume Controls](https://docs.cartesia.ai/build-with-cartesia/sonic-3/volume-speed-emotion#speed-and-volume-controls) for Sonic 3 for more information.

- **`volume`** _(float)_ (optional) - Default: `1`: Volume of the speech, where 1.0 is the default volume. See [Speed and Volume Controls](https://docs.cartesia.ai/build-with-cartesia/sonic-3/volume-speed-emotion#speed-and-volume-controls) for Sonic 3 for more information.

- **`pronunciation_dict_id`** _(string)_ (optional): ID of a Cartesia pronunciation dictionary to apply when generating speech. Only supported for `sonic-3` models. To learn more, see [Custom pronunciations](https://docs.cartesia.ai/build-with-cartesia/sonic-3/custom-pronunciations) in the Cartesia docs.

In Node.js this parameter is called `pronunciationDictId`.

## Customizing pronunciation

Cartesia supports two approaches for customizing how `sonic-3` pronounces specific words. For full syntax details, see [Custom pronunciations](https://docs.cartesia.ai/build-with-cartesia/sonic-3/custom-pronunciations) in the Cartesia docs.

### Inline phoneme overrides

Insert `<<ˈ|IPA|symbols>>` syntax directly into the text the model speaks. Each `<<...>>` block is a pronunciation override expressed as `|`-separated International Phonetic Alphabet (IPA) symbols, with `ˈ` marking primary stress. Use this for one-off corrections of brand names or technical terms. For a Python example using `text_transforms.replace()`, see the [text transforms guide](https://docs.livekit.io/agents/multimodality/text.md#text-transforms).
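
The override syntax described above can be generated with plain string handling. This is a minimal sketch using only the standard library: `ipa_override` and `apply_overrides` are hypothetical helpers, and the IPA symbols for "LiveKit" are illustrative and have not been checked against Cartesia's output.

```python
import re


def ipa_override(symbols: list[str]) -> str:
    """Wrap `|`-separated IPA symbols in a `<<...>>` override block."""
    return "<<" + "|".join(symbols) + ">>"


def apply_overrides(text: str, overrides: dict[str, list[str]]) -> str:
    """Replace whole-word occurrences of each term with its override block."""
    if not overrides:
        return text
    # Match longer terms first so a short term cannot shadow a longer one.
    terms = sorted(overrides, key=len, reverse=True)
    pattern = re.compile("|".join(rf"\b{re.escape(t)}\b" for t in terms))
    return pattern.sub(lambda m: ipa_override(overrides[m.group(0)]), text)


# Illustrative symbols only; verify the pronunciation for your own terms.
overrides = {"LiveKit": ["ˈ", "l", "aɪ", "v", "k", "ɪ", "t"]}
print(apply_overrides("Welcome to LiveKit.", overrides))
# Welcome to <<ˈ|l|aɪ|v|k|ɪ|t>>.
```

Replacing all terms in a single regex pass avoids re-matching text inside blocks that were already inserted.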

### Pronunciation dictionaries

Create a reusable pronunciation dictionary in Cartesia's dashboard and pass its ID to the `pronunciation_dict_id` plugin parameter. Use this when you have many terms or want to share pronunciations across agents.

For prompting techniques that produce more expressive Cartesia output, see [Voice realism](https://docs.livekit.io/agents/start/prompting.md#voice-realism) in the prompting guide.

## Transcription timing

Cartesia TTS supports aligned transcription forwarding, which improves transcription synchronization in your frontend. Set `use_tts_aligned_transcript=True` in your `AgentSession` configuration to enable this feature. To learn more, see [the docs](https://docs.livekit.io/agents/build/text.md#tts-aligned-transcriptions).

## Additional resources

The following resources provide more information about using Cartesia with LiveKit Agents.

- **[Cartesia docs](https://docs.cartesia.ai/build-with-cartesia/models/tts)**: Cartesia TTS docs.

- **[Voice AI quickstart](https://docs.livekit.io/agents/start/voice-ai.md)**: Get started with LiveKit Agents and Cartesia TTS.

- **[Cartesia STT](https://docs.livekit.io/agents/models/stt/cartesia.md)**: Guide to the Cartesia STT plugin with LiveKit Agents.

---

This document was rendered at 2026-06-07T11:36:30.658Z.
For the latest version of this document, see [https://docs.livekit.io/agents/models/tts/cartesia.md](https://docs.livekit.io/agents/models/tts/cartesia.md).

To explore all LiveKit documentation, see [llms.txt](https://docs.livekit.io/llms.txt).