Cartesia TTS | LiveKit Documentation

Overview

Cartesia text-to-speech is available in LiveKit Agents through LiveKit Inference and the Cartesia plugin. Pricing for LiveKit Inference is available on the pricing page.

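Model ID	Languages
cartesia/sonic-3	en, de, es, fr, ja, pt, zh, hi, ko, it, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa
cartesia/sonic-2	en, fr, de, es, pt, zh, ja, hi, it, ko, nl, pl, ru, sv, tr
cartesia/sonic-turbo	en, fr, de, es, pt, zh, ja, hi, it, ko, nl, pl, ru, sv, tr
cartesia/sonic	en, fr, de, es, pt, zh, ja, hi, it, ko, nl, pl, ru, sv, tr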
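
As a rough illustration, the table above can be encoded as a lookup so that an agent rejects an unsupported language before it creates a session. The names SUPPORTED_LANGUAGES and supports_language are hypothetical helpers, not part of LiveKit Agents or Cartesia, and the language lists are copied from the table, so they may drift as models are updated.

```python
# Hypothetical helper: language codes copied from the model table above.
_SONIC_BASE_LANGUAGES = "en fr de es pt zh ja hi it ko nl pl ru sv tr".split()

SUPPORTED_LANGUAGES = {
    "cartesia/sonic-3": set(
        "en de es fr ja pt zh hi ko it nl pl ru sv tr tl bg ro ar cs el fi hr "
        "ms sk da ta uk hu no vi bn th he ka id te gu kn ml mr pa".split()
    ),
    "cartesia/sonic-2": set(_SONIC_BASE_LANGUAGES),
    "cartesia/sonic-turbo": set(_SONIC_BASE_LANGUAGES),
    "cartesia/sonic": set(_SONIC_BASE_LANGUAGES),
}


def supports_language(model: str, language: str) -> bool:
    """Return True if the table lists `language` for `model`."""
    return language in SUPPORTED_LANGUAGES.get(model, set())
```

For example, supports_language("cartesia/sonic-3", "th") is true, while the same check against cartesia/sonic-2 is false.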

LiveKit Inference

Use LiveKit Inference to access Cartesia TTS without a separate Cartesia API key.

Usage

To use Cartesia, pass a descriptor with the model and voice to the tts argument in your AgentSession:

from livekit.agents import AgentSession

session = AgentSession(
    tts="cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
    # ... llm, stt, vad, turn_detection, etc.
)

import { AgentSession } from '@livekit/agents';

const session = new AgentSession({
    tts: "cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
    // ... llm, stt, vad, turn_detection, etc.
});
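
The descriptor in these examples is the model ID and the voice ID joined by a colon. The following is a minimal sketch of building and splitting such a string, assuming that exact model:voice layout and nothing more. make_descriptor and split_descriptor are hypothetical helpers, not LiveKit APIs.

```python
def make_descriptor(model: str, voice: str) -> str:
    """Join a model ID and a voice ID into a `model:voice` string."""
    if ":" in model:
        raise ValueError("model ID must not contain ':'")
    return f"{model}:{voice}"


def split_descriptor(descriptor: str) -> tuple[str, str]:
    """Split a `model:voice` string at the first colon."""
    model, sep, voice = descriptor.partition(":")
    if not sep or not model or not voice:
        raise ValueError(f"expected 'model:voice', got {descriptor!r}")
    return model, voice


# Same model and voice as the examples above.
descriptor = make_descriptor(
    "cartesia/sonic-3", "9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"
)
```

The resulting string can be passed as the tts argument; keeping the two parts separate until that point makes it easier to swap voices per user or per language.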

Parameters

To customize additional parameters, use the TTS class from the inference module:

from livekit.agents import AgentSession, inference

session = AgentSession(
    tts=inference.TTS(
        model="cartesia/sonic-3", 
        voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc", 
        language="en",
        extra_kwargs={
            "speed": 1.5,
            "volume": 1.2,
            "emotion": "excited"
        }
    ),
    # ... llm, stt, vad, turn_detection, etc.
)

import { AgentSession, inference } from '@livekit/agents';

const session = new AgentSession({
    tts: new inference.TTS({
        model: "cartesia/sonic-3",
        voice: "9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
        language: "en",
        modelOptions: {
            speed: 1.5,
            volume: 1.2,
            emotion: "excited"
        }
    }),
    // ... llm, stt, vad, turn_detection, etc.
});

model (string, required)

The model ID from the models list.

voice (string, required)

See voices for guidance on selecting a voice.

language (string, optional)

Language code for the input text. If not set, the model default applies.

extra_kwargs (dict, optional)

Additional parameters to pass to the Cartesia TTS API, including emotion, speed, and volume. See the provider's documentation for more information.

In Node.js this parameter is called modelOptions.

Voices

LiveKit Inference supports all of the default "Cartesia Voices" available in the Cartesia API. You can explore the available voices in the Cartesia voice library (free account required), and use the voice by copying its ID into your LiveKit agent session.

Custom voices unavailable

Custom Cartesia voices, including voice cloning, are not yet supported in LiveKit Inference. To use custom voices, create your own Cartesia account and use the Cartesia plugin for LiveKit Agents instead.


Customizing pronunciation

Cartesia TTS allows you to customize pronunciation using Speech Synthesis Markup Language (SSML). To learn more, see Specify Custom Pronunciations.

Transcription timing

Cartesia TTS supports aligned transcription forwarding, which improves transcription synchronization in your frontend. Set use_tts_aligned_transcript=True in your AgentSession configuration to enable this feature. To learn more, see the docs.
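
The following is a minimal sketch of where the flag goes, assuming it is passed as a keyword argument to AgentSession alongside tts, as the paragraph above describes. The keyword arguments are collected in a plain dict so the snippet runs without LiveKit installed; session_kwargs is a hypothetical name.

```python
# Keyword arguments for AgentSession, gathered in a dict for illustration.
session_kwargs = {
    "tts": "cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
    # Forward TTS-aligned transcripts to the frontend.
    "use_tts_aligned_transcript": True,
    # ... llm, stt, vad, turn_detection, etc.
}

# With livekit-agents installed:
# from livekit.agents import AgentSession
# session = AgentSession(**session_kwargs)
```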

Plugin

Use the Cartesia plugin to connect directly to Cartesia's TTS API with your own API key.

Available in: Python, Node.js

Installation

Install the plugin from PyPI (Python) or npm (Node.js):

uv add "livekit-agents[cartesia]~=1.4"

pnpm add @livekit/agents-plugin-cartesia@1.x

Authentication

The Cartesia plugin requires a Cartesia API key.

Set CARTESIA_API_KEY in your .env file.
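
One way to catch a missing key early is to check the environment at startup. This sketch uses only the standard library and assumes the .env file has already been loaded into the process environment (for example by a dotenv loader); require_env is a hypothetical helper, not part of the plugin.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

Calling require_env("CARTESIA_API_KEY") before constructing the session turns a missing key into an immediate, readable error rather than a failure on the first synthesis request.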

Usage

Use Cartesia TTS within an AgentSession or as a standalone speech generator. For example, you can use this TTS in the Voice AI quickstart.

from livekit.agents import AgentSession
from livekit.plugins import cartesia

session = AgentSession(
    tts=cartesia.TTS(
        model="sonic-3",
        voice="f786b574-daa5-4673-aa0c-cbe3e8534c02",
    ),
    # ... llm, stt, etc.
)

import { voice } from '@livekit/agents';
import * as cartesia from '@livekit/agents-plugin-cartesia';

const session = new voice.AgentSession({
    tts: new cartesia.TTS({
        model: "sonic-3",
        voice: "f786b574-daa5-4673-aa0c-cbe3e8534c02",
    }),
    // ... llm, stt, etc.
});

Parameters

This section describes some of the available parameters. See the plugin reference links in the Additional resources section for a complete list of all available parameters.

model (string, optional, default: sonic-3)

ID of the model to use for generation. See supported models.

voice (string | list[float], optional, default: f786b574-daa5-4673-aa0c-cbe3e8534c02)

ID of the voice to use for generation, or an embedding array. See the official documentation.

language (string, optional, default: en)

Language of the input text in ISO-639-1 format. For the languages supported by each model, see supported models.

emotion (string, optional)

Emotion to apply to the generated speech. See Emotion Controls for Sonic 3 for supported values.

speed (float, optional, default: 1)

Speed of the speech, where 1.0 is the default speed. See Speed and Volume Controls for Sonic 3 for more information.

volume (float, optional, default: 1)

Volume of the speech, where 1.0 is the default volume. See Speed and Volume Controls for Sonic 3 for more information.

Additional resources

The following resources provide more information about using Cartesia with LiveKit Agents.

Python plugin

Reference, GitHub, PyPI

Node.js plugin

Reference, GitHub, NPM

Cartesia docs

Cartesia TTS docs.

Voice AI quickstart

Get started with LiveKit Agents and Cartesia TTS.

Cartesia STT

Guide to the Cartesia STT plugin with LiveKit Agents.