Overview
Cartesia text-to-speech is available in LiveKit Agents through LiveKit Inference and the Cartesia plugin. With LiveKit Inference, your agent runs on LiveKit's infrastructure to minimize latency. No separate provider API key is required, and usage and rate limits are managed through LiveKit Cloud. Use the plugin instead if you want to manage your own billing and rate limits. Pricing for LiveKit Inference is available on the pricing page.
LiveKit Inference
Use LiveKit Inference to access Cartesia TTS without a separate Cartesia API key.
| Model name | Model ID | Languages |
|---|---|---|
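| Sonic | cartesia/sonic | en, fr, de, es, pt, zh, ja, hi, it, ko, nl, pl, ru, sv, tr |
| Sonic 2 | cartesia/sonic-2 | en, fr, de, es, pt, zh, ja, hi, it, ko, nl, pl, ru, sv, tr |
| Sonic 3 | cartesia/sonic-3 | en, de, es, fr, ja, pt, zh, hi, ko, it, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa |
| Sonic 3 (2025-10-27) | cartesia/sonic-3-2025-10-27 | en, de, es, fr, ja, pt, zh, hi, ko, it, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa |
| Sonic 3 (2026-01-12) | cartesia/sonic-3-2026-01-12 | en, de, es, fr, ja, pt, zh, hi, ko, it, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa |
| Sonic Turbo | cartesia/sonic-turbo | en, fr, de, es, pt, zh, ja, hi, it, ko, nl, pl, ru, sv, tr |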
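When an agent has to serve a particular language, it can help to check the table above programmatically before choosing a model ID. The following is a minimal sketch, not part of the LiveKit API: `SUPPORTED_LANGUAGES` and `models_for_language` are hypothetical names, and the language sets are copied by hand from the table, so they go stale if the table changes.

```python
# Language codes copied from the models table; not fetched from any API.
BASE_LANGS = frozenset("en fr de es pt zh ja hi it ko nl pl ru sv tr".split())
SONIC_3_LANGS = BASE_LANGS | frozenset(
    "tl bg ro ar cs el fi hr ms sk da ta uk hu no vi bn th he ka id te gu kn ml mr pa".split()
)

# Hypothetical lookup table: model ID -> supported language codes.
SUPPORTED_LANGUAGES = {
    "cartesia/sonic": BASE_LANGS,
    "cartesia/sonic-2": BASE_LANGS,
    "cartesia/sonic-turbo": BASE_LANGS,
    "cartesia/sonic-3": SONIC_3_LANGS,
    "cartesia/sonic-3-2025-10-27": SONIC_3_LANGS,
    "cartesia/sonic-3-2026-01-12": SONIC_3_LANGS,
}


def models_for_language(language: str) -> list[str]:
    """Return the model IDs from the table that list the given language code."""
    return sorted(
        model for model, langs in SUPPORTED_LANGUAGES.items() if language in langs
    )
```

For example, `models_for_language("vi")` returns only the three Sonic 3 model IDs, because Vietnamese is listed only for those models.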
Usage
To use Cartesia, use the TTS class from the inference module:
Python:

    from livekit.agents import AgentSession, inference

    session = AgentSession(
        tts=inference.TTS(
            model="cartesia/sonic-3",
            voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
            language="en",
            extra_kwargs={"speed": 1.5, "volume": 1.2, "emotion": "excited"},
        ),
        # ... llm, stt, vad, turn_handling, etc.
    )

Node.js:

    import { AgentSession, inference } from '@livekit/agents';

    const session = new AgentSession({
      tts: new inference.TTS({
        model: "cartesia/sonic-3",
        voice: "9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
        language: "en",
        modelOptions: { speed: 1.5, volume: 1.2, emotion: "excited" },
      }),
      // ... llm, stt, vad, turnHandling, etc.
    });
Parameters
model (string): The model ID from the models list.
voice (string): See voices for guidance on selecting a voice.
language (LanguageCode): Language code for the input text. If not set, the model default applies.
extra_kwargs (dict): Additional parameters to pass to the Cartesia TTS API. See model parameters for supported fields. In Node.js this parameter is called modelOptions.
Model parameters
Pass the following parameters inside extra_kwargs (Python) or modelOptions (Node.js):
| Parameter | Type | Default | Notes |
|---|---|---|---|
| emotion | str | | Emotion control string. See Emotion Controls for supported values. |
| speed | "slow" \| "normal" \| "fast" \| float | | Speed of speech. Either a preset string or a numeric multiplier. See Speed and Volume Controls for more information. |
| volume | float | | Volume of speech. See Speed and Volume Controls for more information. |
| duration | float | | Target duration in seconds for the generated audio. |
| max_buffer_delay_ms | int | | Maximum buffer delay in milliseconds before flushing a chunk. |
| add_timestamps | bool | | Whether to include word-level timestamps in the response. |
| add_phoneme_timestamps | bool | | Whether to include phoneme-level timestamps in the response. |
| use_normalized_timestamps | bool | | Whether to return timestamps in normalized form. |
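Because extra_kwargs is a plain dict, a typo in a preset name is easy to miss until the request is sent. The sketch below shows one way to build the dict with light validation first. `build_extra_kwargs` and `SPEED_PRESETS` are hypothetical helpers, not part of LiveKit or Cartesia. The checks reflect only the types listed in the table and do not enforce Cartesia's valid numeric ranges or emotion values.

```python
from typing import Optional, Union

# Preset strings taken from the "speed" row of the table above.
SPEED_PRESETS = {"slow", "normal", "fast"}


def build_extra_kwargs(
    emotion: Optional[str] = None,
    speed: Optional[Union[str, float]] = None,
    volume: Optional[float] = None,
) -> dict:
    """Build an extra_kwargs dict, including only the fields that were set."""
    kwargs: dict = {}
    if emotion is not None:
        kwargs["emotion"] = emotion
    if speed is not None:
        if isinstance(speed, str):
            if speed not in SPEED_PRESETS:
                raise ValueError(f"unknown speed preset: {speed!r}")
        elif isinstance(speed, bool) or not isinstance(speed, (int, float)):
            raise TypeError("speed must be a preset string or a number")
        kwargs["speed"] = speed
    if volume is not None:
        kwargs["volume"] = float(volume)
    return kwargs
```

The result can then be passed as `extra_kwargs=build_extra_kwargs(speed="fast", emotion="excited")` when constructing `inference.TTS`.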
Voices
LiveKit Inference supports all of the default "Cartesia Voices" available in the Cartesia API. You can explore the available voices in the Cartesia voice library (free account required), and use the voice by copying its ID into your LiveKit agent session.
Custom Cartesia voices, including voice cloning, are not yet supported in LiveKit Inference. To use custom voices, create your own Cartesia account and use the Cartesia plugin for LiveKit Agents instead.
String descriptors
As a shortcut, you can also pass a descriptor with the model ID and voice directly to the tts argument in your AgentSession:
Python:

    from livekit.agents import AgentSession

    session = AgentSession(
        tts="cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
        # ... llm, stt, vad, turn_handling, etc.
    )

Node.js:

    import { AgentSession } from '@livekit/agents';

    const session = new AgentSession({
      tts: "cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
      // ... llm, stt, vad, turnHandling, etc.
    });
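In the example, the descriptor is the model ID and the voice ID joined by a colon. If the model and voice come from configuration, a pair of small helpers can keep that format in one place. This is a sketch under that assumption: `make_tts_descriptor` and `split_tts_descriptor` are hypothetical names, and treating a descriptor without a voice as valid is also an assumption.

```python
from typing import Optional


def make_tts_descriptor(model: str, voice: Optional[str] = None) -> str:
    """Join a model ID and an optional voice ID as 'model:voice'."""
    return f"{model}:{voice}" if voice else model


def split_tts_descriptor(descriptor: str) -> tuple[str, Optional[str]]:
    """Split 'model:voice' back into its parts; voice is None if absent."""
    model, _, voice = descriptor.partition(":")
    if not model:
        raise ValueError("descriptor must start with a model ID")
    return model, voice or None


descriptor = make_tts_descriptor(
    "cartesia/sonic-3", "9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"
)
```

The resulting string is what would be passed as the tts argument to AgentSession.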
Plugin
LiveKit's plugin support for Cartesia lets you connect directly to Cartesia's TTS API with your own API key.
Installation
Install the plugin from PyPI (Python) or npm (Node.js):
uv add "livekit-agents[cartesia]~=1.4"
pnpm add @livekit/agents-plugin-cartesia@1.x
Authentication
The Cartesia plugin requires a Cartesia API key.
Set CARTESIA_API_KEY in your .env file.
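A missing key usually surfaces only when the first synthesis request fails, so it can be worth checking at startup. The sketch below uses only the standard library. `require_cartesia_api_key` is a hypothetical helper, and it assumes the .env file has already been loaded into the process environment (for example by your process manager or a loader such as python-dotenv), since Python does not read .env files on its own.

```python
import os
from typing import Mapping, Optional


def require_cartesia_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return CARTESIA_API_KEY, or raise a clear error if it is missing or blank."""
    source = os.environ if env is None else env
    key = source.get("CARTESIA_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CARTESIA_API_KEY is not set; add it to your .env file or environment"
        )
    return key
```

Calling `require_cartesia_api_key()` once before creating the session turns a late authentication failure into an immediate, readable error.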
Usage
Use Cartesia TTS within an AgentSession or as a standalone speech generator. For example, you can use this TTS in the Voice AI quickstart.
Python:

    from livekit.agents import AgentSession
    from livekit.plugins import cartesia

    session = AgentSession(
        tts=cartesia.TTS(
            model="sonic-3",
            voice="f786b574-daa5-4673-aa0c-cbe3e8534c02",
        ),
        # ... llm, stt, etc.
    )

Node.js:

    import { voice } from '@livekit/agents';
    import * as cartesia from '@livekit/agents-plugin-cartesia';

    const session = new voice.AgentSession({
      tts: new cartesia.TTS({
        model: "sonic-3",
        voice: "f786b574-daa5-4673-aa0c-cbe3e8534c02",
      }),
      // ... llm, stt, etc.
    });
Parameters
This section describes some of the available parameters. See the plugin reference links in the Additional resources section for a complete list of all available parameters.
model (string, default: sonic-3): ID of the model to use for generation. See supported models.
voice (string | list[float], default: f786b574-daa5-4673-aa0c-cbe3e8534c02): ID of the voice to use for generation, or an embedding array. See official documentation.
language (LanguageCode, default: en): Language code for the input text. For a list of languages supported by each model, see supported models.
emotion (string): See Emotion Controls for Sonic 3 for supported values.
speed (float, default: 1): Speed of the speech, where 1.0 is the default speed. See Speed and Volume Controls for Sonic 3 for more information.
volume (float, default: 1): Volume of the speech, where 1.0 is the default volume. See Speed and Volume Controls for Sonic 3 for more information.
Customizing pronunciation
Cartesia TTS allows you to customize pronunciation using Speech Synthesis Markup Language (SSML). To learn more, see Specify Custom Pronunciations.
Transcription timing
Cartesia TTS supports aligned transcription forwarding, which improves transcription synchronization in your frontend. Set use_tts_aligned_transcript=True in your AgentSession configuration to enable this feature. To learn more, see the docs.
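As a rough illustration of where the flag goes, the sketch below collects the session options in a dict so that it runs without livekit-agents installed. `session_options` is a hypothetical helper. Only the use_tts_aligned_transcript name and the tts descriptor come from this page.

```python
def session_options(tts: str, aligned_transcript: bool = True) -> dict:
    """Collect AgentSession keyword arguments, including the alignment flag."""
    return {
        "tts": tts,
        "use_tts_aligned_transcript": aligned_transcript,
    }


options = session_options("cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc")

# With livekit-agents installed, the options would be applied as:
#   from livekit.agents import AgentSession
#   session = AgentSession(**options)
```

Passing use_tts_aligned_transcript=True directly in the AgentSession call is equivalent; the dict form is used here only to keep the example self-contained.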
Additional resources
The following resources provide more information about using Cartesia with LiveKit Agents.