Overview
This plugin allows you to use Cartesia as a TTS provider for your voice agents.
Cartesia TTS is also available in LiveKit Inference, with billing and integration handled automatically. See the docs for more information.
Quick reference
This section includes a brief overview of the Cartesia TTS plugin. For more information, see Additional resources.
Installation
Install the plugin from PyPI (Python) or npm (Node.js):
uv add "livekit-agents[cartesia]~=1.2"
pnpm add @livekit/agents-plugin-cartesia@1.x
Authentication
The Cartesia plugin requires a Cartesia API key.
Set CARTESIA_API_KEY in your .env file.
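Assuming the plugin reads CARTESIA_API_KEY from the process environment, the .env file has to be loaded before the TTS is created. The following is a minimal standard-library sketch of that step. The helpers load_env_file and require_api_key are hypothetical names used only for illustration. In practice, a package such as python-dotenv is commonly used for the loading step.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Hypothetical helper: copy KEY=VALUE lines from a .env file into os.environ.

    Existing environment variables are not overwritten. Blank lines, comment
    lines starting with '#', and lines without '=' are skipped. Surrounding
    quotes around values are stripped.
    """
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        if key and key not in os.environ:
            os.environ[key] = value
            loaded[key] = value
    return loaded


def require_api_key(name: str = "CARTESIA_API_KEY") -> str:
    """Hypothetical helper: fail fast if the key is missing or empty."""
    key = os.environ.get(name, "")
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return key
```

Call load_env_file() and then require_api_key() once at agent startup. A missing key then surfaces as a clear error immediately, not as a failure on the first TTS request.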
Usage
Use Cartesia TTS within an AgentSession or as a standalone speech generator. For example, you can use this TTS in the Voice AI quickstart.
from livekit.agents import AgentSession
from livekit.plugins import cartesia

session = AgentSession(
    tts=cartesia.TTS(
        model="sonic-3",
        voice="f786b574-daa5-4673-aa0c-cbe3e8534c02",
    ),
    # ... llm, stt, etc.
)
import { voice } from '@livekit/agents';
import * as cartesia from '@livekit/agents-plugin-cartesia';

const session = new voice.AgentSession({
  tts: new cartesia.TTS({
    model: "sonic-3",
    voice: "f786b574-daa5-4673-aa0c-cbe3e8534c02",
  }),
  // ... llm, stt, etc.
});
Parameters
This section describes some of the available parameters. See the plugin reference links in the Additional resources section for a complete list of all available parameters.
model: ID of the model to use for generation. See supported models.
voice: ID of the voice to use for generation, or an embedding array. See the official Cartesia documentation.
language: Language of the input text in ISO-639-1 format. For the languages supported by each model, see supported models.
emotion: Emotion of the generated speech. See Emotion Controls for Sonic 3 for supported values.
speed: Speed of the speech, where 1.0 is the default speed. See Speed and Volume Controls for Sonic 3 for more information.
volume: Volume of the speech, where 1.0 is the default volume. See Speed and Volume Controls for Sonic 3 for more information.
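The following sketch shows how these parameters fit together. It uses a hypothetical helper, cartesia_tts_kwargs, that collects them into a keyword-argument dictionary for cartesia.TTS. The sketch assumes the keyword names match the parameters listed above. It performs only basic shape checks: a non-empty voice ID or a numeric embedding, a two-letter lowercase language code, and positive speed and volume. It does not encode Cartesia's actual allowed ranges or emotion values, which are defined in the Cartesia documentation.

```python
from typing import Any, Optional, Sequence, Union

# A voice is either a voice ID string or an embedding (a sequence of numbers).
VoiceSpec = Union[str, Sequence[float]]


def cartesia_tts_kwargs(
    model: str = "sonic-3",
    voice: VoiceSpec = "f786b574-daa5-4673-aa0c-cbe3e8534c02",
    language: Optional[str] = None,
    emotion: Optional[Any] = None,
    speed: Optional[float] = None,
    volume: Optional[float] = None,
) -> dict:
    """Hypothetical helper: build keyword arguments for cartesia.TTS.

    Optional parameters are included only when set, so the plugin's own
    defaults apply to everything else.
    """
    if isinstance(voice, str):
        if not voice:
            raise ValueError("voice ID must be a non-empty string")
        voice_value: Any = voice
    else:
        if not all(isinstance(x, (int, float)) for x in voice):
            raise TypeError("voice embedding must be a sequence of numbers")
        voice_value = list(voice)

    kwargs: dict = {"model": model, "voice": voice_value}

    if language is not None:
        # ISO-639-1 codes are two lowercase letters, e.g. "en" or "de".
        if len(language) != 2 or not language.isalpha() or not language.islower():
            raise ValueError(f"expected an ISO-639-1 code, got {language!r}")
        kwargs["language"] = language
    if emotion is not None:
        # Passed through unchanged; see Emotion Controls for Sonic 3.
        kwargs["emotion"] = emotion
    for name, value in (("speed", speed), ("volume", volume)):
        if value is not None:
            if value <= 0:
                raise ValueError(f"{name} must be positive (1.0 is the default)")
            kwargs[name] = float(value)
    return kwargs
```

With the plugin installed, a call such as cartesia.TTS(**cartesia_tts_kwargs(language="en", speed=1.1)) would then create the TTS, provided these keyword names match your installed plugin version.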
Customizing pronunciation
Cartesia TTS allows you to customize pronunciation using Speech Synthesis Markup Language (SSML). To learn more, see Specify Custom Pronunciations.
Transcription timing
Cartesia TTS supports aligned transcription forwarding, which improves transcription synchronization in your frontend. Set use_tts_aligned_transcript=True in your AgentSession configuration to enable this feature. To learn more, see the docs.
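Conceptually, aligned transcription attaches word-level timing from the TTS output to the transcript. This lets the frontend reveal each word as its audio plays instead of showing the whole reply at once. The following sketch models only that idea. The TimedWord record and the visible_text function are hypothetical and do not represent LiveKit's implementation or wire format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TimedWord:
    """Hypothetical record: one word and its playback window in seconds."""
    text: str
    start: float
    end: float


def visible_text(words: List[TimedWord], playback_time: float) -> str:
    """Return the words whose audio has started by playback_time.

    Words are assumed to be listed in spoken order.
    """
    shown = [w.text for w in words if w.start <= playback_time]
    return " ".join(shown)
```

A frontend following this model would call visible_text on each playback tick, so the displayed text never runs ahead of the audio.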
Additional resources
The following resources provide more information about using Cartesia with LiveKit Agents.