Cartesia integration guide

Chat with a voice assistant built with LiveKit and Cartesia TTS

Overview

Cartesia provides customizable text-to-speech (TTS) in multiple languages and produces natural-sounding speech with low latency. With LiveKit's Cartesia integration and the Agents framework, you can build AI voice applications that sound realistic. For a demonstration of what you can build, try out the LiveKit voice assistant with Cartesia.

Note

If you're looking to build an AI voice assistant with Cartesia, check out our Voice Agent Quickstart guide and use the Cartesia TTS module as demonstrated below.

Quick reference

Environment variables

CARTESIA_API_KEY=<your-cartesia-api-key>
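A missing or blank key usually surfaces only when the first synthesis request fails, so it can help to check for it at startup. The following is a minimal sketch, assuming the key is supplied through the process environment; `load_cartesia_api_key` is a hypothetical helper for illustration and is not part of the LiveKit plugin.

```python
import os


def load_cartesia_api_key(env=os.environ):
    """Return the Cartesia API key from the environment.

    Hypothetical helper (not part of the plugin): raises RuntimeError
    if CARTESIA_API_KEY is unset or contains only whitespace.
    """
    key = env.get("CARTESIA_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CARTESIA_API_KEY is not set; export it before starting the agent."
        )
    return key
```

Calling `load_cartesia_api_key()` once when the process starts turns a misconfigured environment into an immediate, readable error.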

TTS

LiveKit's Cartesia integration provides a text-to-speech (TTS) interface. This can be used in a VoicePipelineAgent or as a standalone speech generator. For a complete reference of all available parameters, see the plugin reference.

Usage

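from livekit.plugins.cartesia import tts

# Configure the Cartesia TTS plugin. Every argument is optional (see Parameters).
cartesia_tts = tts.TTS(
    model="sonic-english",
    voice="c2ac25f9-ecc4-4f56-9095-651354df60c0",  # voice ID
    speed=0.8,  # a float in [-1.0, 1.0] or a named speed such as "slow"
    emotion=["curiosity:highest", "positivity:high"],
)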

Parameters

model (string, optional; default: sonic)

ID of the model to use for generation. See supported models.

voice (string | list[float], optional; default: c2ac25f9-ecc4-4f56-9095-651354df60c0)

ID of the voice to use for generation, or an embedding array. See official documentation.

speed (string | float, optional; default: 1.0)

Speed of generated speech. Either a float in range [-1.0, 1.0], or one of "fastest", "fast", "normal", "slow", "slowest". See speed options.

emotion (list[string], optional; default: neutral)

Emotion of generated speech. See emotion options.

language (string, optional; default: en)

Language of the input text in ISO 639-1 format. For the languages supported by each model, see supported models.
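The value constraints described above can be checked before the values are passed to the TTS constructor. This is a rough sketch, not the plugin's own validation: `check_speed`, `split_emotion`, and `check_language` are hypothetical helpers. It assumes emotion entries follow the `name:level` form shown in the usage example and that a bare name without a level is also acceptable. The language check tests only the two-letter shape of the code, not whether a given model supports it.

```python
# Named speeds listed in the parameter description above.
NAMED_SPEEDS = {"fastest", "fast", "normal", "slow", "slowest"}


def check_speed(speed):
    """Accept a named speed or a number in [-1.0, 1.0]; raise otherwise."""
    if isinstance(speed, str):
        if speed not in NAMED_SPEEDS:
            raise ValueError(f"unknown named speed: {speed!r}")
        return speed
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(speed, bool) or not isinstance(speed, (int, float)):
        raise TypeError("speed must be a string or a number")
    if not -1.0 <= speed <= 1.0:
        raise ValueError(f"speed {speed} is outside [-1.0, 1.0]")
    return float(speed)


def split_emotion(tag):
    """Split "name:level" into (name, level).

    Assumption: a bare "name" is allowed and yields level None.
    """
    name, sep, level = tag.partition(":")
    if not name or (sep and not level):
        raise ValueError(f"malformed emotion tag: {tag!r}")
    return name, (level or None)


def check_language(language):
    """Check that the code has the ISO 639-1 shape: two lowercase ASCII letters."""
    ok = (
        len(language) == 2
        and language.isascii()
        and language.isalpha()
        and language.islower()
    )
    if not ok:
        raise ValueError(f"not a two-letter language code: {language!r}")
    return language
```

For example, `check_speed(0.8)`, `split_emotion("curiosity:highest")`, and `check_language("en")` all pass, while `check_speed(1.5)` and `check_language("eng")` raise `ValueError`.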