Text-to-speech (TTS) models overview

Voices and plugins to add realtime speech to your voice agents.

Overview

Voice agent speech is produced by a TTS model, configured with a voice profile that specifies tone, accent, and other qualitative characteristics of the speech. The TTS model takes the output of the LLM and speaks the agent's response to the user.

You can choose a voice model served through LiveKit Inference, which is included with LiveKit Cloud. With LiveKit Inference, your agent runs on LiveKit's infrastructure to minimize latency. No separate provider API key is required, and usage and rate limits are managed through LiveKit Cloud. Use a plugin instead if you prefer to manage billing and rate limits yourself, or if you need a provider that is not currently available through LiveKit Inference.

LiveKit Inference

LiveKit Inference provides a curated set of TTS models with managed billing and rate limits. Each model has a short list of suggested voices, and each provider's documentation lists a wider selection.

Models

The following models are available in LiveKit Inference. Refer to the guide for each model for more details on additional configuration options.

Provider | Model name | Model ID | Languages
Cartesia | Sonic 2 | cartesia/sonic-2 | 8 languages
Cartesia | Sonic 3 | cartesia/sonic-3 | 42 languages
Cartesia | Sonic 3 (2025-10-27) | cartesia/sonic-3-2025-10-27 | 42 languages
Cartesia | Sonic 3 (2026-01-12) | cartesia/sonic-3-2026-01-12 | 42 languages
Cartesia | Sonic 3 Latest | cartesia/sonic-3-latest | 42 languages
Cartesia | Sonic 3.5 | cartesia/sonic-3.5 | 40 languages
Cartesia | Sonic 3.5 (2026-05-04) | cartesia/sonic-3.5-2026-05-04 | 40 languages
Cartesia | Sonic Latest | cartesia/sonic-latest | 40 languages
Cartesia | Sonic Turbo | cartesia/sonic-turbo | 9 languages
Cartesia | Sonic (Deprecated) | cartesia/sonic | 15 languages
Deepgram | Aura-2 | deepgram/aura-2 | 7 languages
Deepgram | Aura-1 (Retired) | deepgram/aura | English only
ElevenLabs | Eleven Flash v2 | elevenlabs/eleven_flash_v2 | English only
ElevenLabs | Eleven Flash v2.5 | elevenlabs/eleven_flash_v2_5 | 32 languages
ElevenLabs | Eleven Multilingual v2 | elevenlabs/eleven_multilingual_v2 | 29 languages
ElevenLabs | Eleven Turbo v2 | elevenlabs/eleven_turbo_v2 | English only
ElevenLabs | Eleven Turbo v2.5 | elevenlabs/eleven_turbo_v2_5 | 32 languages
ElevenLabs | Eleven v3 | elevenlabs/eleven_v3 | 32 languages
Inworld | Realtime TTS 1.5 Max | inworld/inworld-tts-1.5-max | 15 languages
Inworld | Realtime TTS 1.5 Mini | inworld/inworld-tts-1.5-mini | 15 languages
Inworld | Realtime TTS 2.0 | inworld/inworld-tts-2 | 15 languages
Inworld | Realtime TTS (Deprecated) | inworld/inworld-tts-1 | 15 languages
Inworld | Realtime TTS Max (Deprecated) | inworld/inworld-tts-1-max | 15 languages
Rime | Arcana | rime/arcana | 9 languages
Rime | Coda | rime/coda | 6 languages
Rime | Mist | rime/mist | English only
Rime | Mist v2 | rime/mistv2 | 4 languages
Rime | Mist v3 | rime/mistv3 | 5 languages
xAI | Text to Speech | xai/tts-1 | 17 languages

Retired models
Retired models are no longer accessible. If you're using a retired model, switch to a currently available model.
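
A startup check can catch a retired or deprecated model ID before an agent goes live. The following is a minimal sketch, not part of LiveKit: the helper names `split_model_id` and `model_status` are hypothetical, and the status sets are copied by hand from the table above, so they would need to be kept in sync with it.

```python
# Hypothetical helpers; the status sets below are a manual snapshot of the
# model table in this document, not data fetched from LiveKit.
DEPRECATED = {
    "cartesia/sonic",
    "inworld/inworld-tts-1",
    "inworld/inworld-tts-1-max",
}
RETIRED = {"deepgram/aura"}


def split_model_id(model_id: str) -> tuple[str, str]:
    """Split a 'provider/model' ID into its two parts."""
    provider, sep, name = model_id.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return provider, name


def model_status(model_id: str) -> str:
    """Return 'retired', 'deprecated', or 'ok' (meaning: not flagged in this snapshot)."""
    split_model_id(model_id)  # raises ValueError on a malformed ID
    if model_id in RETIRED:
        return "retired"
    if model_id in DEPRECATED:
        return "deprecated"
    return "ok"
```

For example, `model_status("deepgram/aura")` returns `"retired"`, which is the cue to switch to a currently available model such as `deepgram/aura-2`. An `"ok"` result only means the ID is not flagged in this snapshot, not that the ID exists.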

Custom voices

You can create and use custom voices with LiveKit Inference. Upload or record a sample, and LiveKit clones it to all supported TTS providers on your plan. You can then use the clone in your agent sessions with any of those providers.

Custom voices

Create voice clones from short audio samples.

Suggested voices

The following voices are good choices for overall quality and performance. Each provider has a much larger selection of voices to choose from, which you can find in their documentation. In addition to the voices below, you can choose to use other TTS provider voices through LiveKit Inference.

Voice | Description | Language
Blake | Energetic American adult male | English (United States)
Daniela | Calm and trusting Mexican female | Spanish (Mexico)
Jacqueline | Confident, young American adult female | English (United States)
Robyn | Neutral, mature Australian female | English (Australia)
Apollo | Comfortable, casual male | English (United States)
Athena | Smooth, professional female | English (United States)
Odysseus | Calm, professional male | English (United States)
Theia | Expressive, polite female | English (Australia)
Alice | Clear and engaging, friendly British woman | English (United Kingdom)
Chris | Natural and real American male | English (United States)
Eric | A smooth tenor Mexican male | Spanish (Mexico)
Jessica | Young and popular, playful American female | English (United States)
Astra | Chipper, upbeat American female | English (United States)
Celeste | Chill Gen-Z American female | English (United States)
Luna | Chill but excitable American female | English (United States)
Ursa | Young, emo American male | English (United States)
Ashley | Warm, natural American female | English (United States)
Diego | Soothing, gentle Mexican male | Spanish (Mexico)
Edward | Fast-talking, emphatic American male | English (United States)
Olivia | Upbeat, friendly British female | English (United Kingdom)
Ara | Warm, friendly | English (United States)
Eve | Energetic, upbeat | English (United States)
Leo | Authoritative, strong | English (United States)
Rex | Confident, clear | English (United States)

Plugins

The LiveKit Agents framework includes open source plugins for a wide range of TTS providers. Use a plugin when you need provider-specific features not available through Inference, want to manage billing directly, or need a provider not currently in Inference. Plugins require your own API key and account.

Have another provider in mind? LiveKit is open source and welcomes new plugin contributions.

TTS Usage

To set up TTS in an AgentSession, provide a descriptor with both the desired model and voice. LiveKit Inference manages the connection to the model automatically. See the Suggested voices list for recommended voices, or the model reference for the full selection.

Python:

from livekit.agents import AgentSession, inference

session = AgentSession(
    tts=inference.TTS(
        model="cartesia/sonic-3",
        voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
        language="en",
    ),
    # ... llm, stt, etc.
)

Node.js:

import { AgentSession, inference } from '@livekit/agents';

const session = new AgentSession({
  tts: new inference.TTS({
    model: "cartesia/sonic-3",
    voice: "9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
    language: "en",
  }),
  // ... llm, stt, etc.
});

Additional parameters

More configuration options, such as custom pronunciation, are available for each model. To set additional parameters, use the TTS class from the inference module. Consult each model reference for examples and available parameters.

Language codes

All TTS plugins and LiveKit Inference use the LanguageCode type for the language parameter. LanguageCode accepts any common language format and normalizes it automatically to BCP-47. You don't need to look up the specific format each provider expects. Pass any of the following and the framework handles the conversion:

  • ISO 639-1: "en", "es", "fr"
  • BCP-47 with region: "en-US", "zh-Hans-CN"
  • ISO 639-3: "eng", "spa"
  • Language names: "english", "spanish"
  • Underscored variants: "en_us" (normalized to "en-US")

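For example, the following inputs normalize as shown (Python). Language names and ISO 639-3 codes collapse to the ISO 639-1 code, and a region subtag is preserved when present:

from livekit.agents import LanguageCode

LanguageCode("english")  # → "en"
LanguageCode("eng")      # → "en"
LanguageCode("en")       # → "en"
LanguageCode("en-US")    # → "en-US"
LanguageCode("en_us")    # → "en-US"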

LanguageCode is a str subclass, so you can use it anywhere a string is expected. It also provides properties for extracting parts of the code:

  • .language: Base ISO 639-1 code (e.g., "en" from "en-US").
  • .region: Region subtag, if present (e.g., "US" from "en-US").
  • .iso: ISO 639-1 tag with region (e.g., "zh-CN" from "cmn-Hans-CN").

The same inputs in Node.js:

import { normalizeLanguage } from '@livekit/agents';

normalizeLanguage("english") // → "en"
normalizeLanguage("eng")     // → "en"
normalizeLanguage("en")      // → "en"
normalizeLanguage("en-US")   // → "en-US"
normalizeLanguage("en_us")   // → "en-US"

In Node.js, LanguageCode is a branded string type. Use normalizeLanguage() to convert a plain string to a LanguageCode, and the standalone helper functions to extract parts of the code:

  • getBaseLanguage(lang): Base ISO 639-1 code (e.g., "en" from "en-US").
  • getLanguageRegion(lang): Region subtag, if present (e.g., "US" from "en-US").
  • getIsoLanguage(lang): ISO 639-1 tag with region (e.g., "zh-CN" from "cmn-Hans-CN").
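
To make the normalization rules above more concrete, here is a simplified, standard-library-only sketch of a str subclass that behaves similarly for the inputs shown. It is not LiveKit's implementation: the class name `SketchLanguageCode` and its tiny lookup tables are assumptions made for illustration, and it covers only a few languages.

```python
from typing import Optional

# Illustrative lookup tables only; a real implementation covers far more languages.
_NAMES = {"english": "en", "spanish": "es", "french": "fr"}
_ISO_639_3 = {"eng": "en", "spa": "es", "fra": "fr"}


class SketchLanguageCode(str):
    """Simplified stand-in: a str subclass that normalizes common language formats."""

    def __new__(cls, value: str) -> "SketchLanguageCode":
        # Treat underscores like hyphens, then split into subtags.
        parts = value.strip().replace("_", "-").split("-")
        base = parts[0].lower()
        # Map language names and ISO 639-3 codes to the ISO 639-1 code.
        base = _NAMES.get(base, _ISO_639_3.get(base, base))
        rest = []
        for sub in parts[1:]:
            if len(sub) == 4 and sub.isalpha():
                rest.append(sub.title())  # script subtag, e.g. "Hans"
            elif len(sub) == 2 and sub.isalpha():
                rest.append(sub.upper())  # region subtag, e.g. "US"
            else:
                rest.append(sub)  # anything else is passed through unchanged
        return super().__new__(cls, "-".join([base, *rest]))

    @property
    def language(self) -> str:
        """Base language code, e.g. 'en' from 'en-US'."""
        return self.split("-")[0]

    @property
    def region(self) -> Optional[str]:
        """Two-letter region subtag if present, else None."""
        for sub in self.split("-")[1:]:
            if len(sub) == 2 and sub.isupper():
                return sub
        return None
```

Because the result is still a string, `SketchLanguageCode("en_us") == "en-US"` holds and the value can be passed anywhere a plain string is accepted, which mirrors the str-subclass design described for Python above.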

Custom TTS

To create an entirely custom TTS, implement the TTS node in your agent.

Standalone TTS

You can use a TTS instance as a standalone component by creating a stream. Use push_text to add text to the stream, and then consume a stream of SynthesizedAudio to publish as realtime audio to another participant.

Here is an example of a standalone TTS app:

import asyncio
from typing import AsyncIterable

from livekit import agents, rtc
from livekit.agents import AgentServer
from livekit.agents.tts import SynthesizedAudio
from livekit.plugins import cartesia

server = AgentServer()


@server.rtc_session(agent_name="my-agent")
async def my_agent(ctx: agents.JobContext):
    text_stream: AsyncIterable[str] = ...  # you need to provide a stream of text

    tts = cartesia.TTS(model="sonic-english")
    # match the audio source to the format the TTS actually produces
    audio_source = rtc.AudioSource(tts.sample_rate, tts.num_channels)
    track = rtc.LocalAudioTrack.create_audio_track("agent-audio", audio_source)
    await ctx.room.local_participant.publish_track(track)

    tts_stream = tts.stream()

    # defined before use, and nested so it can reach audio_source
    async def send_audio(audio_stream: AsyncIterable[SynthesizedAudio]):
        async for a in audio_stream:
            # SynthesizedAudio carries the audio in its `frame` field
            await audio_source.capture_frame(a.frame)

    # create a task to consume and publish audio frames
    send_task = asyncio.create_task(send_audio(tts_stream))

    # push text into the stream, TTS stream will emit audio frames along with events
    # indicating sentence (or segment) boundaries.
    async for text in text_stream:
        tts_stream.push_text(text)
    tts_stream.end_input()

    # wait until the remaining audio has been published
    await send_task


if __name__ == "__main__":
    agents.cli.run_app(server)
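
The push-then-consume flow described in this section can be tried without any provider account. The sketch below uses only the standard library; `FakeTTSStream` is a hypothetical stand-in that borrows the `push_text` and `end_input` method names from the text above and emits encoded bytes in place of real audio frames. It illustrates the producer/consumer pattern only, not LiveKit's stream implementation.

```python
import asyncio
from typing import List, Optional


class FakeTTSStream:
    """Hypothetical stand-in for a TTS stream: one fake 'frame' per pushed text chunk."""

    def __init__(self) -> None:
        self._queue: "asyncio.Queue[Optional[str]]" = asyncio.Queue()

    def push_text(self, text: str) -> None:
        self._queue.put_nowait(text)

    def end_input(self) -> None:
        self._queue.put_nowait(None)  # sentinel: no more text will arrive

    def __aiter__(self) -> "FakeTTSStream":
        return self

    async def __anext__(self) -> bytes:
        text = await self._queue.get()
        if text is None:
            raise StopAsyncIteration
        return text.encode("utf-8")  # pretend these bytes are an audio frame


async def consume(stream: FakeTTSStream, sink: List[bytes]) -> None:
    """Consumer side: drain frames as they become available."""
    async for frame in stream:
        sink.append(frame)


async def run_demo() -> List[bytes]:
    stream = FakeTTSStream()
    frames: List[bytes] = []
    # Start the consumer first so it runs concurrently with the producer.
    consumer = asyncio.create_task(consume(stream, frames))
    for chunk in ["Hello, ", "world."]:
        stream.push_text(chunk)
    stream.end_input()
    await consumer  # do not return until every frame has been handled
    return frames
```

Running `asyncio.run(run_demo())` returns the two fake frames in push order. Keeping a reference to the consumer task and awaiting it is the design choice worth copying: without it, the surrounding coroutine can finish before the last audio has been handled.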

Additional resources

The following resources cover related topics that may be useful for your application.