Text-to-speech (TTS) models overview

Overview

Voice agent speech is produced by a TTS model, configured with a voice profile that specifies tone, accent, and other qualitative characteristics of the speech. The TTS model takes the text output of an LLM and speaks the agent's response to the user.

You can choose a voice model served through LiveKit Inference, which is included with LiveKit Cloud. With LiveKit Inference, your agent runs on LiveKit's infrastructure to minimize latency. No separate provider API key is required, and usage and rate limits are managed through LiveKit Cloud. Use a plugin instead if you prefer to manage billing and rate limits yourself, or need access to a provider not currently available through LiveKit Inference.

LiveKit Inference

LiveKit Inference provides a curated set of TTS models with managed billing and rate limits. Each model comes with a limited selection of suggested voices, and a wider selection is listed in each provider's documentation.

Models

The following models are available in LiveKit Inference. Refer to the guide for each model for more details on additional configuration options.

Model family	Model name	Model ID	Languages
Cartesia	Sonic 2	cartesia/sonic-2	en, fr, de, es, pt, zh, ja, ko
	Sonic 3	cartesia/sonic-3	en, de, es, fr, ja, pt, zh, hi, ko, it, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa
	Sonic 3 (2025-10-27)	cartesia/sonic-3-2025-10-27	en, de, es, fr, ja, pt, zh, hi, ko, it, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa
	Sonic 3 (2026-01-12)	cartesia/sonic-3-2026-01-12	en, de, es, fr, ja, pt, zh, hi, ko, it, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa
	Sonic 3 Latest	cartesia/sonic-3-latest	en, de, es, fr, ja, pt, zh, hi, ko, it, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa
	Sonic 3.5	cartesia/sonic-3.5	en, de, es, ja, pt, zh, hi, ko, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa
	Sonic 3.5 (2026-05-04)	cartesia/sonic-3.5-2026-05-04	en, de, es, ja, pt, zh, hi, ko, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa
	Sonic Latest	cartesia/sonic-latest	en, de, es, ja, pt, zh, hi, ko, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa
	Sonic Turbo	cartesia/sonic-turbo	en, fr, de, es, pt, zh, ja, hi, ko
	Sonic (Retired)	cartesia/sonic	en, fr, de, es, pt, zh, ja, hi, it, ko, nl, pl, ru, sv, tr
Deepgram	Aura-2	deepgram/aura-2	en, en-US, en-PH, en-GB, en-AU, es, es-CO, es-MX, es-ES, es-419, es-AR, nl, nl-NL, fr, fr-FR, de, de-DE, it, it-IT, ja, ja-JP
	Aura-1 (Retired)	deepgram/aura	en, en-US, en-IE, en-GB
Fish Audio	S2 Pro	fishaudio/s2-pro	en, zh, ja, de, fr, es, ko, ar, ru, nl, it, pl, pt
	S2.1 Pro	fishaudio/s2.1-pro	en, zh, ja, de, fr, es, ko, ar, ru, nl, it, pl, pt
	S2.1 Pro Free	fishaudio/s2.1-pro-free	en, zh, ja, de, fr, es, ko, ar, ru, nl, it, pl, pt
Inworld	Realtime TTS 1.5 Max	inworld/inworld-tts-1.5-max	en, zh, ja, ko, ru, it, es, pt, fr, de, pl, nl, hi, he, ar
	Realtime TTS 1.5 Mini	inworld/inworld-tts-1.5-mini	en, zh, ja, ko, ru, it, es, pt, fr, de, pl, nl, hi, he, ar
	Realtime TTS 2.0	inworld/inworld-tts-2	en, zh, ja, ko, ru, it, es, pt, fr, de, pl, nl, hi, he, ar
	Realtime TTS (Retired)	inworld/inworld-tts-1	en, es, fr, ko, nl, zh, de, it, ja, pl, pt, ru, hi, he, ar
	Realtime TTS Max (Retired)	inworld/inworld-tts-1-max	en, es, fr, ko, nl, zh, de, it, ja, pl, pt, ru, hi, he, ar
Rime	Arcana	rime/arcana	en, es, fr, de, hi, he, ja, pt, ar
	Coda	rime/coda	en, es, fr, de, pt, ja
	Mist	rime/mist	en
	Mist v2	rime/mistv2	en, es, fr, de
	Mist v3	rime/mistv3	en, es, fr, de, hi
xAI	Text to Speech	xai/tts-1	auto, en, ar-EG, ar-SA, ar-AE, bn, zh, fr, de, hi, id, it, ja, ko, pt-BR, pt-PT, ru, es-MX, es-ES, tr, vi
ElevenLabs	Eleven Flash v2 (Deprecated)	elevenlabs/eleven_flash_v2	en
	Eleven Flash v2.5 (Deprecated)	elevenlabs/eleven_flash_v2_5	en, ja, zh, de, hi, fr, ko, pt, it, es, id, nl, tr, fil, pl, sv, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, ru, hu, no, vi
	Eleven Multilingual v2 (Deprecated)	elevenlabs/eleven_multilingual_v2	en, ja, zh, de, hi, fr, ko, pt, it, es, id, nl, tr, fil, pl, sv, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, ru
	Eleven Turbo v2 (Deprecated)	elevenlabs/eleven_turbo_v2	en
	Eleven Turbo v2.5 (Deprecated)	elevenlabs/eleven_turbo_v2_5	en, ja, zh, de, hi, fr, ko, pt, it, es, id, nl, tr, fil, pl, sv, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, ru, hu, no, vi
	Eleven v3 (Deprecated)	elevenlabs/eleven_v3	en, ja, zh, de, hi, fr, ko, pt, it, es, id, nl, tr, fil, pl, sv, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, ru, hu, no, vi

Retired models

Retired models are no longer accessible. If you're using a retired model, switch to a currently available model.
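
If you select models dynamically, it can help to check the model ID and language before starting a session. The following is a minimal sketch in plain Python, not part of the LiveKit SDK: the MODEL_LANGUAGES and RETIRED_MODELS tables are hand-copied from a few rows of the models table, and check_model is a hypothetical helper name.

```python
# Hand-copied subset of the models table; extend it with the rows you need.
# These tables and check_model() are illustrative and not part of LiveKit.
MODEL_LANGUAGES = {
    "cartesia/sonic-2": {"en", "fr", "de", "es", "pt", "zh", "ja", "ko"},
    "rime/mist": {"en"},
    "rime/mistv2": {"en", "es", "fr", "de"},
    "rime/mistv3": {"en", "es", "fr", "de", "hi"},
}

# Model IDs marked as retired in the models table.
RETIRED_MODELS = {
    "cartesia/sonic",
    "deepgram/aura",
    "inworld/inworld-tts-1",
    "inworld/inworld-tts-1-max",
}


def check_model(model_id: str, language: str) -> str:
    """Return model_id if it is known, not retired, and lists the language."""
    if model_id in RETIRED_MODELS:
        raise ValueError(f"{model_id} is retired; switch to an available model")
    languages = MODEL_LANGUAGES.get(model_id)
    if languages is None:
        raise ValueError(f"{model_id} is not in the local table")
    # Compare on the base language only, so "fr-FR" matches "fr".
    base = language.split("-")[0].lower()
    if base not in languages:
        raise ValueError(f"{model_id} does not list language {language!r}")
    return model_id


print(check_model("rime/mistv2", "fr-FR"))
```

Matching on the base language is a simplification. Some models in the table list specific regional variants, so a stricter check may be appropriate for those.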

Custom voices

You can create and use custom voices with LiveKit Inference. Upload or record a sample, and LiveKit clones it to all supported TTS providers on your plan. You can then use the clone in your agent sessions with any of those providers.

Custom voices

Create voice clones from short audio samples.

Suggested voices

The following voices are good choices for overall quality and performance. Each provider has a much larger selection of voices to choose from, which you can find in their documentation. In addition to the voices below, you can choose to use other TTS provider voices through LiveKit Inference.

Plugins

The LiveKit Agents framework includes open source plugins for a wide range of TTS providers. Use a plugin when you need provider-specific features not available through Inference, want to manage billing directly, or need a provider not currently in Inference. Plugins require your own API key and account.

Provider	Python	Node.js
Amazon Polly	✓	—
AsyncAI	✓	—
Azure AI Speech	✓	—
Azure OpenAI	✓	—
Baseten	✓	—
Camb.ai	✓	—
Cartesia	✓	✓
Deepgram	✓	✓
ElevenLabs	✓	✓
Fish Audio	✓	—
Gemini	✓	—
Gnani	✓	—
Google Cloud	✓	—
Gradium	✓	—
Groq	✓	—
Hume	✓	—
Inworld	✓	✓
Kokoro	✓	—
LMNT	✓	—
MiniMax	✓	—
Mistral AI	✓	✓
Murf AI	✓	—
Neuphonic	✓	✓
Nvidia	✓	—
OpenAI	✓	✓
Resemble AI	✓	✓
Respeecher	✓	—
Rime	✓	✓
Sarvam	✓	✓
Simplismart	✓	—
SLNG	✓	—
Smallest AI	✓	—
Soniox	✓	—
Speechify	✓	—
Speechmatics	✓	—
Spitch	✓	—
xAI	✓	—

Have another provider in mind? LiveKit is open source and welcomes new plugin contributions.

TTS Usage

To set up TTS in an AgentSession, provide a descriptor with both the desired model and voice. LiveKit Inference manages the connection to the model automatically. Choose a voice from the Suggested voices list, or see the model reference for more voices.

from livekit.agents import AgentSession, inference

session = AgentSession(
    tts=inference.TTS(
        model="inworld/inworld-tts-2",
        voice="Ashley",
        language="en",
    ),
    # ... llm, stt, etc.
)

import { AgentSession, inference } from '@livekit/agents';

const session = new AgentSession({
    tts: new inference.TTS({
        model: "inworld/inworld-tts-2",
        voice: "Ashley",
        language: "en",
    }),
    // ... llm, stt, etc.
})

Additional parameters

More configuration options, such as custom pronunciation, are available for each model. To set additional parameters, use the TTS class from the inference module. Consult each model reference for examples and available parameters.

Language codes

All TTS plugins and LiveKit Inference use the LanguageCode type for the language parameter. LanguageCode accepts any common language format and normalizes it automatically to BCP-47. You don't need to look up the specific format each provider expects. Pass any of the following and the framework handles the conversion:

ISO 639-1: "en", "es", "fr"
BCP-47 with region: "en-US", "zh-Hans-CN"
ISO 639-3: "eng", "spa"
Language names: "english", "spanish"
Underscored variants: "en_us" (normalized to "en-US")

For example, each of the following inputs is accepted and normalized as shown:

from livekit.agents import LanguageCode

LanguageCode("english")  # → "en"
LanguageCode("eng")      # → "en"
LanguageCode("en")       # → "en"
LanguageCode("en-US")    # → "en-US"
LanguageCode("en_us")    # → "en-US"

LanguageCode is a str subclass, so you can use it anywhere a string is expected. It also provides properties for extracting parts of the code:

.language: Base ISO 639-1 code (e.g., "en" from "en-US").
.region: Region subtag, if present (e.g., "US" from "en-US").
.iso: ISO 639-1 tag with region (e.g., "zh-CN" from "cmn-Hans-CN").

import { normalizeLanguage } from '@livekit/agents';

normalizeLanguage("english")  // → "en"
normalizeLanguage("eng")      // → "en"
normalizeLanguage("en")       // → "en"
normalizeLanguage("en-US")    // → "en-US"
normalizeLanguage("en_us")    // → "en-US"

In Node.js, LanguageCode is a branded string type. Use normalizeLanguage() to convert a plain string to a LanguageCode, and the standalone helper functions to extract parts of the code:

getBaseLanguage(lang): Base ISO 639-1 code (e.g., "en" from "en-US").
getLanguageRegion(lang): Region subtag, if present (e.g., "US" from "en-US").
getIsoLanguage(lang): ISO 639-1 tag with region (e.g., "zh-CN" from "cmn-Hans-CN").
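
To make the normalization rules above concrete, here is a simplified standalone sketch in plain Python. It is not LiveKit's implementation: the lookup tables cover only three languages, and normalize_language and split_language are hypothetical names chosen for this example.

```python
from typing import Optional, Tuple

# Tiny illustrative lookup tables; a real normalizer covers far more languages.
LANGUAGE_NAMES = {"english": "en", "spanish": "es", "french": "fr"}
ISO_639_3 = {"eng": "en", "spa": "es", "fra": "fr"}


def normalize_language(value: str) -> str:
    """Normalize a language string to a BCP-47 style tag (simplified)."""
    parts = value.strip().replace("_", "-").split("-")
    base = parts[0].lower()
    # Map full names and three-letter codes to the two-letter base code.
    base = LANGUAGE_NAMES.get(base, ISO_639_3.get(base, base))
    normalized = [base]
    for subtag in parts[1:]:
        if len(subtag) == 4 and subtag.isalpha():
            normalized.append(subtag.title())  # script subtag, e.g. "Hans"
        elif len(subtag) == 2 and subtag.isalpha():
            normalized.append(subtag.upper())  # region subtag, e.g. "US"
        else:
            normalized.append(subtag)  # anything else passes through unchanged
    return "-".join(normalized)


def split_language(code: str) -> Tuple[str, Optional[str]]:
    """Return (base language, region or None) from a normalized tag."""
    parts = code.split("-")
    region = next((p for p in parts[1:] if len(p) == 2 and p.isupper()), None)
    return parts[0], region


print(normalize_language("en_us"))   # en-US
print(split_language("zh-Hans-CN"))  # ('zh', 'CN')
```

In application code, prefer LanguageCode or normalizeLanguage() as described above. This sketch only shows the kind of mapping involved.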

Custom TTS

To create an entirely custom TTS, implement the TTS node in your agent.

Standalone TTS

You can use a TTS instance as a standalone component by creating a stream. Use push_text to add text to the stream, and then consume a stream of SynthesizedAudio to publish as realtime audio to another participant.

Here is an example of a standalone TTS app:

import asyncio
from typing import AsyncIterable

from livekit import agents, rtc
from livekit.agents import AgentServer
from livekit.agents.tts import SynthesizedAudio
from livekit.plugins import cartesia


server = AgentServer()

@server.rtc_session(agent_name="my-agent")
async def my_agent(ctx: agents.JobContext):
    text_stream: AsyncIterable[str] = ... # you need to provide a stream of text
    audio_source = rtc.AudioSource(44100, 1)

    track = rtc.LocalAudioTrack.create_audio_track("agent-audio", audio_source)
    await ctx.room.local_participant.publish_track(track)

    tts = cartesia.TTS(model="sonic-english")
    tts_stream = tts.stream()

    # define the consumer before scheduling it; a nested function
    # must be bound before it can be called
    async def send_audio(audio_stream: AsyncIterable[SynthesizedAudio]):
        async for a in audio_stream:
            await audio_source.capture_frame(a.audio.frame)

    # create a task to consume and publish audio frames;
    # keep a reference so the task can be awaited below
    send_task = asyncio.create_task(send_audio(tts_stream))

    # push text into the stream, TTS stream will emit audio frames along with events
    # indicating sentence (or segment) boundaries.
    async for text in text_stream:
        tts_stream.push_text(text)
    tts_stream.end_input()

    # wait until all synthesized audio has been published
    await send_task

if __name__ == "__main__":
    agents.cli.run_app(server)
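
The push-and-consume flow in this example can be tried without any LiveKit dependencies. The following sketch uses only asyncio. FakeTTSStream is a hypothetical stand-in that turns text into bytes; it is not the real TTS stream class and only mirrors the push_text and end_input calls used above.

```python
import asyncio
from typing import List


class FakeTTSStream:
    """Hypothetical stand-in for a TTS stream; 'synthesizes' text into bytes."""

    def __init__(self) -> None:
        self._queue = asyncio.Queue()

    def push_text(self, text: str) -> None:
        self._queue.put_nowait(text)

    def end_input(self) -> None:
        # A None sentinel marks the end of input.
        self._queue.put_nowait(None)

    def __aiter__(self) -> "FakeTTSStream":
        return self

    async def __anext__(self) -> bytes:
        text = await self._queue.get()
        if text is None:
            raise StopAsyncIteration
        return text.encode("utf-8")


async def run_pipeline(chunks: List[str]) -> List[bytes]:
    stream = FakeTTSStream()
    received: List[bytes] = []

    async def consume() -> None:
        # Runs concurrently with the producer loop below.
        async for audio in stream:
            received.append(audio)

    consumer = asyncio.create_task(consume())
    for chunk in chunks:
        stream.push_text(chunk)
        await asyncio.sleep(0)  # yield so the consumer can make progress
    stream.end_input()
    await consumer  # wait until every chunk has been consumed
    return received


print(asyncio.run(run_pipeline(["Hello, ", "world."])))
```

Starting the consumer task before pushing text lets output be handled as soon as it is produced, and awaiting the task at the end ensures nothing is dropped when the function returns.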

Additional resources

The following resources cover related topics that may be useful for your application.

Agent speech docs

Explore the speech capabilities and features of LiveKit Agents.

Pipeline nodes

Learn how to customize the behavior of your agent by overriding nodes in the voice pipeline.

Inference pricing

The latest pricing information for TTS models in LiveKit Inference.
