ElevenLabs TTS integration guide

How to use the ElevenLabs TTS plugin for LiveKit Agents.

Overview

ElevenLabs provides an AI text-to-speech (TTS) service with thousands of human-like voices across a number of different languages. With LiveKit's ElevenLabs integration and the Agents framework, you can build voice AI applications that sound realistic.

Quick reference

This section provides a quick reference for the ElevenLabs TTS plugin. For more information, see Additional resources.

Installation

Install the plugin from PyPI:

pip install "livekit-agents[elevenlabs]~=1.0"

Authentication

The ElevenLabs plugin requires an ElevenLabs API key.

Set ELEVEN_API_KEY in your .env file.
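
As a minimal sketch, the check below fails fast when the key is missing instead of waiting for the first synthesis request. It assumes the variable has already been loaded into the process environment, for example by a dotenv loader. The `require_api_key` helper is hypothetical and not part of the plugin.

```python
import os


def require_api_key(env=None, name="ELEVEN_API_KEY"):
    """Return the API key from the environment, or raise if it is unset or blank.

    Hypothetical helper for illustration; not part of the ElevenLabs plugin.
    """
    # Fall back to the real process environment when no mapping is supplied.
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return key
```

Calling `require_api_key()` once at startup surfaces a missing key before any session is created.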

Usage

Use ElevenLabs TTS within an AgentSession or as a standalone speech generator. For example, you can use this TTS in the Voice AI quickstart.

from livekit.agents import AgentSession
from livekit.plugins import elevenlabs

session = AgentSession(
    tts=elevenlabs.TTS(
        voice_id="ODq5zmih8GrVes37Dizd",
        model="eleven_multilingual_v2",
    ),
    # ... llm, stt, etc.
)

Parameters

This section describes some of the parameters you can set when you create an ElevenLabs TTS. See the plugin reference for a complete list of all available parameters.

model (string, optional, default: eleven_flash_v2_5)

ID of the model to use for generation. To learn more, see the ElevenLabs documentation.

voice_id (string, optional, default: EXAVITQu4vr4xnSDxMaL)

ID of the voice to use for generation. To learn more, see the ElevenLabs documentation.

voice_settings (VoiceSettings, optional)

Voice configuration. To learn more, see the ElevenLabs documentation.

  • stability (float, optional)
  • similarity_boost (float, optional)
  • style (float, optional)
  • use_speaker_boost (bool, optional)
  • speed (float, optional)
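
The fields above might be combined as in the following sketch. It assumes the `VoiceSettings` class is exported from `livekit.plugins.elevenlabs` alongside `TTS`. The numeric values are illustrative placeholders, not recommended settings. The import sits inside the hypothetical `build_tts` helper so that the function can be defined even where the plugin is not installed.

```python
def build_tts():
    """Create an ElevenLabs TTS with explicit voice settings (illustrative values)."""
    # Assumption: VoiceSettings is exported by the plugin package alongside TTS.
    from livekit.plugins import elevenlabs

    return elevenlabs.TTS(
        voice_id="ODq5zmih8GrVes37Dizd",
        model="eleven_multilingual_v2",
        voice_settings=elevenlabs.VoiceSettings(
            stability=0.5,
            similarity_boost=0.75,
            style=0.0,
            use_speaker_boost=True,
            speed=1.0,
        ),
    )
```

Calling `build_tts()` requires the plugin to be installed and ELEVEN_API_KEY to be set.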

language (string, optional, default: en)

Language of output audio in ISO-639-1 format. To learn more, see the ElevenLabs documentation.

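streaming_latency (int, optional, default: 3)

Latency optimization level for streaming. ElevenLabs treats this as an integer level that trades some audio quality for lower latency, not as a duration in seconds.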

enable_ssml_parsing (bool, optional, default: false)

Enable Speech Synthesis Markup Language (SSML) parsing for input text. Set to true to customize pronunciation using SSML.

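chunk_length_schedule (list[int], optional, default: [80, 120, 200, 260])

Schedule of chunk lengths, in characters, which controls how much text is buffered before each chunk of audio is generated. Each value must be between 50 and 500.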
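
A schedule can be checked against the documented range before it is passed to the TTS constructor. The sketch below is a hypothetical helper, not part of the plugin. It only enforces the 50 to 500 bounds stated above and assumes both bounds are inclusive.

```python
def validate_chunk_length_schedule(schedule, low=50, high=500):
    """Return the schedule as a list if every value is an int within [low, high].

    Hypothetical helper; the bounds come from the parameter description and
    are assumed to be inclusive.
    """
    values = list(schedule)
    for value in values:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"expected int, got {value!r}")
        if not low <= value <= high:
            raise ValueError(f"{value} is outside the valid range {low}-{high}")
    return values
```

For example, `validate_chunk_length_schedule([80, 120, 200, 260])` returns the default schedule unchanged, while a schedule containing 40 raises a ValueError.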

Customizing pronunciation

ElevenLabs supports custom pronunciation for specific words or phrases with SSML phoneme tags. This is useful to ensure correct pronunciation of certain words, even when missing from the voice's lexicon. To learn more, see Pronunciation.
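
As an illustration, the sketch below wraps a word in an SSML phoneme tag. The `phoneme` helper is hypothetical, and the IPA transcription is only an example. Which alphabets and models honor phoneme tags is covered in the ElevenLabs documentation. The resulting text is intended for a TTS created with enable_ssml_parsing set to true.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme(word, ph, alphabet="ipa"):
    """Wrap `word` in an SSML <phoneme> tag with the given transcription.

    Hypothetical helper for illustration; not part of the plugin.
    """
    # quoteattr adds the surrounding quotes and escapes special characters.
    return (
        f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>"
        f"{escape(word)}</phoneme>"
    )


# Example transcription only; verify it against your own pronunciation needs.
text = f"I would like a {phoneme('tomato', 'təˈmɑːtoʊ')}, please."
```

Escaping the word and attribute values keeps the markup well-formed when the input contains characters such as & or <.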

Additional resources

The following resources provide more information about using ElevenLabs with LiveKit Agents.

Voice AI quickstart

Get started with LiveKit Agents and ElevenLabs TTS.