Azure Speech TTS integration guide

How to use the Azure Speech TTS plugin for LiveKit Agents.

Overview

Azure Speech provides a streaming text-to-speech (TTS) service. You can use the open source Azure Speech plugin for LiveKit Agents to give your voice AI agents realtime speech synthesis.

Quick reference

This section provides a brief overview of the Azure Speech TTS plugin. For more information, see Additional resources.

Installation

Install the plugin from PyPI:

pip install "livekit-agents[azure]~=1.0rc"

Authentication

The Azure Speech plugin requires Azure Speech credentials, typically a speech key and its region.

Set the environment variables that apply to your setup in your .env file:

AZURE_SPEECH_KEY=<azure-speech-key>
AZURE_SPEECH_REGION=<azure-speech-region>
AZURE_SPEECH_HOST=<azure-speech-host>

Usage

Use Azure Speech TTS within an AgentSession or as a standalone speech generator. For example, you can use this TTS in the Voice AI quickstart.

    from livekit.agents import AgentSession
    from livekit.plugins import azure

    session = AgentSession(
        tts=azure.TTS(
            speech_key="<speech_service_key>",
            speech_region="<speech_service_region>",
        ),
        # ... llm, stt, etc.
    )

Note

To create an instance of azure.TTS, at least one of the following conditions must be met:

  • speech_host must be set, or
  • speech_key and speech_region must both be set, or
  • speech_auth_token and speech_region must both be set.
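
The rule in this note can be checked before the TTS is constructed. The following is a minimal sketch under stated assumptions: pick_azure_auth is a hypothetical helper, not part of the plugin, and it only mirrors the three options listed above using the environment variable names from the Authentication section. The auth token is passed as an argument because this guide does not name an environment variable for it.

```python
import os
from typing import Mapping, Optional


def pick_azure_auth(
    env: Mapping[str, str] = os.environ,
    auth_token: Optional[str] = None,
) -> str:
    """Return a label for the first credential option that is satisfied.

    Hypothetical helper for illustration only; it is not part of the plugin.
    """
    host = env.get("AZURE_SPEECH_HOST")
    key = env.get("AZURE_SPEECH_KEY")
    region = env.get("AZURE_SPEECH_REGION")

    if host:
        # Option 1: speech_host alone is enough.
        return "host"
    if key and region:
        # Option 2: speech_key and speech_region together.
        return "key+region"
    if auth_token and region:
        # Option 3: speech_auth_token and speech_region together.
        return "auth_token+region"
    raise ValueError(
        "Set speech_host, or speech_key with speech_region, "
        "or speech_auth_token with speech_region."
    )
```

Calling pick_azure_auth() with no arguments reads the current process environment, so a missing or partial .env file can be reported at startup rather than at the first synthesis request.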

Parameters

This section describes some of the available parameters. See the plugin reference for a complete list of all available parameters.

voice (string, optional)

Voice for text-to-speech. To learn more, see Select synthesis language and voice.

language (string, optional)

Language of the input text. To learn more, see Select synthesis language and voice.

prosody (ProsodyConfig, optional)

Specify changes to pitch, rate, and volume for the speech output. To learn more, see Adjust prosody.

speech_key (string, optional, env: AZURE_SPEECH_KEY)

Azure Speech key. To learn more, see Azure Speech prerequisites.

speech_region (string, optional, env: AZURE_SPEECH_REGION)

Azure Speech region. To learn more, see Azure Speech prerequisites.

speech_host (string, optional, env: AZURE_SPEECH_HOST)

Azure Speech endpoint.

speech_auth_token (string, optional)

Azure Speech authentication token.

Controlling speech and pronunciation

Azure Speech TTS supports Speech Synthesis Markup Language (SSML) for customizing generated speech. To learn more, see SSML overview.
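
To give a sense of what SSML looks like, the following sketch builds a small SSML document with a prosody element using only the Python standard library. It is an illustration, not the plugin's API: build_ssml is a hypothetical helper, en-US-JennyNeural is used only as an example voice name, and whether and how the plugin accepts raw SSML input is not covered here, so check the plugin reference before relying on it.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(
    text: str,
    voice: str = "en-US-JennyNeural",  # example voice name (assumption)
    lang: str = "en-US",
    rate: str = "medium",
    pitch: str = "medium",
) -> str:
    """Wrap text in a minimal SSML document with a prosody element.

    Hypothetical helper for illustration only.
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>"
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        # Escape &, < and > so user text cannot break the markup.
        f"{escape(text)}"
        "</prosody></voice></speak>"
    )


ssml = build_ssml("Thanks for calling. How can I help?", rate="slow")
```

Escaping the text matters whenever it comes from an LLM or a user, since an unescaped & or < would make the document invalid XML.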

Additional resources

The following resources provide more information about using Azure Speech with LiveKit Agents.