Azure Speech STT integration guide

How to use the Azure Speech STT plugin for LiveKit Agents.

Overview

Azure Speech provides a streaming speech-to-text (STT) service with accurate, real-time transcription. You can use the open source Azure Speech plugin for LiveKit Agents to build voice AI applications with fast, accurate transcription.

Quick reference

This section provides a brief overview of the Azure Speech STT plugin. For more information, see Additional resources.

Installation

Install the plugin from PyPI:

pip install "livekit-agents[azure]~=1.0"

Authentication

The Azure Speech plugin requires Azure Speech credentials, typically a speech key and its region.

Set the following environment variables in your .env file:

AZURE_SPEECH_KEY=<azure-speech-key>
AZURE_SPEECH_REGION=<azure-speech-region>
AZURE_SPEECH_HOST=<azure-speech-host>
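Before starting an agent, it can help to confirm which of these variables are visible to the process. The following is a minimal sketch using only the Python standard library. The helper name read_azure_speech_env is hypothetical and not part of the plugin, and the sketch assumes your .env file has already been loaded into the process environment by whatever loader your project uses.

```python
import os

# Variable names taken from the list above.
AZURE_SPEECH_VARS = ("AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION", "AZURE_SPEECH_HOST")


def read_azure_speech_env(env=None):
    """Return the Azure Speech variables that are set to a non-empty value.

    Hypothetical helper for illustration; not part of the Azure plugin.
    """
    source = os.environ if env is None else env
    return {name: source[name] for name in AZURE_SPEECH_VARS if source.get(name)}


if __name__ == "__main__":
    found = read_azure_speech_env()
    # Print variable names only, never the secret values.
    print("Azure Speech variables set:", sorted(found) or "none")
```

Passing a plain dictionary as env makes the check easy to exercise in tests without touching the real environment.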

Usage

Use Azure Speech STT in an AgentSession or as a standalone transcription service. For example, you can use this STT in the Voice AI quickstart.

from livekit.plugins import azure

# Create the STT instance from a speech key and its region.
azure_stt = azure.STT(
    speech_key="<speech_service_key>",
    speech_region="<speech_service_region>",
)

Note

To create an instance of azure.STT, at least one of the following conditions must be met:

  • speech_host must be set, or
  • speech_key and speech_region must both be set, or
  • speech_auth_token and speech_region must both be set
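The three conditions above can be sketched as a small pre-flight check. This is an illustration only: resolve_azure_auth is a hypothetical helper, the plugin performs its own validation, and the order in which the options are tried here is an assumption rather than documented plugin behavior.

```python
def resolve_azure_auth(speech_host=None, speech_key=None,
                       speech_region=None, speech_auth_token=None):
    """Return which credential option from the note is satisfied.

    Hypothetical helper for illustration; not part of the Azure plugin.
    The precedence (host, then key, then auth token) is an assumption.
    """
    if speech_host:
        return "host"
    if speech_key and speech_region:
        return "key"
    if speech_auth_token and speech_region:
        return "auth_token"
    raise ValueError(
        "Set speech_host, or speech_key with speech_region, "
        "or speech_auth_token with speech_region."
    )


if __name__ == "__main__":
    # A key without a region does not satisfy any option.
    try:
        resolve_azure_auth(speech_key="<speech_service_key>")
    except ValueError as exc:
        print("Invalid configuration:", exc)
```

Running a check like this before constructing azure.STT surfaces an incomplete configuration with a clear message at startup.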

Parameters

This section describes some of the available parameters. See the plugin reference for a complete list of all available parameters.

speech_key (string, optional, env: AZURE_SPEECH_KEY)

Azure Speech speech-to-text key. To learn more, see Azure Speech prerequisites.

speech_region (string, optional, env: AZURE_SPEECH_REGION)

Azure Speech speech-to-text region. To learn more, see Azure Speech prerequisites.

speech_host (string, optional, env: AZURE_SPEECH_HOST)

Azure Speech endpoint.

speech_auth_token (string, optional)

Azure Speech authentication token.

languages (list[string], optional)

List of potential source languages. To learn more, see Standard locale names.

Additional resources

The following resources provide more information about using Azure Speech with LiveKit Agents.