Azure integration guide

An introduction to using the LiveKit Agents framework with Azure AI Services.

Overview

LiveKit's Azure integration supports multiple Azure AI Services: Azure OpenAI, the Speech service for STT and TTS, and the Realtime API. Azure OpenAI lets you run OpenAI models with the security capabilities of Microsoft Azure. The Speech service's STT and TTS transcribe speech to text with high accuracy and produce natural-sounding synthesized voices. The Realtime API processes user input and responds immediately, allowing you to create agents that sound naturally responsive.

LiveKit provides multiple integration paths for using Azure AI Services for building agents:

  • OpenAI plugin support for Azure OpenAI LLM and TTS.
  • Azure plugin for Speech service STT and TTS.
  • OpenAI plugin support for Azure AI Services Realtime API.

You can use Azure STT, TTS, and LLM to create agents using the VoicePipelineAgent class. To use the Realtime API, you can create an agent using the MultimodalAgent class.
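To show how these pieces compose, here is a sketch of a VoicePipelineAgent built entirely from Azure-backed components. It assumes the pre-1.0 Python Agents API, that credentials are set through the environment variables described in the sections below, and it uses the Silero plugin for VAD as in LiveKit's examples:

```python
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import azure, openai, silero

# Assemble a pipeline agent from Azure-backed components. Credentials are
# read from the environment (AZURE_SPEECH_*, AZURE_OPENAI_*).
agent = VoicePipelineAgent(
    vad=silero.VAD.load(),                      # local voice activity detection
    stt=azure.STT(),                            # Azure Speech STT
    llm=openai.LLM.with_azure(model="gpt-4o"),  # Azure OpenAI LLM
    tts=azure.TTS(),                            # Azure Speech TTS
)
# Inside a job entrypoint, you would then call
# agent.start(ctx.room, participant) to begin handling audio.
```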

Quick reference

The following sections provide a quick reference for integrating Azure AI Services with LiveKit. For the complete reference, see the links provided in each section.

Azure OpenAI LLM

LiveKit's Azure integration provides an OpenAI-compatible LLM interface. This can be used as the LLM for an agent created using the VoicePipelineAgent class.

Azure's OpenAI-compatible API must be configured to connect to your Azure OpenAI resource. You can set the environment variables listed in the parameters section or pass these values in when you create the LLM instance.

LLM.with_azure usage

Use the with_azure method to create an instance of an Azure OpenAI LLM:

from livekit.plugins.openai import LLM

azure_llm = LLM.with_azure(
    model="gpt-4o",
    temperature=0.8,
)

LLM.with_azure parameters

This section describes some of the available parameters. For a complete reference of all available parameters, see the plugin reference.

model (string, optional, default: gpt-4o)

ID of the model to use for inference. To learn more, see supported models.

azure_endpoint (string, optional, env: AZURE_OPENAI_ENDPOINT)

Azure OpenAI endpoint in the following format: https://{your-resource-name}.openai.azure.com.

azure_deployment (string, optional)

Name of your model deployment.

api_version (string, optional, env: OPENAI_API_VERSION)

OpenAI REST API version used for the request.

api_key (string, optional, env: AZURE_OPENAI_API_KEY)

Azure OpenAI API key.

azure_ad_token (string, optional, env: AZURE_OPENAI_AD_TOKEN)

Azure Active Directory token.

azure_ad_token_provider (Callable, optional)

Function that returns an Azure Active Directory token.

organization (string, optional, env: OPENAI_ORG_ID)

OpenAI organization ID.

project (string, optional, env: OPENAI_PROJECT_ID)

OpenAI project ID.

temperature (float, optional, default: 1.0)

A measure of randomness of completions. A lower temperature is more deterministic. To learn more, see chat completions.
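The endpoint and key parameters above fall back to environment variables when not passed explicitly. A minimal stdlib sketch of that resolution order (illustrative only; the plugin and the underlying SDK handle this internally, and the helper name is hypothetical):

```python
import os
from typing import Optional

def resolve_azure_endpoint(azure_endpoint: Optional[str] = None) -> str:
    """Resolve the Azure OpenAI endpoint: an explicit argument wins,
    otherwise fall back to the AZURE_OPENAI_ENDPOINT environment variable."""
    endpoint = azure_endpoint or os.environ.get("AZURE_OPENAI_ENDPOINT")
    if not endpoint:
        raise ValueError(
            "Provide azure_endpoint or set AZURE_OPENAI_ENDPOINT, "
            "e.g. https://{your-resource-name}.openai.azure.com"
        )
    return endpoint
```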

Azure OpenAI TTS

LiveKit's Azure integration provides an OpenAI-compatible text-to-speech (TTS) interface. This can be used for speech generation for an agent created with the VoicePipelineAgent class.

Azure's OpenAI-compatible API must be configured to connect to your Azure OpenAI resource. You can set the environment variables listed in the parameters section or pass these values in when you create the TTS instance.

TTS.create_azure_client usage

Use the TTS.create_azure_client method to create an instance of an Azure OpenAI TTS:

from livekit.plugins.openai import tts

azure_tts = tts.TTS.create_azure_client(
    model="tts-1",
    voice="alloy",
)

TTS.create_azure_client parameters

This section describes some of the available parameters. For a complete reference of all available parameters, see the plugin reference.

model (string, optional, default: tts-1)

ID of the model to use for TTS. To learn more, see supported models.

voice (string, optional)

OpenAI text-to-speech voice. To learn more, see Voice options.

azure_endpoint (string, optional, env: AZURE_OPENAI_ENDPOINT)

Azure OpenAI endpoint in the following format: https://{your-resource-name}.openai.azure.com.

azure_deployment (string, optional)

Name of your model deployment.

api_version (string, optional, env: OPENAI_API_VERSION)

OpenAI REST API version used for the request.

api_key (string, optional, env: AZURE_OPENAI_API_KEY)

Azure OpenAI API key.

azure_ad_token (string, optional, env: AZURE_OPENAI_AD_TOKEN)

Azure Active Directory token.

organization (string, optional, env: OPENAI_ORG_ID)

OpenAI organization ID.

project (string, optional, env: OPENAI_PROJECT_ID)

OpenAI project ID.

Azure Speech STT

LiveKit's Azure plugin provides support for Speech service STT. To connect to Azure's Speech service, set the environment variables listed in the usage section, or pass these values in when you create an STT instance.

Note

The Azure plugin is currently only available for the Python Agents framework.

Azure Speech STT usage

AZURE_SPEECH_KEY=<azure-speech-key>
AZURE_SPEECH_REGION=<azure-speech-region>
AZURE_SPEECH_HOST=<azure-speech-host>
LIVEKIT_API_KEY=<livekit-api-key>
LIVEKIT_API_SECRET=<livekit-api-secret>
LIVEKIT_URL=<livekit-url>
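With those variables set, you can create the STT instance with no arguments, or pass the credentials directly (a sketch, assuming the Azure plugin is installed):

```python
from livekit.plugins import azure

azure_stt = azure.STT(
    speech_key="<speech_service_key>",
    speech_region="<speech_service_region>",
)
```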

Azure Speech STT parameters

This section describes some of the available parameters. For a complete reference of all available parameters, see the plugin reference.

Note

To create an instance of azure.STT, one of the following options must be met:

  • speech_host must be set, or
  • speech_key and speech_region must both be set, or
  • speech_auth_token and speech_region must both be set
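The same requirement can be expressed as a small check (illustrative only; the plugin performs this validation itself, and the helper name is hypothetical):

```python
from typing import Optional

def has_valid_azure_speech_credentials(
    speech_host: Optional[str] = None,
    speech_key: Optional[str] = None,
    speech_region: Optional[str] = None,
    speech_auth_token: Optional[str] = None,
) -> bool:
    """True if the arguments satisfy one of the accepted combinations:
    speech_host alone, speech_key + speech_region, or
    speech_auth_token + speech_region."""
    if speech_host:
        return True
    if speech_key and speech_region:
        return True
    if speech_auth_token and speech_region:
        return True
    return False
```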

speech_key (string, optional, env: AZURE_SPEECH_KEY)

Azure Speech speech-to-text key. To learn more, see Azure Speech prerequisites.

speech_region (string, optional, env: AZURE_SPEECH_REGION)

Azure Speech speech-to-text region. To learn more, see Azure Speech prerequisites.

speech_host (string, optional, env: AZURE_SPEECH_HOST)

Azure Speech endpoint.

speech_auth_token (string, optional)

Azure Speech authentication token.

languages (list[string], optional)

List of potential source languages. To learn more, see Standard locale names.

Azure Speech TTS

LiveKit's Azure plugin provides support for Speech service TTS. To connect to Azure's Speech service, set the environment variables listed in the usage section, or pass these values in when you create a TTS instance.

Note

The Azure plugin is currently only available for the Python Agents framework.

Azure Speech TTS usage

from livekit.plugins import azure

azure_tts = azure.TTS(
    speech_key="<speech_service_key>",
    speech_region="<speech_service_region>",
)

Azure Speech TTS parameters

This section describes some of the available parameters. For a complete reference of all available parameters, see the plugin reference.

Note

To create an instance of azure.TTS, one of the following options must be met:

  • speech_host must be set, or
  • speech_key and speech_region must both be set, or
  • speech_auth_token and speech_region must both be set

voice (string, optional)

Voice for text-to-speech. To learn more, see Select synthesis language and voice.

language (string, optional)

Language of the input text. To learn more, see Select synthesis language and voice.

prosody (ProsodyConfig, optional)

Specify changes to pitch, rate, and volume for the speech output. To learn more, see Adjust prosody.

speech_key (string, optional, env: AZURE_SPEECH_KEY)

Azure Speech service key. To learn more, see Azure Speech prerequisites.

speech_region (string, optional, env: AZURE_SPEECH_REGION)

Azure Speech service region. To learn more, see Azure Speech prerequisites.

speech_host (string, optional, env: AZURE_SPEECH_HOST)

Azure Speech endpoint.

speech_auth_token (string, optional)

Azure Speech authentication token.
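Prosody settings map onto the SSML <prosody> element that the Azure Speech service consumes. A stdlib-only sketch of the kind of markup these settings correspond to (illustrative; the plugin's ProsodyConfig generates the real SSML internally, and this helper is hypothetical):

```python
def prosody_ssml(text: str, rate: float = 1.0, pitch: str = "medium",
                 volume: float = 100.0) -> str:
    """Wrap text in an SSML prosody element, adjusting rate, pitch, and
    volume as Azure Speech TTS understands them."""
    return (
        f'<prosody rate="{rate}" pitch="{pitch}" volume="{volume}">'
        f"{text}</prosody>"
    )
```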

Azure Realtime API

LiveKit's OpenAI plugin provides support for Azure AI Services Realtime API when you create an agent with the MultimodalAgent class. To use the Realtime API, use the RealtimeModel.with_azure method.

RealtimeModel.with_azure usage

Create an instance of MultimodalAgent using Azure's Realtime API:

agent = multimodal.MultimodalAgent(
    model=openai.realtime.RealtimeModel.with_azure(
        azure_deployment="<model-deployment>",
        azure_endpoint="wss://<endpoint>.openai.azure.com/",  # or AZURE_OPENAI_ENDPOINT
        api_key="<api-key>",  # or AZURE_OPENAI_API_KEY
        api_version="2024-10-01-preview",  # or OPENAI_API_VERSION
        voice="alloy",
        temperature=0.8,
        instructions="You are a helpful assistant",
        turn_detection=openai.realtime.ServerVadOptions(
            threshold=0.6, prefix_padding_ms=200, silence_duration_ms=500
        ),
    ),
    fnc_ctx=fnc_ctx,
)

RealtimeModel.with_azure parameters

This section describes some of the parameters for the RealtimeModel.with_azure method. For a full list of parameters, see the plugin documentation.

azure_deployment (string, optional)

Name of your model deployment.

azure_endpoint (string, optional, env: AZURE_OPENAI_ENDPOINT)

Azure OpenAI endpoint in the following format: https://{your-resource-name}.openai.azure.com.

api_version (string, optional, env: OPENAI_API_VERSION)

OpenAI REST API version used for the request.

api_key (string, optional, env: AZURE_OPENAI_API_KEY)

Azure OpenAI API key.

entra_token (string, optional)

Microsoft Entra authentication token. Required if not using API key authentication. To learn more, see Azure's Authentication documentation.

voice (string, optional, default: alloy)

Voice to use for speech. To learn more, see Voice options.

temperature (float, optional, default: 1.0)

A measure of randomness of completions. A lower temperature is more deterministic. To learn more, see chat completions.

instructions (string, optional)

Initial system instructions.

modalities (list[api_proto.Modality], optional, default: ["text", "audio"])

Modalities to use, such as ["text", "audio"].

turn_detection (ServerVadOptions, optional)

Server-side VAD settings. To learn more, see Turn detection and the ServerVadOptions class.
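To make the ServerVadOptions values concrete, here is a stdlib-only sketch of how threshold and silence_duration_ms interact when deciding that a turn has ended (illustrative; the actual detection runs server-side in the Realtime API, and prefix_padding_ms, omitted here, controls how much audio before the detected speech start is included):

```python
def detect_speech_end(frame_probs: list, threshold: float = 0.6,
                      silence_duration_ms: int = 500, frame_ms: int = 10) -> int:
    """Return the frame index at which the turn ends: the first point where
    speech has been observed and then silence_duration_ms of continuous
    sub-threshold audio follows. Returns -1 if the turn never ends.

    frame_probs holds one speech probability per frame of frame_ms audio.
    """
    needed_silence = silence_duration_ms // frame_ms
    seen_speech = False
    silent_run = 0
    for i, p in enumerate(frame_probs):
        if p >= threshold:
            seen_speech = True   # frame counts as speech
            silent_run = 0       # silence counter resets on any speech
        elif seen_speech:
            silent_run += 1
            if silent_run >= needed_silence:
                return i         # enough trailing silence: end of turn
    return -1
```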