Overview
LiveKit's Azure integration provides support for multiple Azure AI Services, including Azure OpenAI, the Speech service for STT and TTS, and the Realtime API. Azure OpenAI lets you run OpenAI models with the security capabilities of Microsoft Azure. The Speech service transcribes speech to text with high accuracy and produces natural-sounding text-to-speech voices. The Realtime API processes user input and responds immediately, allowing you to create agents that sound naturally responsive.
LiveKit provides multiple integration paths for using Azure AI Services for building agents:
- OpenAI plugin support for Azure OpenAI LLM and TTS.
- Azure plugin for Speech service STT and TTS.
- OpenAI plugin support for Azure AI Services Realtime API.
You can use Azure STT, TTS, and LLM to create agents using the VoicePipelineAgent class. To use the Realtime API, you can create an agent using the MultimodalAgent class.
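For orientation, here is a minimal sketch of how these pieces fit together in a VoicePipelineAgent. It assumes Silero VAD, a JobContext named ctx from your agent entrypoint, and Azure credentials supplied via environment variables:

```python
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import azure, openai, silero

# Azure Speech for STT and TTS, Azure OpenAI for the LLM; credentials
# are read from environment variables in this sketch.
agent = VoicePipelineAgent(
    vad=silero.VAD.load(),
    stt=azure.STT(),
    llm=openai.LLM.with_azure(model="gpt-4o"),
    tts=azure.TTS(),
)
agent.start(ctx.room)  # `ctx` is the JobContext from your entrypoint
```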
Quick reference
The following sections provide a quick reference for integrating Azure AI Services with LiveKit. For the complete reference, see the links provided in each section.
Azure OpenAI LLM
LiveKit's Azure integration provides an OpenAI-compatible LLM interface that can be used as the LLM for an agent created with the VoicePipelineAgent class.
The client must be configured to connect to your Azure OpenAI resource. You can set the environment variables listed in the usage section, or pass these values in when you create the LLM instance.
LLM.with_azure usage
Use the with_azure method to create an instance of an Azure OpenAI LLM:

```python
from livekit.plugins.openai import LLM

azure_llm = LLM.with_azure(
    model="gpt-4o",
    temperature=0.8,
)
```
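If you prefer environment-based configuration, a setup like the following is typically used. The variable names here are inferred from the Realtime API example later on this page; confirm the exact names against the plugin reference:

```shell
AZURE_OPENAI_ENDPOINT=https://<your-resource-name>.openai.azure.com
AZURE_OPENAI_API_KEY=<azure-openai-api-key>
OPENAI_API_VERSION=<api-version>
```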
LLM.with_azure parameters
This section describes some of the available parameters. For a complete reference of all available parameters, see the plugin reference.
- model: ID of the model to use for inference. To learn more, see supported models.
- azure_endpoint: Azure OpenAI endpoint in the following format: https://{your-resource-name}.openai.azure.com.
- azure_deployment: Name of your model deployment.
- api_version: OpenAI REST API version used for the request.
- api_key: Azure OpenAI API key.
- azure_ad_token: Azure Active Directory token.
- azure_ad_token_provider: Function that returns an Azure Active Directory token.
- organization: OpenAI organization ID.
- project: OpenAI project ID.
- temperature: A measure of randomness of completions. A lower temperature is more deterministic. To learn more, see chat completions.
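For illustration, here is a sketch that passes the connection settings above explicitly instead of relying on environment variables (all values are placeholders):

```python
from livekit.plugins.openai import LLM

# Explicit connection settings; replace the placeholders with your
# resource name, deployment, key, and API version.
azure_llm = LLM.with_azure(
    model="gpt-4o",
    azure_endpoint="https://<your-resource-name>.openai.azure.com",
    azure_deployment="<model-deployment>",
    api_key="<azure-openai-api-key>",
    api_version="2024-10-01-preview",
    temperature=0.8,
)
```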
Azure OpenAI TTS
LiveKit's Azure integration provides an OpenAI-compatible text-to-speech (TTS) interface that can be used for speech generation for an agent created with the VoicePipelineAgent class.
The client must be configured to connect to your Azure OpenAI resource. You can set the environment variables listed in the usage section, or pass these values in when you create the TTS instance.
TTS.create_azure_client usage
Use the TTS.create_azure_client method to create an instance of an Azure OpenAI TTS:

```python
from livekit.plugins.openai import tts

azure_tts = tts.TTS.create_azure_client(
    model="tts-1",
    voice="alloy",
)
```
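As with the LLM, you can pass connection settings explicitly rather than through the environment. Here is a sketch with placeholder values, using the parameter names described in the list below:

```python
from livekit.plugins.openai import tts

# Explicit connection settings; replace the placeholders with your
# resource name, deployment, key, and API version.
azure_tts = tts.TTS.create_azure_client(
    model="tts-1",
    voice="alloy",
    azure_endpoint="https://<your-resource-name>.openai.azure.com",
    azure_deployment="<model-deployment>",
    api_key="<azure-openai-api-key>",
    api_version="2024-10-01-preview",
)
```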
TTS.create_azure_client parameters
This section describes some of the available parameters. For a complete reference of all available parameters, see the plugin reference.
- model: ID of the model to use for TTS. To learn more, see supported models.
- voice: OpenAI text-to-speech voice. To learn more, see Voice options.
- azure_endpoint: Azure OpenAI endpoint in the following format: https://{your-resource-name}.openai.azure.com.
- azure_deployment: Name of your model deployment.
- api_version: OpenAI REST API version used for the request.
- api_key: Azure OpenAI API key.
- azure_ad_token: Azure Active Directory token.
- organization: OpenAI organization ID.
- project: OpenAI project ID.
Azure Speech STT
LiveKit's Azure plugin provides support for Speech service STT. To connect to Azure's Speech service, set the environment variables listed in the usage section, or pass these values in when you create an STT instance.
The Azure plugin is currently only available for the Python Agents framework.
Azure Speech STT usage
```shell
AZURE_SPEECH_KEY=<azure-speech-key>
AZURE_SPEECH_REGION=<azure-speech-region>
AZURE_SPEECH_HOST=<azure-speech-host>
LIVEKIT_API_KEY=<livekit-api-key>
LIVEKIT_API_SECRET=<livekit-api-secret>
LIVEKIT_URL=<livekit-url>
```
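With those variables set, creating an STT instance can be as simple as the following sketch; you can also pass credentials explicitly, as in the TTS example below:

```python
from livekit.plugins import azure

# With no arguments, the plugin reads AZURE_SPEECH_KEY / AZURE_SPEECH_REGION
# (or AZURE_SPEECH_HOST) from the environment.
azure_stt = azure.STT()

# Or pass credentials explicitly:
azure_stt = azure.STT(
    speech_key="<speech_service_key>",
    speech_region="<speech_service_region>",
)
```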
Azure Speech STT parameters
This section describes some of the available parameters. For a complete reference of all available parameters, see the plugin reference.
To create an instance of azure.STT, one of the following options must be met:
- speech_host must be set, or
- speech_key and speech_region must both be set, or
- speech_auth_token and speech_region must both be set.
- speech_key: Azure Speech speech-to-text key. To learn more, see Azure Speech prerequisites.
- speech_region: Azure Speech speech-to-text region. To learn more, see Azure Speech prerequisites.
- speech_host: Azure Speech endpoint.
- speech_auth_token: Azure Speech authentication token.
- languages: List of potential source languages. To learn more, see Standard locale names.
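For example, here is a sketch that constrains recognition to a set of candidate locales, using the languages parameter described above:

```python
from livekit.plugins import azure

# Restrict recognition to a list of candidate source languages.
azure_stt = azure.STT(
    speech_key="<speech_service_key>",
    speech_region="<speech_service_region>",
    languages=["en-US", "es-ES"],
)
```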
Azure Speech TTS
LiveKit's Azure plugin provides support for Speech service TTS. To connect to Azure's Speech service, set the environment variables listed in the usage section, or pass these values in when you create a TTS instance.
The Azure plugin is currently only available for the Python Agents framework.
Azure Speech TTS usage

```python
from livekit.plugins import azure

azure_tts = azure.TTS(
    speech_key="<speech_service_key>",
    speech_region="<speech_service_region>",
)
```
Azure Speech TTS parameters
This section describes some of the available parameters. For a complete reference of all available parameters, see the plugin reference.
To create an instance of azure.TTS, one of the following options must be met:
- speech_host must be set, or
- speech_key and speech_region must both be set, or
- speech_auth_token and speech_region must both be set.
- voice: Voice for text-to-speech. To learn more, see Select synthesis language and voice.
- language: Language of the input text. To learn more, see Select synthesis language and voice.
- prosody: Specify changes to pitch, rate, and volume for the speech output. To learn more, see Adjust prosody.
- speech_key: Azure Speech key. To learn more, see Azure Speech prerequisites.
- speech_region: Azure Speech region. To learn more, see Azure Speech prerequisites.
- speech_host: Azure Speech endpoint.
- speech_auth_token: Azure Speech authentication token.
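As an example, here is a sketch selecting a synthesis voice and language; en-US-JennyNeural is an illustrative Azure neural voice, so confirm availability in your region:

```python
from livekit.plugins import azure

# Select an example neural voice and the input-text language.
azure_tts = azure.TTS(
    voice="en-US-JennyNeural",
    language="en-US",
    speech_key="<speech_service_key>",
    speech_region="<speech_service_region>",
)
```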
Azure Realtime API
LiveKit's OpenAI plugin provides support for the Azure AI Services Realtime API when you create an agent with the MultimodalAgent class. To use the Realtime API, use the RealtimeModel.with_azure method.
RealtimeModel.with_azure usage
Create an instance of MultimodalAgent using Azure's Realtime API:

```python
agent = multimodal.MultimodalAgent(
    model=openai.realtime.RealtimeModel.with_azure(
        azure_deployment="<model-deployment>",
        azure_endpoint="wss://<endpoint>.openai.azure.com/",  # or AZURE_OPENAI_ENDPOINT
        api_key="<api-key>",  # or AZURE_OPENAI_API_KEY
        api_version="2024-10-01-preview",  # or OPENAI_API_VERSION
        voice="alloy",
        temperature=0.8,
        instructions="You are a helpful assistant",
        turn_detection=openai.realtime.ServerVadOptions(
            threshold=0.6, prefix_padding_ms=200, silence_duration_ms=500
        ),
    ),
    fnc_ctx=fnc_ctx,
)
```
RealtimeModel.with_azure parameters
This section describes some of the parameters for the RealtimeModel.with_azure method. For a full list of parameters, see the plugin documentation.
- azure_deployment: Name of your model deployment.
- azure_endpoint: Azure OpenAI endpoint in the following format: https://{your-resource-name}.openai.azure.com.
- api_version: OpenAI REST API version used for the request.
- api_key: Azure OpenAI API key.
- entra_token: Microsoft Entra authentication token. Required if not using API key authentication. To learn more, see Azure's Authentication documentation.
- voice: Voice to use for speech. To learn more, see Voice options.
- temperature: A measure of randomness of completions. A lower temperature is more deterministic. To learn more, see chat completions.
- instructions: Initial system instructions.
- modalities: Modalities to use, such as ["text", "audio"].
- turn_detection: Server-side VAD settings. To learn more, see Turn detection and the ServerVadOptions class.
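For instance, here is a sketch of a text-only Realtime session, with connection settings assumed to come from the environment variables noted in the usage example above:

```python
from livekit.plugins import openai

# Text-only session: the audio modality is omitted.
model = openai.realtime.RealtimeModel.with_azure(
    azure_deployment="<model-deployment>",
    modalities=["text"],
    temperature=0.8,
)
```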