Overview
Azure OpenAI provides an implementation of OpenAI's Realtime API that enables low-latency, multimodal interactions with realtime audio and text processing through Azure's managed service. Use LiveKit's Azure OpenAI plugin to create an agent that uses the Realtime API.
Using the OpenAI platform instead of Azure? See our OpenAI Realtime API guide.
Quick reference
This section includes a basic usage example and some reference material. For links to more detailed documentation, see Additional resources.
Installation
Install the OpenAI plugin from PyPI:
pip install "livekit-agents[openai]~=1.0"
Authentication
The Azure OpenAI plugin requires an Azure OpenAI API key and your Azure OpenAI endpoint.
Set the following environment variables in your .env file:

AZURE_OPENAI_API_KEY=<your-azure-openai-api-key>
AZURE_OPENAI_ENDPOINT=<your-azure-openai-endpoint>
OPENAI_API_VERSION=2024-10-01-preview
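If you run the examples below as standalone scripts, you can load these variables from the .env file with python-dotenv before constructing the session. This is a minimal sketch of a common pattern in LiveKit agent examples, not a requirement of the plugin:

import os

from dotenv import load_dotenv

# Load AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, and OPENAI_API_VERSION
# from the local .env file so the plugin can read them from the environment.
load_dotenv()

assert os.environ.get("AZURE_OPENAI_API_KEY"), "AZURE_OPENAI_API_KEY is not set"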
Usage
Use the Azure OpenAI Realtime API within an AgentSession:

from livekit.agents import AgentSession
from livekit.plugins import openai

session = AgentSession(
    llm=openai.realtime.RealtimeModel.with_azure(
        azure_deployment="<model-deployment>",
        azure_endpoint="wss://<endpoint>.openai.azure.com/",
        api_key="<api-key>",
        api_version="2024-10-01-preview",
    ),
)
For a more comprehensive agent example, see the Voice AI quickstart.
Parameters
This section describes the Azure-specific parameters. For a complete list of all available parameters, see the plugin documentation.
- azure_deployment: Name of your model deployment.
- entra_token: Microsoft Entra ID authentication token. Required if not using API key authentication. To learn more, see Azure's Authentication documentation.
- voice: Voice to use for speech. To learn more, see Voice options.
- temperature: A measure of the randomness of completions. A lower temperature is more deterministic. To learn more, see chat completions.
- instructions: Initial system instructions.
- modalities: Modalities to use, such as ["text", "audio"].
- turn_detection: Configuration for turn detection. See the Turn detection section for more information.
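As an illustration, these Azure-specific options are typically passed alongside the standard Realtime options when constructing the model. The snippet below is a minimal sketch: it assumes with_azure accepts voice and temperature in addition to the deployment settings shown earlier, so check the plugin reference for the exact signature.

from livekit.agents import AgentSession
from livekit.plugins import openai

# Sketch: combining Azure deployment settings with common Realtime options.
# The voice and temperature keyword arguments are assumed here; confirm them
# against the plugin reference before relying on this example.
session = AgentSession(
    llm=openai.realtime.RealtimeModel.with_azure(
        azure_deployment="<model-deployment>",
        azure_endpoint="wss://<endpoint>.openai.azure.com/",
        api_key="<api-key>",
        api_version="2024-10-01-preview",
        voice="alloy",     # assumed parameter: speech voice
        temperature=0.8,   # assumed parameter: sampling randomness
    ),
)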
Turn detection
The Azure OpenAI Realtime API includes voice activity detection (VAD) to automatically detect when a user has started or stopped speaking. This feature is enabled by default.
There is one supported mode for VAD:
- Server VAD (default) - Uses periods of silence to automatically chunk the audio
Server VAD
Server VAD is the default mode and can be configured with the following properties:
from livekit.agents import AgentSession
from livekit.plugins.openai import realtime
from openai.types.beta.realtime.session import TurnDetection

session = AgentSession(
    llm=realtime.RealtimeModel(
        turn_detection=TurnDetection(
            type="server_vad",
            threshold=0.5,
            prefix_padding_ms=300,
            silence_duration_ms=500,
            create_response=True,
            interrupt_response=True,
        )
    ),
)
- threshold: Higher values require louder audio to activate; better for noisy environments.
- prefix_padding_ms: Amount of audio to include before detected speech.
- silence_duration_ms: Duration of silence to detect speech stop (shorter = faster turn detection).
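For example, in a noisy environment you might raise the threshold and lengthen the silence window. The values below are illustrative assumptions, not recommendations:

from openai.types.beta.realtime.session import TurnDetection

# Illustrative tuning for a noisy environment (assumed values): a higher
# threshold ignores quieter background sound, and a longer silence_duration_ms
# waits longer before ending the user's turn.
noisy_room_turn_detection = TurnDetection(
    type="server_vad",
    threshold=0.7,
    prefix_padding_ms=300,
    silence_duration_ms=800,
    create_response=True,
    interrupt_response=True,
)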
Additional resources
The following resources provide more information about using Azure OpenAI with LiveKit Agents.
- Python package: The livekit-plugins-openai package on PyPI.
- Plugin reference: Reference for the Azure OpenAI Realtime plugin.
- GitHub repo: View the source or contribute to the LiveKit OpenAI Realtime plugin.
- Azure OpenAI docs: Azure OpenAI service documentation.
- Voice AI quickstart: Get started with LiveKit Agents and Azure OpenAI.
- Azure ecosystem overview: Overview of the entire Azure AI ecosystem and LiveKit Agents integration.