Azure OpenAI Realtime API and LiveKit

How to use the Azure OpenAI Realtime API with LiveKit Agents.

Available in: Python | Node.js

Overview

Azure OpenAI provides an implementation of OpenAI's Realtime API that enables low-latency, multimodal interactions with realtime audio and text processing through Azure's managed service. Use LiveKit's Azure OpenAI plugin to create an agent that uses the Realtime API.

Note

Using the OpenAI platform instead of Azure? See our OpenAI Realtime API guide.

Quick reference

This section includes a basic usage example and some reference material. For links to more detailed documentation, see Additional resources.

Installation

Install the OpenAI plugin from PyPI:

pip install "livekit-agents[openai]~=1.2"

Authentication

The Azure OpenAI plugin requires an Azure OpenAI API key and your Azure OpenAI endpoint.

Set the following environment variables in your .env file:

AZURE_OPENAI_API_KEY=<your-azure-openai-api-key>
AZURE_OPENAI_ENDPOINT=<your-azure-openai-endpoint>
OPENAI_API_VERSION=2024-10-01-preview
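
If you keep these values in a .env file, load them into the process environment before constructing the session. A minimal sketch using python-dotenv (an assumption; any environment loader works):

from dotenv import load_dotenv

# Loads AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, and OPENAI_API_VERSION
# from the .env file into the process environment.
load_dotenv()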

Usage

Use the Azure OpenAI Realtime API within an AgentSession:

from livekit.agents import AgentSession
from livekit.plugins import openai

session = AgentSession(
    llm=openai.realtime.RealtimeModel.with_azure(
        azure_deployment="<model-deployment>",
        azure_endpoint="wss://<endpoint>.openai.azure.com/",
        api_key="<api-key>",
        api_version="2024-10-01-preview",
    ),
)

For a more comprehensive agent example, see the Voice AI quickstart.

Parameters

This section describes the Azure-specific parameters. For a complete list of all available parameters, see the plugin documentation.

azure_deployment (string, required)

Name of your model deployment.

entra_token (string, optional)

Microsoft Entra ID authentication token. Required if not using API key authentication. To learn more, see Azure's Authentication documentation.

voice (string, optional, default: alloy)

Voice to use for speech. To learn more, see Voice options.

temperature (float, optional, default: 1.0)

A measure of the randomness of completions. A lower temperature is more deterministic. To learn more, see chat completions.

instructions (string, optional)

Initial system instructions.

modalities (list[api_proto.Modality], optional, default: ["text", "audio"])

Modalities to use, such as ["text", "audio"]. Set to ["text"] to use the model in text-only mode with a separate TTS plugin.

turn_detection (TurnDetection | None, optional)

Configuration for turn detection. See the Turn detection section below for more information.
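
The following sketch combines several of these parameters. It assumes that with_azure accepts the options listed above alongside the endpoint and auth settings; the deployment, endpoint, and key values are placeholders:

from livekit.agents import AgentSession
from livekit.plugins import openai

session = AgentSession(
    llm=openai.realtime.RealtimeModel.with_azure(
        azure_deployment="<model-deployment>",
        azure_endpoint="wss://<endpoint>.openai.azure.com/",
        api_key="<api-key>",
        api_version="2024-10-01-preview",
        voice="alloy",                      # see Voice options
        temperature=0.8,                    # lower values are more deterministic
        instructions="You are a helpful voice assistant.",
        modalities=["text", "audio"],       # use ["text"] with a separate TTS plugin
    ),
)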

Turn detection

The Azure OpenAI Realtime API includes voice activity detection (VAD) to automatically detect when a user has started or stopped speaking. This feature is enabled by default.

There is one supported mode for VAD:

  • Server VAD (default) - Uses periods of silence to automatically chunk the audio

Server VAD

Server VAD is the default mode and can be configured with the following properties:

from livekit.agents import AgentSession
from livekit.plugins.openai import realtime
from openai.types.beta.realtime.session import TurnDetection

session = AgentSession(
    llm=realtime.RealtimeModel(
        # With Azure, pass the same turn_detection argument to
        # realtime.RealtimeModel.with_azure(...) alongside your endpoint and auth settings.
        turn_detection=TurnDetection(
            type="server_vad",
            threshold=0.5,
            prefix_padding_ms=300,
            silence_duration_ms=500,
            create_response=True,
            interrupt_response=True,
        )
    ),
)
  • threshold: Higher values require louder audio to trigger detection, which works better in noisy environments.
  • prefix_padding_ms: Amount of audio, in milliseconds, to include before the detected speech.
  • silence_duration_ms: Duration of silence, in milliseconds, required to detect the end of speech. Shorter values yield faster turn detection.

Usage with separate TTS

To use the Azure OpenAI Realtime API with a different TTS provider, configure it with a text-only response modality and include a TTS plugin in your AgentSession configuration. This setup gives you the benefits of the model's direct speech understanding while keeping full control over the speech output.

from livekit.agents import AgentSession
from livekit.plugins import cartesia, openai

session = AgentSession(
    llm=openai.realtime.RealtimeModel.with_azure(
        # ... endpoint and auth params ...
        modalities=["text"],
    ),
    tts=cartesia.TTS(),  # Or another TTS plugin of your choice
)

Loading conversation history

If you load conversation history into the model, it might respond with text output even when it's configured for audio responses. To work around this issue, use the model with a text-only response modality and a separate TTS plugin. You can use the Azure OpenAI TTS plugin to keep the same voices supported by the Realtime API.

For additional workaround options, see the OpenAI thread on this topic.
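
The following minimal sketch shows this workaround: prior turns are loaded into the agent's chat context, the model runs in text-only mode, and speech is produced by a separate TTS plugin. The ChatContext, add_message, and chat_ctx names are assumptions based on the 1.x Agents API, and openai.TTS() stands in for whichever TTS plugin you configure:

from livekit.agents import Agent, AgentSession, ChatContext
from livekit.plugins import openai

# Assumed API: build the prior conversation to load into the agent.
chat_ctx = ChatContext()
chat_ctx.add_message(role="user", content="What did we decide about the venue?")
chat_ctx.add_message(role="assistant", content="We agreed on the rooftop location.")

session = AgentSession(
    llm=openai.realtime.RealtimeModel.with_azure(
        # ... endpoint and auth params ...
        modalities=["text"],  # text-only responses avoid the audio/text mismatch
    ),
    tts=openai.TTS(),  # placeholder: use your configured TTS plugin here
)

agent = Agent(
    instructions="You are a helpful voice assistant.",
    chat_ctx=chat_ctx,  # assumed parameter for loading conversation history
)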

Additional resources

The following resources provide more information about using Azure OpenAI with LiveKit Agents.