OpenAI Playground
Experiment with OpenAI's Realtime API in the playground with personalities like the Snarky Teenager or Opera Singer.
Overview
OpenAI's Realtime API enables low-latency, multimodal interactions with realtime audio and text processing. Use LiveKit's OpenAI plugin to create an agent that uses the Realtime API.
Using Azure OpenAI? See our Azure OpenAI Realtime API guide.
Quick reference
This section includes a basic usage example and some reference material. For links to more detailed documentation, see Additional resources.
Installation
Install the OpenAI plugin:
pip install "livekit-agents[openai]~=1.2"
Authentication
The OpenAI plugin requires an OpenAI API key. Set OPENAI_API_KEY in your .env file.
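For example, your .env file might contain a single line like the following (placeholder value shown):

OPENAI_API_KEY=<your-openai-api-key>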
Usage
Use the OpenAI Realtime API within an AgentSession. For example, you can use it in the Voice AI quickstart.
from livekit.agents import AgentSession
from livekit.plugins import openai

session = AgentSession(
    llm=openai.realtime.RealtimeModel(),
)
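For context, here is a minimal sketch of how the model fits into a complete agent, following the Voice AI quickstart pattern (the instructions text is illustrative):

from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import openai

async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()

    # The Realtime model handles both speech comprehension and generation
    session = AgentSession(
        llm=openai.realtime.RealtimeModel(),
    )

    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a helpful voice AI assistant."),
    )

if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))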
Parameters
This section describes some of the available parameters. For a complete reference of all available parameters, see the plugin reference.
- model: ID of the Realtime model to use. For a list of available models, see Models.
- voice: Voice to use for speech generation. For a list of available voices, see Voice options.
- temperature: Sampling temperature for generation. Valid values are between 0.6 and 1.2. To learn more, see temperature.
- turn_detection: Configuration for turn detection. See the Turn detection section below for more information.
- modalities: List of response modalities to use for the session. Set to ["text"] to use the model in text-only mode with a separate TTS plugin.
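For illustration, the following sketch sets several of these parameters explicitly; the model ID and voice shown are examples, not defaults:

from livekit.agents import AgentSession
from livekit.plugins import openai

session = AgentSession(
    llm=openai.realtime.RealtimeModel(
        model="gpt-4o-realtime-preview",  # example model ID
        voice="alloy",                    # example voice
        temperature=0.8,                  # valid range is 0.6 to 1.2
    ),
)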
Turn detection
OpenAI's Realtime API includes voice activity detection (VAD) to automatically detect when a user has started or stopped speaking. This feature is enabled by default.
There are two modes for VAD:
- Server VAD (default): Uses periods of silence to automatically chunk the audio.
- Semantic VAD: Uses a semantic classifier to detect when the user has finished speaking based on their words.
Server VAD
Server VAD is the default mode and can be configured with the following properties:
from livekit.agents import AgentSession
from livekit.plugins.openai import realtime
from openai.types.beta.realtime.session import TurnDetection

session = AgentSession(
    llm=realtime.RealtimeModel(
        turn_detection=TurnDetection(
            type="server_vad",
            threshold=0.5,
            prefix_padding_ms=300,
            silence_duration_ms=500,
            create_response=True,
            interrupt_response=True,
        )
    ),
)
- threshold: Higher values require louder audio to activate the model, which works better in noisy environments.
- prefix_padding_ms: Amount of audio (in milliseconds) to include before detected speech.
- silence_duration_ms: Duration of silence (in milliseconds) used to detect the end of speech; shorter values yield faster turn detection.
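As an illustrative variation (the values are examples, not recommendations), a noisy environment might raise the threshold and lengthen the silence window:

from openai.types.beta.realtime.session import TurnDetection

# Hypothetical tuning for a noisy environment
noisy_room_vad = TurnDetection(
    type="server_vad",
    threshold=0.8,            # require louder audio before detecting speech
    prefix_padding_ms=300,
    silence_duration_ms=800,  # wait longer before ending the user's turn
)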
Semantic VAD
Semantic VAD uses a classifier to determine when the user is done speaking based on their words. This mode is less likely to interrupt users mid-sentence or chunk transcripts prematurely.
from livekit.agents import AgentSession
from livekit.plugins.openai import realtime
from openai.types.beta.realtime.session import TurnDetection

session = AgentSession(
    llm=realtime.RealtimeModel(
        turn_detection=TurnDetection(
            type="semantic_vad",
            eagerness="auto",
            create_response=True,
            interrupt_response=True,
        )
    ),
)
The eagerness property controls how quickly the model responds:
- auto (default): Equivalent to medium.
- low: Lets users take their time speaking.
- high: Chunks audio as soon as possible.
- medium: Balanced approach.
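For example, an agent serving callers who tend to pause mid-thought could lower the eagerness (a sketch; other settings follow the example above):

from livekit.agents import AgentSession
from livekit.plugins.openai import realtime
from openai.types.beta.realtime.session import TurnDetection

session = AgentSession(
    llm=realtime.RealtimeModel(
        turn_detection=TurnDetection(
            type="semantic_vad",
            eagerness="low",  # give users time to finish their thought
        )
    ),
)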
For more information about turn detection in general, see the Turn detection guide.
Usage with separate TTS
To use the OpenAI Realtime API with a different TTS provider, configure it with a text-only response modality and include a TTS plugin in your AgentSession configuration. This gives you the benefits of realtime speech comprehension while maintaining complete control over the speech output.
from livekit.agents import AgentSession
from livekit.plugins import cartesia, openai

session = AgentSession(
    llm=openai.realtime.RealtimeModel(modalities=["text"]),
    tts=cartesia.TTS(),  # or another TTS plugin of your choice
)
Loading conversation history
If you load conversation history into the model, it might respond with text output even if configured for audio response. To work around this issue, use the model with a separate TTS plugin and text-only response modality. You can use the Azure OpenAI TTS plugin to continue using the same voices supported by the Realtime API.
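As a sketch of this workaround (the message contents and TTS plugin choice are illustrative), you can load prior messages into the agent's chat context while keeping the model in text-only mode:

from livekit.agents import Agent, AgentSession, ChatContext
from livekit.plugins import cartesia, openai

# Prior conversation to restore; contents are illustrative
chat_ctx = ChatContext()
chat_ctx.add_message(role="user", content="What's the weather like today?")
chat_ctx.add_message(role="assistant", content="It's sunny and 72 degrees.")

session = AgentSession(
    # Text-only modality avoids the text-vs-audio response mismatch
    llm=openai.realtime.RealtimeModel(modalities=["text"]),
    tts=cartesia.TTS(),
)

agent = Agent(
    instructions="Continue the conversation naturally.",
    chat_ctx=chat_ctx,
)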
For additional workaround options, see the OpenAI thread on this topic.
Additional resources
The following resources provide more information about using OpenAI with LiveKit Agents.