Overview
This plugin allows you to use OpenAI as an STT provider for your voice agents.
Installation
Install the plugin from PyPI (Python) or npm (Node.js):
uv add "livekit-agents[openai]~=1.5"
pnpm add @livekit/agents-plugin-openai@1.x
Authentication
The OpenAI plugin requires an OpenAI API key.
Set OPENAI_API_KEY in your .env file.
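To catch a missing key at startup, you could check for the variable before creating the session. The following is a minimal sketch using only the Python standard library. The helper name require_openai_key is hypothetical and not part of the plugin, and the sketch assumes your .env file has already been loaded into the process environment.

```python
import os


def require_openai_key(env=None):
    """Return the OpenAI API key, or raise if it is missing or blank.

    Hypothetical helper, not part of livekit-agents. `env` defaults to
    os.environ and is a parameter only so the check is easy to test.
    """
    env = os.environ if env is None else env
    # Treat an unset variable and a whitespace-only value the same way.
    key = (env.get("OPENAI_API_KEY") or "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    return key
```

Calling require_openai_key() once at the top of your entrypoint surfaces a configuration problem before any audio is processed.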
Usage
Use OpenAI STT in an AgentSession or as a standalone transcription service. For example, you can use this STT in the Voice AI quickstart.
from livekit.agents import AgentSession
from livekit.plugins import openai

session = AgentSession(
    stt=openai.STT(
        model="gpt-4o-mini-transcribe",
    ),
    # ... llm, tts, etc.
)
import { voice } from '@livekit/agents';
import * as openai from '@livekit/agents-plugin-openai';
import * as silero from '@livekit/agents-plugin-silero';

// Load a VAD instance for client-side end-of-speech detection.
const vad = await silero.VAD.load();

const session = new voice.AgentSession({
  stt: new openai.STT({
    model: 'gpt-realtime-whisper',
    vad,
  }),
  // ... llm, tts, etc.
});
The gpt-realtime-whisper model doesn't support server-side turn detection. You must pass a vad instance so the plugin can commit the audio buffer at end-of-speech.
In @livekit/agents-plugin-openai@1.4.1, the default model for openai.STT changed from whisper-1 to gpt-realtime-whisper, which streams transcription over a WebSocket. To restore the previous behavior, set useRealtime: false:
const stt = new openai.STT({ useRealtime: false });
Parameters
This section describes some of the available parameters. See the plugin reference links in the Additional resources section for a complete list of all available parameters.
model
Type: WhisperModels | string. Default: gpt-4o-mini-transcribe | gpt-realtime-whisper
Model to use for transcription. See OpenAI's documentation for a list of supported models.
Default model varies by SDK:
- Python: gpt-4o-mini-transcribe
- Node.js: gpt-realtime-whisper
language
Type: LanguageCode. Default: en
Language code for the input audio. See OpenAI's documentation for a list of supported languages.
useRealtime
Type: boolean. Default: true
When true, streams transcription over the OpenAI Realtime WebSocket. Set to false to use the standard REST transcription API with whisper-1.
vad
Type: VAD
A VAD instance for client-side end-of-speech detection. Required when using gpt-realtime-whisper, because this model doesn't support server-side turn detection.
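The interaction between the per-SDK default model and the vad requirement can be summarized in a few lines. The sketch below is illustrative only: check_stt_options, DEFAULT_MODEL, and MODELS_REQUIRING_VAD are hypothetical names, not part of either plugin, and the values are taken from the parameter descriptions above.

```python
from typing import Optional

# Values come from this page; the names are hypothetical, not plugin API.
DEFAULT_MODEL = {
    "python": "gpt-4o-mini-transcribe",
    "nodejs": "gpt-realtime-whisper",
}
MODELS_REQUIRING_VAD = {"gpt-realtime-whisper"}


def check_stt_options(sdk: str, model: Optional[str] = None, has_vad: bool = False) -> str:
    """Resolve the effective model for an SDK and enforce the vad requirement."""
    # Fall back to the SDK's default when no model is given.
    resolved = model or DEFAULT_MODEL[sdk]
    if resolved in MODELS_REQUIRING_VAD and not has_vad:
        raise ValueError(
            f"{resolved} needs a vad instance for end-of-speech detection"
        )
    return resolved
```

Under these assumptions, a Node.js configuration that omits both model and vad is rejected, while the Python default passes without a vad instance.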
Additional resources
The following resources provide more information about using OpenAI with LiveKit Agents.