
OpenAI STT plugin guide

How to use the OpenAI STT plugin for LiveKit Agents.

Available in Python and Node.js.

Overview

This plugin lets you use OpenAI as a speech-to-text (STT) provider for your voice agents.

Installation

Install the plugin from PyPI (Python) or npm (Node.js):

Python:

uv add "livekit-agents[openai]~=1.5"

Node.js:

pnpm add @livekit/agents-plugin-openai@1.x

Authentication

The OpenAI plugin requires an OpenAI API key.

Set OPENAI_API_KEY in your .env file.
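Before starting an agent, it can help to fail fast when the key is missing. The following is a minimal sketch using only the Python standard library. The helper names `load_env_file` and `require_openai_key` are hypothetical, and the parser assumes simple `KEY=VALUE` lines; in a real project, a library such as python-dotenv handles more cases (multi-line values, variable expansion).

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines from a .env file into a dict.

    Hypothetical helper: skips blank lines and '#' comments, accepts an
    optional 'export ' prefix, and strips surrounding quotes. It does not
    support multi-line values or variable expansion.
    """
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        values[key] = value.strip().strip('"').strip("'")
    return values


def require_openai_key(path: str = ".env") -> str:
    """Return OPENAI_API_KEY from the process environment or the .env file.

    A value already set in the environment wins; otherwise the .env file is
    consulted. Raises RuntimeError if neither provides a non-empty key.
    """
    key = os.environ.get("OPENAI_API_KEY") or load_env_file(path).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    return key
```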

Usage

Use OpenAI STT in an AgentSession or as a standalone transcription service. For example, you can use this STT in the Voice AI quickstart.

Python:

from livekit.agents import AgentSession
from livekit.plugins import openai

session = AgentSession(
    stt=openai.STT(
        model="gpt-4o-mini-transcribe",
    ),
    # ... llm, tts, etc.
)
Node.js:

import { voice } from '@livekit/agents';
import * as openai from '@livekit/agents-plugin-openai';
import * as silero from '@livekit/agents-plugin-silero';

// gpt-realtime-whisper relies on a client-side VAD to detect end-of-speech.
const vad = await silero.VAD.load();
const session = new voice.AgentSession({
  stt: new openai.STT({
    model: 'gpt-realtime-whisper',
    vad,
  }),
  // ... llm, tts, etc.
});
VAD requirement

The gpt-realtime-whisper model doesn't support server-side turn detection. You must pass a vad instance so the plugin can commit the audio buffer at end-of-speech.
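To illustrate why the vad instance matters, the sketch below models end-of-speech commits in plain Python. It is a toy illustration, not the plugin's implementation: `EndOfSpeechBuffer` is a hypothetical class, each frame is assumed to arrive with a precomputed speech/non-speech decision (the job a real VAD such as Silero does), and a fixed run of silent frames is treated as the end of the turn.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EndOfSpeechBuffer:
    """Toy client-side turn detector (hypothetical, not the plugin's internals).

    Frames are buffered while speech is active. Once
    `silence_frames_to_commit` consecutive non-speech frames follow speech,
    the buffered audio is returned as one committed utterance.
    """

    silence_frames_to_commit: int = 3
    _buffer: List[bytes] = field(default_factory=list, init=False)
    _silence_run: int = field(default=0, init=False)
    _in_speech: bool = field(default=False, init=False)

    def push(self, frame: bytes, is_speech: bool) -> Optional[bytes]:
        """Feed one frame plus its VAD decision; return committed audio or None."""
        if is_speech:
            # Speech (re)starts: reset the silence counter and keep buffering.
            self._in_speech = True
            self._silence_run = 0
            self._buffer.append(frame)
            return None
        if not self._in_speech:
            # Silence before any speech is dropped.
            return None
        # Trailing silence stays with the utterance until the threshold is hit.
        self._silence_run += 1
        self._buffer.append(frame)
        if self._silence_run >= self.silence_frames_to_commit:
            committed = b"".join(self._buffer)
            self._buffer.clear()
            self._silence_run = 0
            self._in_speech = False
            return committed
        return None
```

A short pause that is followed by more speech does not trigger a commit, because any speech frame resets the silence counter.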

Default model change in Node.js

Starting with @livekit/agents-plugin-openai@1.4.1, the default model for openai.STT changed from whisper-1 to gpt-realtime-whisper, which streams transcription over a WebSocket. To restore the previous behavior, set useRealtime: false:

const stt = new openai.STT({ useRealtime: false });
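If you are upgrading the Node.js plugin, the following Python sketch encodes the documented defaults so you can check which model an openai.STT created without a model option ends up using. `default_node_stt_model` and `parse_version` are hypothetical helpers, not part of the plugin; the sketch assumes plain `MAJOR.MINOR.PATCH` version strings (pre-release tags are not handled) and treats every release before 1.4.1 as defaulting to whisper-1.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain 'MAJOR.MINOR.PATCH' string into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def default_node_stt_model(plugin_version: str, use_realtime: bool = True) -> str:
    """Default model for openai.STT in @livekit/agents-plugin-openai (sketch).

    Assumption: releases before 1.4.1 default to whisper-1 regardless of
    use_realtime. From 1.4.1 on, the default is gpt-realtime-whisper over the
    Realtime WebSocket, and use_realtime=False selects whisper-1 over REST.
    """
    if parse_version(plugin_version) < (1, 4, 1):
        return "whisper-1"
    return "gpt-realtime-whisper" if use_realtime else "whisper-1"
```

Comparing integer tuples rather than raw strings matters here: as strings, "1.10.0" would sort before "1.4.1".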

Parameters

This section describes some of the available parameters. For the complete list, see the plugin reference links in the Additional resources section.

model: WhisperModels | string. Default: gpt-4o-mini-transcribe | gpt-realtime-whisper

Model to use for transcription. See OpenAI's documentation for a list of supported models.

Default model varies by SDK:

  • Python: gpt-4o-mini-transcribe
  • Node.js: gpt-realtime-whisper
language: LanguageCode. Default: en

Language code for the input audio. See OpenAI's documentation for a list of supported languages.

useRealtime: boolean. Default: true. Only available in Node.js.

When true, streams transcription over the OpenAI Realtime WebSocket. Set to false to use the standard REST transcription API with whisper-1.

vad: VAD. Only available in Node.js.

A VAD instance for client-side end-of-speech detection. Required when using gpt-realtime-whisper, because this model doesn't support server-side turn detection.
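Because the Node.js default model needs a vad, a missing VAD is an easy mistake to make. A minimal validation sketch, written in Python for illustration, can catch it before a session starts. `STTOptions` and `validate_stt_options` are hypothetical names, not the plugin's types, and the language check assumes plain two-letter ISO-639-1 codes such as `en`, which is the format OpenAI's transcription API documents.

```python
import re
from dataclasses import dataclass
from typing import Any, List, Optional

REALTIME_ONLY_MODEL = "gpt-realtime-whisper"
# Assumption: plain two-letter ISO-639-1 codes such as "en" or "de".
ISO_639_1 = re.compile(r"[a-z]{2}")


@dataclass
class STTOptions:
    """Hypothetical mirror of the Node.js options above (not the plugin's types)."""

    model: str = REALTIME_ONLY_MODEL
    language: str = "en"
    use_realtime: bool = True
    vad: Optional[Any] = None  # stands in for a VAD instance such as Silero's


def validate_stt_options(opts: STTOptions) -> List[str]:
    """Return human-readable problems; an empty list means no issues were found."""
    problems: List[str] = []
    if opts.model == REALTIME_ONLY_MODEL and opts.vad is None:
        problems.append(
            f"{REALTIME_ONLY_MODEL} has no server-side turn detection; pass a vad instance"
        )
    if not ISO_639_1.fullmatch(opts.language):
        problems.append(f"language {opts.language!r} is not a two-letter ISO-639-1 code")
    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every misconfiguration at once.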

Additional resources

The following resources provide more information about using OpenAI with LiveKit Agents.