OpenAI STT plugin guide | LiveKit Documentation

Available in: Python, Node.js

Overview

This plugin lets you use OpenAI as a speech-to-text (STT) provider for your voice agents.

Installation

Install the plugin from PyPI (Python) or npm (Node.js):

Python:

uv add "livekit-agents[openai]~=1.5"

Node.js:

pnpm add @livekit/agents-plugin-openai@1.x

Authentication

The OpenAI plugin requires an OpenAI API key.

Set OPENAI_API_KEY in your .env file.
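To fail fast with a clear message, you can check for the key when your agent starts. The sketch below uses only the Python standard library. The helper name require_openai_api_key is hypothetical and not part of the plugin, and the sketch assumes your .env file has already been loaded into the process environment.

```python
import os
from typing import Mapping, Optional


def require_openai_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return OPENAI_API_KEY from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of the LiveKit plugin.
    """
    # Default to the real process environment; a mapping can be passed for testing.
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    return key
```

Calling require_openai_api_key() once at startup, before constructing the session, keeps a configuration problem separate from a transcription failure.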

Usage

Use OpenAI STT in an AgentSession or as a standalone transcription service. For example, you can use this STT in the Voice AI quickstart.

Python:

from livekit.agents import AgentSession
from livekit.plugins import openai

session = AgentSession(
    stt=openai.STT(
        model="gpt-4o-mini-transcribe",
    ),
    # ... llm, tts, etc.
)

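Node.js:

import { voice } from '@livekit/agents';
import * as openai from '@livekit/agents-plugin-openai';
import * as silero from '@livekit/agents-plugin-silero';

// Client-side VAD, used to commit the audio buffer at end-of-speech.
const vad = await silero.VAD.load();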

const session = new voice.AgentSession({
    stt: new openai.STT({
        model: 'gpt-realtime-whisper',
        vad,
    }),
    // ... llm, tts, etc.
});

VAD requirement

The gpt-realtime-whisper model doesn't support server-side turn detection. You must pass a vad instance so the plugin can commit the audio buffer at end-of-speech.

Default model change in Node.js

In @livekit/agents-plugin-openai@1.4.1, the default model for openai.STT changed from whisper-1 to gpt-realtime-whisper, which streams transcription over a WebSocket. To restore the previous behavior, set useRealtime: false:

const stt = new openai.STT({ useRealtime: false });

Parameters

This section describes some of the available parameters. See the plugin reference links in the Additional resources section for a complete list of all available parameters.

model (WhisperModels | string), default: gpt-4o-mini-transcribe (Python) or gpt-realtime-whisper (Node.js)

Model to use for transcription. See OpenAI's documentation for a list of supported models.

Default model varies by SDK:

Python: gpt-4o-mini-transcribe
Node.js: gpt-realtime-whisper

language (LanguageCode), default: en

Language code for the input audio. See OpenAI's documentation for a list of supported languages.

useRealtime (boolean), default: true

Only available in Node.js

When true, streams transcription over the OpenAI Realtime WebSocket. Set to false to use the standard REST transcription API with whisper-1.

vad (VAD)

Only available in Node.js

A VAD instance for client-side end-of-speech detection. Required when using gpt-realtime-whisper, because this model doesn't support server-side turn detection.
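The Node.js defaults and the VAD requirement interact, so it can help to read them as a single decision. The following is a plain-Python illustration of the rules stated on this page only. NodeSTTOptions and resolve_node_stt are hypothetical names, not plugin APIs, and the real plugin may resolve options differently, in particular when an explicit model is combined with useRealtime.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class NodeSTTOptions:
    """Hypothetical stand-in for the options passed to the Node.js openai.STT."""
    model: Optional[str] = None
    use_realtime: bool = True  # documented default: true
    has_vad: bool = False      # whether a VAD instance was supplied


def resolve_node_stt(opts: NodeSTTOptions) -> Tuple[str, str]:
    """Return (model, transport) following the rules described in this guide."""
    if not opts.use_realtime:
        # useRealtime: false uses the REST transcription API with whisper-1.
        # Assumption: an explicitly passed model overrides that default.
        return (opts.model or "whisper-1", "rest")

    # Assumption: with useRealtime left on, any model goes over the WebSocket.
    model = opts.model or "gpt-realtime-whisper"
    if model == "gpt-realtime-whisper" and not opts.has_vad:
        # This model has no server-side turn detection, so a VAD is required.
        raise ValueError("gpt-realtime-whisper requires a vad instance")
    return (model, "websocket")
```

For example, resolve_node_stt(NodeSTTOptions(use_realtime=False)) selects whisper-1 over REST without a VAD, while the default options raise until has_vad is set.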

Additional resources

The following resources provide more information about using OpenAI with LiveKit Agents.

Python package

Reference, GitHub, PyPI

Node.js plugin

Reference, GitHub, npm

OpenAI docs

OpenAI STT docs.

Voice AI quickstart

Get started with LiveKit Agents and OpenAI STT.

OpenAI ecosystem guide

Overview of the entire OpenAI and LiveKit Agents integration.