Baseten STT plugin guide

Available in: Python

Overview

This plugin lets you use Baseten as a speech-to-text (STT) provider for your voice agents.

Quick reference

This section provides a quick reference for the Baseten STT plugin. For more information, see Additional resources.

Installation

Install the plugin from PyPI:

uv add "livekit-agents[baseten]~=1.2"

Authentication

The Baseten plugin requires a Baseten API key.

Set the following in your .env file:

BASETEN_API_KEY=<your-baseten-api-key>
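A missing key is easier to diagnose at startup than on the first transcription request. The following is a minimal sketch of such a check, assuming the contents of your .env file have already been loaded into the process environment (for example, by a dotenv loader). The helper name require_api_key is hypothetical and is not part of the plugin.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return BASETEN_API_KEY from the given environment, or fail with a clear error.

    Hypothetical helper for illustration; it is not provided by the plugin.
    """
    key = env.get("BASETEN_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "BASETEN_API_KEY is not set. Add it to your .env file and make sure "
            "the file is loaded before the agent starts."
        )
    return key


# Demonstration with an explicit mapping instead of the real environment.
example_env = {"BASETEN_API_KEY": "<your-baseten-api-key>"}
print(require_api_key(example_env) == "<your-baseten-api-key>")
```

Calling require_api_key() with no arguments checks the real process environment.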

Model deployment

You must deploy a websocket-based STT model to Baseten to use it with LiveKit Agents. The standard Whisper deployments available in the Baseten library are not suitable for realtime use. Contact Baseten support for help deploying a websocket-compatible Whisper model.

The Baseten dashboard may show your model endpoint as an HTTP URL such as https://model-<id>.api.baseten.co/environments/production/predict. The domain is correct, but to use it as the model_endpoint parameter for the Baseten STT plugin you must change the protocol to wss and the path to /v1/websocket.

The correct websocket URL format is:

wss://<your-model-id>.api.baseten.co/v1/websocket
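The rewrite described above (keep the domain, switch the protocol to wss, replace the path with /v1/websocket) can be sketched with the standard library. The function name to_websocket_endpoint and the model ID abc123 are made up for illustration; substitute the URL from your own deployment.

```python
from urllib.parse import urlsplit, urlunsplit


def to_websocket_endpoint(http_url: str) -> str:
    """Keep the host, switch the scheme to wss, and set the path to /v1/websocket.

    Hypothetical helper for illustration; it is not provided by the plugin.
    """
    parts = urlsplit(http_url)
    if not parts.netloc:
        raise ValueError(f"not an absolute URL: {http_url!r}")
    # Query string and fragment are dropped; only the host is carried over.
    return urlunsplit(("wss", parts.netloc, "/v1/websocket", "", ""))


# "abc123" is a placeholder model ID, not a real deployment.
print(to_websocket_endpoint(
    "https://model-abc123.api.baseten.co/environments/production/predict"
))
# prints: wss://model-abc123.api.baseten.co/v1/websocket
```

The result can be passed as model_endpoint or stored in the BASETEN_MODEL_ENDPOINT environment variable.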

Usage

Use Baseten STT within an AgentSession or as a standalone transcription service. For example, you can use this STT in the Voice AI quickstart.

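from livekit.agents import AgentSession
from livekit.plugins import baseten

session = AgentSession(
    # Must be the websocket URL of your deployed model (see Model deployment).
    stt=baseten.STT(
        model_endpoint="wss://<your-model-id>.api.baseten.co/v1/websocket",
    ),
    # ... llm, tts, etc.
)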

Parameters

This section describes some of the available parameters. See the plugin reference for a complete list of all available parameters.

model_endpoint (string, optional, env: BASETEN_MODEL_ENDPOINT)

The endpoint URL for your deployed model. You can find this in your Baseten dashboard. Note that this must be a websocket URL (starts with wss://). See Model deployment for more details.

language (string, optional, default: en)

Language of input audio in ISO-639-1 format.

vad_threshold (float, optional, default: 0.5)

Threshold for voice activity detection.

vad_min_silence_duration_ms (int, optional, default: 300)

Minimum duration of silence in milliseconds to consider speech ended.

vad_speech_pad_ms (int, optional, default: 30)

Duration in milliseconds to pad speech segments.
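To show how these parameters fit together, the sketch below collects the defaults listed above into a dictionary and applies overrides on top. The helper build_stt_options is hypothetical and exists only for illustration. The final commented lines assume the plugin is installed and that the constructor accepts these names as keyword arguments, as the parameter list suggests.

```python
def build_stt_options(model_endpoint: str, **overrides) -> dict:
    """Start from the documented defaults and apply any overrides.

    Hypothetical helper for illustration; it is not provided by the plugin.
    """
    options = {
        "model_endpoint": model_endpoint,
        "language": "en",                    # ISO-639-1 code of the input audio
        "vad_threshold": 0.5,                # voice activity detection threshold
        "vad_min_silence_duration_ms": 300,  # silence needed to end a speech segment
        "vad_speech_pad_ms": 30,             # padding added to speech segments
    }
    unknown = set(overrides) - set(options)
    if unknown:
        raise TypeError(f"unknown STT option(s): {sorted(unknown)}")
    options.update(overrides)
    return options


# Example: wait for a longer pause before treating speech as ended.
options = build_stt_options(
    "wss://<your-model-id>.api.baseten.co/v1/websocket",
    vad_min_silence_duration_ms=500,
)
print(options)

# With the plugin installed, the options can be passed straight through:
# from livekit.plugins import baseten
# stt = baseten.STT(**options)
```

Rejecting unknown names early catches typos such as vad_treshold before they reach the plugin.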

Additional resources

The following resources provide more information about using Baseten with LiveKit Agents.

Python package

The livekit-plugins-baseten package on PyPI.

Plugin reference

Reference for the Baseten STT plugin.

GitHub repo

View the source or contribute to the LiveKit Baseten STT plugin.

Baseten docs

Baseten's full docs site.

Voice AI quickstart

Get started with LiveKit Agents and Baseten.

Baseten TTS

Guide to the Baseten TTS plugin with LiveKit Agents.