
Baseten STT integration guide

How to use the Baseten STT plugin for LiveKit Agents.

Available in: Python

Overview

Baseten is a hosted inference platform that allows you to deploy and serve any machine learning model. With LiveKit's Baseten integration and the Agents framework, you can build AI agents that provide high-accuracy transcriptions using models like Whisper.

Quick reference

This section provides a quick reference for the Baseten STT plugin. For more information, see Additional resources.

Installation

Install the plugin from PyPI:

pip install "livekit-agents[baseten]~=1.2"

Authentication

The Baseten plugin requires a Baseten API key.

Set the following in your .env file:

BASETEN_API_KEY=<your-baseten-api-key>

Model deployment

You must deploy a websocket-based STT model to Baseten to use it with LiveKit Agents. The standard Whisper deployments available in the Baseten library are not suitable for realtime use. Contact Baseten support for help deploying a websocket-compatible Whisper model.

Your model endpoint may appear in the Baseten dashboard as an HTTP URL such as https://model-<id>.api.baseten.co/environments/production/predict. The domain is correct. To use the URL as the model_endpoint parameter for the Baseten STT plugin, change the protocol to wss and the path to /v1/websocket.

The correct websocket URL format is:

wss://<your-model-id>.api.baseten.co/v1/websocket
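If you copy the HTTP URL from the Baseten dashboard, you can do the conversion described above in code. The following is a minimal sketch that uses only the Python standard library. `to_websocket_endpoint` is a hypothetical helper and is not part of the Baseten plugin. The sketch assumes the dashboard URL has the `https://model-<id>.api.baseten.co/...` form shown above, and `model-abc123` is a made-up placeholder ID.

```python
from urllib.parse import urlsplit, urlunsplit


def to_websocket_endpoint(http_url: str) -> str:
    """Convert a Baseten HTTP predict URL into the websocket URL form.

    Hypothetical helper: keeps the domain unchanged, switches the scheme
    to wss, and replaces the path with /v1/websocket.
    """
    parts = urlsplit(http_url)
    if parts.scheme not in ("http", "https", "wss"):
        raise ValueError(f"unexpected URL scheme: {parts.scheme!r}")
    if not parts.netloc:
        raise ValueError("URL has no domain")
    # Keep the domain (netloc) and drop any query string or fragment.
    return urlunsplit(("wss", parts.netloc, "/v1/websocket", "", ""))


if __name__ == "__main__":
    # Placeholder model ID; substitute the URL from your own dashboard.
    url = "https://model-abc123.api.baseten.co/environments/production/predict"
    print(to_websocket_endpoint(url))
```

Pass the returned string as model_endpoint. The helper also accepts a URL that already uses wss, so running it twice is harmless.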

Usage

Use Baseten STT within an AgentSession or as a standalone transcription service. For example, you can use this STT in the Voice AI quickstart.

from livekit.agents import AgentSession
from livekit.plugins import baseten

session = AgentSession(
    stt=baseten.STT(
        model_endpoint="wss://<your-model-id>.api.baseten.co/v1/websocket",
    ),
    # ... llm, tts, etc.
)

Parameters

This section describes some of the available parameters. See the plugin reference for a complete list of all available parameters.

model_endpoint (string, optional; env: BASETEN_MODEL_ENDPOINT)

The endpoint URL for your deployed model. You can find this in your Baseten dashboard. Note that this must be a websocket URL (starts with wss://). See Model deployment for more details.
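To catch a misconfigured endpoint before the agent starts, you could run a small pre-flight check. The following is a minimal sketch. `resolve_model_endpoint` is a hypothetical helper and the plugin does its own lookup. The sketch assumes that an explicit value takes precedence over the BASETEN_MODEL_ENDPOINT environment variable. It enforces only the rule stated above: the URL must start with wss://.

```python
import os
from typing import Optional


def resolve_model_endpoint(explicit: Optional[str] = None) -> str:
    """Return the endpoint to use and check that it is a wss:// URL.

    Hypothetical pre-flight helper. Assumption: an explicit value wins
    over the BASETEN_MODEL_ENDPOINT environment variable.
    """
    endpoint = explicit or os.environ.get("BASETEN_MODEL_ENDPOINT")
    if not endpoint:
        raise RuntimeError(
            "No model endpoint: pass model_endpoint or set BASETEN_MODEL_ENDPOINT"
        )
    if not endpoint.startswith("wss://"):
        # An https://.../predict URL copied from the dashboard fails here.
        raise ValueError(f"model_endpoint must start with wss://, got {endpoint!r}")
    return endpoint
```

Failing fast here gives a clearer error than a websocket connection failure later in the session.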

language (string, optional; default: en)

Language of the input audio, as an ISO 639-1 code.

vad_threshold (float, optional; default: 0.5)

Threshold for voice activity detection.

vad_min_silence_duration_ms (int, optional; default: 300)

Minimum duration of silence, in milliseconds, before speech is considered to have ended.

vad_speech_pad_ms (int, optional; default: 30)

Duration in milliseconds to pad speech segments.
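The following sketch shows how the three VAD parameters typically interact. It is a simplified, hypothetical segmenter and not Baseten's server-side implementation. It assumes the model produces one speech probability per fixed-length audio frame. A frame counts as speech when its probability reaches vad_threshold, and a segment ends only after vad_min_silence_duration_ms of continuous silence. Each segment is then padded by vad_speech_pad_ms on both sides. For brevity, overlapping padded segments are not merged.

```python
from typing import List, Optional, Tuple


def segment_speech(
    probs: List[float],
    frame_ms: int,
    vad_threshold: float = 0.5,
    vad_min_silence_duration_ms: int = 300,
    vad_speech_pad_ms: int = 30,
) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) speech segments from per-frame speech probabilities.

    Simplified illustration only; not Baseten's implementation.
    """
    total_ms = len(probs) * frame_ms
    segments: List[Tuple[int, int]] = []
    speech_start: Optional[int] = None   # start of the current speech segment
    silence_start: Optional[int] = None  # start of a silence run inside speech

    for i, p in enumerate(probs):
        t = i * frame_ms
        if p >= vad_threshold:
            if speech_start is None:
                speech_start = t  # speech begins
            silence_start = None  # a pending pause is interrupted by speech
        elif speech_start is not None:
            if silence_start is None:
                silence_start = t
            # Speech ends only once the silence has lasted long enough.
            if t + frame_ms - silence_start >= vad_min_silence_duration_ms:
                segments.append((speech_start, silence_start))
                speech_start = silence_start = None

    if speech_start is not None:
        # Audio ended during speech or a short pause: close at the last speech frame.
        end = silence_start if silence_start is not None else total_ms
        segments.append((speech_start, end))

    # Pad each segment on both sides, clamped to the audio bounds.
    return [
        (max(0, s - vad_speech_pad_ms), min(total_ms, e + vad_speech_pad_ms))
        for s, e in segments
    ]


if __name__ == "__main__":
    # 100 ms frames: speech, a 100 ms pause, speech, a long pause, then speech again.
    probs = [0.1, 0.9, 0.9, 0.2, 0.9, 0.1, 0.1, 0.1, 0.1, 0.9]
    print(segment_speech(probs, frame_ms=100))  # [(70, 530), (870, 1000)]
```

In this example the 100 ms pause is shorter than vad_min_silence_duration_ms, so it stays inside the first segment. Raising vad_min_silence_duration_ms makes brief pauses less likely to split an utterance, but the end of speech is detected later.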

Additional resources

The following resources provide more information about using Baseten with LiveKit Agents.