Baseten TTS plugin guide | LiveKit Documentation

Available in Python

Overview

This plugin allows you to use Baseten as a TTS provider for your voice agents.

Installation

Install the plugin from PyPI:

uv add "livekit-agents[baseten]~=1.5"

Authentication

The Baseten plugin requires a Baseten API key.

Set the following in your .env file:

BASETEN_API_KEY=<your-baseten-api-key>
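If you want your agent to fail early with a clear message when the key is missing, a small check like the following can help. This is only a sketch: the helper name require_baseten_api_key is hypothetical and not part of the plugin, and it assumes the .env file has already been loaded into the process environment (for example by a tool such as python-dotenv).

```python
import os


def require_baseten_api_key(env=None):
    """Return the Baseten API key, or raise a clear error if it is unset.

    Hypothetical helper for illustration; not part of the Baseten plugin.
    `env` defaults to the process environment and can be replaced by a
    plain dict for testing.
    """
    env = os.environ if env is None else env
    key = env.get("BASETEN_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "BASETEN_API_KEY is not set; add it to your .env file or environment."
        )
    return key
```

Calling this once at startup surfaces a missing key before the first synthesis request is made.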

Model deployment

You must deploy a TTS model such as Orpheus to Baseten to use it with LiveKit Agents. Your deployment includes a private model endpoint URL to provide to the LiveKit Agents integration.

Baseten model endpoints come in two forms, HTTP and websocket.

The plugin selects its mode from the URL scheme:

https:// endpoints use HTTP synthesis. The agent sends the full text in a single request and receives the audio in the response.
wss:// endpoints use websocket streaming. The agent streams words to the model as the LLM generates them, and the model streams audio back as it produces it. This significantly reduces latency for voice agents. Streaming requires a websocket-capable Baseten TTS deployment.

When model_endpoint starts with wss://, the plugin reports capabilities.streaming=True and the agent uses streaming synthesis. Otherwise the plugin falls back to HTTP synthesis. No further configuration is needed to switch between the two.
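The selection rule described above can be sketched in a few lines. This is an illustration of the rule as stated in this guide, not the plugin's actual implementation; the function name synthesis_mode, its return values, and the example URLs are all hypothetical.

```python
from urllib.parse import urlparse


def synthesis_mode(model_endpoint: str) -> str:
    """Return the synthesis mode implied by the endpoint's URL scheme.

    Illustrative only: mirrors the documented rule that wss:// endpoints
    stream over a websocket and everything else falls back to HTTP.
    """
    scheme = urlparse(model_endpoint).scheme.lower()
    if scheme == "wss":
        return "websocket-streaming"
    return "http"


# Placeholder URLs, not real Baseten endpoints.
print(synthesis_mode("wss://example.invalid/websocket"))
print(synthesis_mode("https://example.invalid/predict"))
```

A check like this can be handy in logs or startup diagnostics to confirm which mode a given endpoint URL will select.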

Usage

Use Baseten TTS within an AgentSession or as a standalone speech generator. For example, you can use this TTS in the Voice AI quickstart.

from livekit.agents import AgentSession
from livekit.plugins import baseten

session = AgentSession(
    # Pass a wss:// URL for websocket streaming, or an https:// URL for HTTP synthesis.
    tts=baseten.TTS(
        model_endpoint="<your-model-endpoint>",
        voice="tara",
    ),
    # ... llm, stt, etc.
)

Parameters

This section describes some of the available parameters. See the plugin reference for a complete list of all available parameters.

model_endpoint (string, Env: BASETEN_MODEL_ENDPOINT)

The endpoint URL for your deployed model, found in your Baseten dashboard. Pass a wss:// URL to enable realtime websocket streaming, or an https:// URL for HTTP synthesis.

voice (string, Default: tara)

The voice to use for speech synthesis.

language (LanguageCode, Default: en)

Language code for the output audio.

temperature (float, Default: 0.6)

Controls the randomness of the generated speech. Higher values make the output more random.

max_tokens (int, Default: 2000)

Maximum number of tokens to generate per request. (Websocket only.)

buffer_size (int, Default: 10)

Number of words per chunk streamed to the model. Smaller values reduce time-to-first-audio at the cost of slightly more overhead. (Websocket only.)
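To get a feel for what buffer_size controls, the following sketch groups a sentence into word chunks of a given size. It is a simplified illustration only: the function chunk_words is hypothetical, and the plugin's real buffering (which works on text arriving incrementally from the LLM) may differ in detail.

```python
def chunk_words(text: str, buffer_size: int = 10) -> list[str]:
    """Split text into chunks of at most `buffer_size` words each.

    Hypothetical helper for illustration; not the plugin's implementation.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    words = text.split()
    return [
        " ".join(words[i : i + buffer_size])
        for i in range(0, len(words), buffer_size)
    ]


sentence = "The quick brown fox jumps over the lazy dog"
for chunk in chunk_words(sentence, buffer_size=4):
    print(chunk)
```

With a smaller buffer_size the first chunk is complete sooner, which is why time-to-first-audio drops, while the total number of chunks sent goes up.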

Additional resources

The following resources provide more information about using Baseten with LiveKit Agents.

Python package

The livekit-plugins-baseten package on PyPI.

Plugin reference

Reference for the Baseten TTS plugin.

GitHub repo

View the source or contribute to the LiveKit Baseten TTS plugin.

Baseten docs

Baseten's full docs site.

Voice AI quickstart

Get started with LiveKit Agents and Baseten.

Baseten STT

Guide to the Baseten STT plugin with LiveKit Agents.