Baseten TTS plugin guide

How to use the Baseten TTS plugin for LiveKit Agents.

Available in: Python

Overview

This plugin allows you to use Baseten as a TTS provider for your voice agents.

Installation

Install the plugin from PyPI:

uv add "livekit-agents[baseten]~=1.5"

Authentication

The Baseten plugin requires a Baseten API key.

Set the following in your .env file:

BASETEN_API_KEY=<your-baseten-api-key>
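Plain Python does not read .env files on its own, so it can help to fail early when the key is missing from the process environment. The following is a minimal sketch using only the standard library; `require_env` is a hypothetical helper and is not part of the Baseten plugin or LiveKit Agents.

```python
import os
from typing import Mapping


def require_env(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration; not part of the plugin.
    """
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Demonstration with an explicit mapping instead of the real environment.
print(require_env("BASETEN_API_KEY", {"BASETEN_API_KEY": "demo-key"}))  # demo-key
```

In an agent, a call such as `require_env("BASETEN_API_KEY")` would run at startup, after whatever mechanism you use to load the .env file.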

Model deployment

You must deploy a TTS model such as Orpheus to Baseten to use it with LiveKit Agents. Your deployment includes a private model endpoint URL to provide to the LiveKit Agents integration.

Baseten model endpoints come in two forms, HTTP and websocket.

The plugin selects its mode from the URL scheme:

  • https:// endpoints use HTTP synthesis. The agent sends the full text in a single request and receives the audio in the response.
  • wss:// endpoints use websocket streaming. The agent streams words to the model as the LLM generates them, and the model streams audio back as it produces it. This significantly reduces latency for voice agents. Streaming requires a websocket-capable Baseten TTS deployment.

When model_endpoint starts with wss://, the plugin reports capabilities.streaming=True and the agent uses streaming synthesis. Otherwise the plugin falls back to HTTP synthesis. No further configuration is needed to switch between the two.
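The scheme-based rule above can be illustrated with a small standalone sketch. This is not the plugin's internal code: `synthesis_mode` is a hypothetical helper that only mirrors the documented behavior (wss:// selects streaming, anything else falls back to HTTP), and the endpoint URLs are placeholders.

```python
from urllib.parse import urlparse


def synthesis_mode(model_endpoint: str) -> str:
    """Return "websocket" for wss:// endpoints, otherwise "http".

    Hypothetical helper for illustration only; it mirrors the documented
    rule and is not part of the Baseten plugin.
    """
    scheme = urlparse(model_endpoint).scheme.lower()
    return "websocket" if scheme == "wss" else "http"


# Placeholder endpoints, not real deployments.
print(synthesis_mode("wss://example.invalid/v1/websocket"))  # websocket
print(synthesis_mode("https://example.invalid/predict"))  # http
```

A check like this can be useful in your own startup logging to confirm which mode a configured endpoint will use.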

Usage

Use Baseten TTS within an AgentSession or as a standalone speech generator. For example, you can use this TTS in the Voice AI quickstart.

from livekit.agents import AgentSession
from livekit.plugins import baseten

session = AgentSession(
    # Pass a wss:// URL for websocket streaming, or an https:// URL for HTTP synthesis.
    tts=baseten.TTS(
        model_endpoint="<your-model-endpoint>",
        voice="tara",
    ),
    # ... llm, stt, etc.
)

Parameters

This section describes some of the available parameters. See the plugin reference for a complete list of all available parameters.

model_endpoint (string, Env: BASETEN_MODEL_ENDPOINT)

The endpoint URL for your deployed model, found in your Baseten dashboard. Pass a wss:// URL to enable realtime websocket streaming, or an https:// URL for HTTP synthesis.

voice (string, Default: tara)

The voice to use for speech synthesis.

language (LanguageCode, Default: en)

Language code for the output audio.

temperature (float, Default: 0.6)

Controls the randomness of the generated speech. Higher values make the output more random.

max_tokens (int, Default: 2000)

Maximum number of tokens to generate per request. (Websocket only.)

buffer_size (int, Default: 10)

Number of words per chunk streamed to the model. Smaller values reduce time-to-first-audio at the cost of slightly more overhead. (Websocket only.)
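To make the effect of buffer_size concrete, the sketch below groups a stream of words into chunks of at most buffer_size words. It is an illustration of word-count chunking in general; `chunk_words` is a hypothetical helper, and the plugin's actual buffering logic may differ.

```python
from typing import Iterable, Iterator


def chunk_words(words: Iterable[str], buffer_size: int = 10) -> Iterator[str]:
    """Yield space-joined chunks of at most buffer_size words.

    Hypothetical helper for illustration; not the plugin's implementation.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    buffer: list[str] = []
    for word in words:
        buffer.append(word)
        if len(buffer) == buffer_size:
            # A full chunk is ready to send.
            yield " ".join(buffer)
            buffer = []
    if buffer:
        # Flush the remaining words as a final, shorter chunk.
        yield " ".join(buffer)


text = "the quick brown fox jumps over the lazy dog"
print(list(chunk_words(text.split(), buffer_size=4)))
# ['the quick brown fox', 'jumps over the lazy', 'dog']
```

With a smaller buffer_size the first chunk is complete after fewer words, which is why time-to-first-audio drops, while the total number of chunks (and per-chunk overhead) goes up.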

Additional resources

The following resources provide more information about using Baseten with LiveKit Agents.