Overview
This plugin allows you to use Baseten as a TTS provider for your voice agents.
Installation
Install the plugin from PyPI:
uv add "livekit-agents[baseten]~=1.5"
Authentication
The Baseten plugin requires a Baseten API key.
Set the following in your .env file:
BASETEN_API_KEY=<your-baseten-api-key>
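A missing key tends to surface only when the first synthesis request fails, so it can help to check the environment at startup. The following is a minimal sketch, assuming the variables from your .env file have already been loaded into the process environment. `missing_baseten_settings` is a hypothetical helper and not part of the plugin. `BASETEN_MODEL_ENDPOINT` is included on the assumption that you configure the endpoint through the environment; remove it from the tuple if you pass `model_endpoint` directly.

```python
import os
from typing import Mapping, Optional

# Variable names are taken from this page; the helper itself is hypothetical.
REQUIRED_VARS = ("BASETEN_API_KEY", "BASETEN_MODEL_ENDPOINT")


def missing_baseten_settings(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the names of required variables that are unset or blank."""
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_baseten_settings()
    if missing:
        raise SystemExit("Missing Baseten settings: " + ", ".join(missing))
```

Passing a plain dictionary instead of relying on `os.environ` keeps the check easy to exercise in unit tests.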
Model deployment
You must deploy a TTS model such as Orpheus to Baseten to use it with LiveKit Agents. Your deployment includes a private model endpoint URL to provide to the LiveKit Agents integration.
Baseten model endpoints come in two forms, HTTP and websocket.
The plugin selects its mode from the URL scheme:
https:// endpoints use HTTP synthesis. The agent sends the full text in a single request and receives the audio in the response.
wss:// endpoints use websocket streaming. The agent streams words to the model as the LLM generates them, and the model streams audio back as it produces it. This significantly reduces latency for voice agents. Streaming requires a websocket-capable Baseten TTS deployment.
When model_endpoint starts with wss://, the plugin reports capabilities.streaming=True and the agent uses streaming synthesis. Otherwise the plugin falls back to HTTP synthesis. No further configuration is needed to switch between the two.
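The selection rule described above can be mirrored in your own code, for example to log which mode an agent will run in before it starts. This is a minimal sketch of the rule as stated on this page, not the plugin's actual implementation. `synthesis_mode` is a hypothetical helper, and the example URLs are placeholders.

```python
def synthesis_mode(model_endpoint: str) -> str:
    """Return "websocket" for wss:// endpoints and "http" for anything else."""
    # Mirrors the documented rule: only a wss:// prefix enables streaming.
    if model_endpoint.startswith("wss://"):
        return "websocket"
    return "http"


if __name__ == "__main__":
    # Placeholder endpoints; substitute the URL from your own deployment.
    for url in ("wss://example.invalid/tts", "https://example.invalid/tts"):
        print(url, "->", synthesis_mode(url))
```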
Usage
Use Baseten TTS within an AgentSession or as a standalone speech generator. For example, you can use this TTS in the Voice AI quickstart.
from livekit.agents import AgentSession
from livekit.plugins import baseten

session = AgentSession(
    # Pass a wss:// URL for websocket streaming, or an https:// URL for HTTP synthesis.
    tts=baseten.TTS(
        model_endpoint="<your-model-endpoint>",
        voice="tara",
    ),
    # ... llm, stt, etc.
)
Parameters
This section describes some of the available parameters. See the plugin reference for a complete list of all available parameters.
model_endpoint (string, env: BASETEN_MODEL_ENDPOINT): The endpoint URL for your deployed model, found in your Baseten dashboard. Pass a wss:// URL to enable realtime websocket streaming, or an https:// URL for HTTP synthesis.
voice (string, default: tara): The voice to use for speech synthesis.
language (LanguageCode, default: en): Language code for the output audio.
temperature (float, default: 0.6): Controls the randomness of the generated speech. Higher values make the output more random.
max_tokens (int, default: 2000): Maximum number of tokens to generate per request. Websocket only.
buffer_size (int, default: 10): Number of words per chunk streamed to the model. Smaller values reduce time-to-first-audio at the cost of slightly more overhead. Websocket only.
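To build intuition for the buffer_size tradeoff, the sketch below groups a stream of words into chunks of at most buffer_size words. It only illustrates the idea of word-based chunking and is not the plugin's implementation. `chunk_words` is a hypothetical helper.

```python
from typing import Iterable, Iterator


def chunk_words(words: Iterable[str], buffer_size: int = 10) -> Iterator[str]:
    """Group incoming words into space-joined chunks of at most buffer_size words."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    pending: list[str] = []
    for word in words:
        pending.append(word)
        if len(pending) == buffer_size:
            yield " ".join(pending)
            pending.clear()
    if pending:
        # Flush the final partial chunk so no trailing words are lost.
        yield " ".join(pending)


if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog"
    for size in (3, 10):
        print(size, list(chunk_words(text.split(), buffer_size=size)))
```

With a smaller buffer_size the first chunk is ready after fewer words, which is why time-to-first-audio drops, while the total number of chunks sent grows.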
Additional resources
The following resources provide more information about using Baseten with LiveKit Agents.
Python package
The livekit-plugins-baseten package on PyPI.
Plugin reference
Reference for the Baseten TTS plugin.
GitHub repo
View the source or contribute to the LiveKit Baseten TTS plugin.
Baseten docs
Baseten's full docs site.
Voice AI quickstart
Get started with LiveKit Agents and Baseten.
Baseten STT
Guide to the Baseten STT plugin with LiveKit Agents.