Sarvam TTS plugin guide

How to use the Sarvam TTS plugin for LiveKit Agents.

Available in: Python | Node.js

Overview

This plugin allows you to use Sarvam as a TTS provider for your voice agents.

Installation

Install the plugin:

Python:

    uv add "livekit-agents[sarvam]~=1.4"

Node.js:

    pnpm add @livekit/agents-plugin-sarvam@1.x

Authentication

The Sarvam plugin requires a Sarvam API key.

Set SARVAM_API_KEY in your .env file.
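
A missing key is easier to catch at startup than on the first synthesis request. The following is a minimal sketch of such a check, assuming the .env file has already been loaded into the process environment (for example with python-dotenv). The helper name `require_sarvam_api_key` is hypothetical and is not part of the plugin.

```python
import os


def require_sarvam_api_key(env=None):
    """Return the Sarvam API key, or raise a clear error if it is missing.

    Hypothetical helper for illustration; not part of the Sarvam plugin.
    """
    # Fall back to the real process environment when no mapping is passed in.
    env = os.environ if env is None else env
    key = env.get("SARVAM_API_KEY", "").strip()
    if not key:
        raise RuntimeError("SARVAM_API_KEY is not set; add it to your .env file")
    return key
```

Calling `require_sarvam_api_key()` before creating the session fails fast with a readable message instead of an authentication error later on.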

Usage

Use Sarvam TTS within an AgentSession or as a standalone speech generator. For example, you can use this TTS in the Voice AI quickstart.

Python:

    from livekit.agents import AgentSession
    from livekit.plugins import sarvam

    session = AgentSession(
        tts=sarvam.TTS(
            target_language_code="hi-IN",
            model="bulbul:v3",
            speaker="shubh",
            pace=1.0,
            temperature=0.6,
            output_audio_bitrate="128k",
            min_buffer_size=50,
            max_chunk_length=150,
        ),
        # ... llm, stt, etc.
    )

Node.js:

    import { voice } from '@livekit/agents';
    import * as sarvam from '@livekit/agents-plugin-sarvam';

    const session = new voice.AgentSession({
      tts: new sarvam.TTS({
        targetLanguageCode: "hi-IN",
        model: "bulbul:v3",
        speaker: "shubh",
        pace: 1.0,
        temperature: 0.6,
      }),
      // ... llm, stt, etc.
    });

Parameters

This section describes some of the available parameters. See the plugin reference links in the Additional resources section for a complete list of all available parameters.

target_language_code (string, required)

BCP-47 language code for a supported Indian language, for example hi-IN for Hindi or en-IN for Indian English. See the Sarvam documentation for a complete list of supported languages.

In Node.js this parameter is called targetLanguageCode.

model (string, optional, default: bulbul:v2)

The Sarvam TTS model to use. Valid values are:

  • bulbul:v2
  • bulbul:v3-beta
  • bulbul:v3

speaker (string, optional, default: varies by model)

Voice to use for synthesis. Default depends on the selected model:

  • anushka for bulbul:v2
  • shubh for bulbul:v3-beta and bulbul:v3

Speakers are validated for model compatibility.
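
The model-dependent default can be pictured as a simple lookup. The sketch below only mirrors the defaults listed above. The names `DEFAULT_SPEAKERS` and `default_speaker` are hypothetical, and the plugin's own resolution logic may differ.

```python
# Default speaker per model, as listed in this guide.
DEFAULT_SPEAKERS = {
    "bulbul:v2": "anushka",
    "bulbul:v3-beta": "shubh",
    "bulbul:v3": "shubh",
}


def default_speaker(model="bulbul:v2"):
    """Return the documented default speaker for a Sarvam TTS model.

    Hypothetical helper for illustration; not part of the Sarvam plugin.
    """
    try:
        return DEFAULT_SPEAKERS[model]
    except KeyError:
        raise ValueError(f"unknown Sarvam TTS model: {model!r}") from None
```

Because the default changes with the model, setting `speaker` explicitly keeps the voice stable if you later switch models.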

pitch (float, optional, default: 0.0)

Voice pitch adjustment. Valid range: -20.0 to 20.0. Included in the synthesis payload for bulbul:v2.

pace (float, optional, default: 1.0)

Speech rate multiplier. Valid range: 0.5 to 2.0.

temperature (float, optional, default: 0.6)

Controls output randomness. Valid range: 0.01 to 1.0. Only valid if model is bulbul:v3 or bulbul:v3-beta. This value is ignored for bulbul:v2.

loudness (float, optional, default: 1.0)

Volume multiplier. Valid range: 0.5 to 2.0. Included in the synthesis payload for bulbul:v2.

enable_preprocessing (boolean, optional, default: false)

Controls whether normalization of English words and numeric entities (for example, numbers and dates) is performed. Set to true for better handling of mixed-language text.

Only valid if model is bulbul:v2. This value is ignored for other models.

In Node.js this parameter is called enablePreprocessing.
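
Several options apply to only one model family, so it can help to see which settings take effect for a given model. The sketch below is one way to express that, under a loudly stated assumption: pitch and loudness are documented only as being included in the payload for bulbul:v2, and treating them as v2-only is an inference from that wording. The helper `applicable_options` is hypothetical and not part of the plugin.

```python
# Options this guide ties to a specific model family.
# ASSUMPTION: pitch and loudness are treated as v2-only because the guide
# says they are included in the payload for bulbul:v2.
V2_ONLY = {"pitch", "loudness", "enable_preprocessing"}
V3_ONLY = {"temperature"}


def applicable_options(model, options):
    """Return only the options expected to take effect for the given model."""
    if model == "bulbul:v2":
        skipped = V3_ONLY
    elif model in ("bulbul:v3", "bulbul:v3-beta"):
        skipped = V2_ONLY
    else:
        raise ValueError(f"unknown Sarvam TTS model: {model!r}")
    return {name: value for name, value in options.items() if name not in skipped}
```

For example, passing `{"pace": 1.2, "temperature": 0.4, "pitch": 2.0}` with `bulbul:v3` keeps `pace` and `temperature` and drops `pitch`.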

output_audio_bitrate (string, optional, default: 128k)

Output audio bitrate. Allowed values: 32k, 64k, 96k, 128k, 192k.

Only available in the Python plugin.

min_buffer_size (integer, optional, default: 50)

Minimum character length that triggers buffer flushing for TTS model processing. Valid range: 30 to 200.

Only available in the Python plugin.

max_chunk_length (integer, optional, default: 150)

Maximum length for sentence splitting (adjust based on content length). Valid range: 50 to 500.

Only available in the Python plugin.

speech_sample_rate (integer, optional, default: 22050)

Output sample rate in Hz. Supported values: 8000, 16000, 22050, 24000.

In Node.js this parameter is called sampleRate.
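
The ranges and allowed values in this section can be checked before constructing the TTS instance. The following is a minimal sketch of such a check using the Python parameter names. It assumes the documented ranges are inclusive, which the guide does not state, and `validate_options` is a hypothetical helper rather than a plugin API.

```python
# Numeric ranges as documented in this guide.
# ASSUMPTION: both bounds are treated as inclusive.
NUMERIC_RANGES = {
    "pitch": (-20.0, 20.0),
    "pace": (0.5, 2.0),
    "temperature": (0.01, 1.0),
    "loudness": (0.5, 2.0),
    "min_buffer_size": (30, 200),
    "max_chunk_length": (50, 500),
}

# Parameters restricted to a fixed set of values.
ALLOWED_VALUES = {
    "model": {"bulbul:v2", "bulbul:v3-beta", "bulbul:v3"},
    "output_audio_bitrate": {"32k", "64k", "96k", "128k", "192k"},
    "speech_sample_rate": {8000, 16000, 22050, 24000},
}


def validate_options(options):
    """Return a list of human-readable problems; an empty list means no issues."""
    errors = []
    for name, value in options.items():
        if name in NUMERIC_RANGES:
            low, high = NUMERIC_RANGES[name]
            if not low <= value <= high:
                errors.append(f"{name}={value!r} is outside {low} to {high}")
        elif name in ALLOWED_VALUES:
            if value not in ALLOWED_VALUES[name]:
                errors.append(f"{name}={value!r} is not an allowed value")
        # Names not listed above (for example speaker) are not checked here.
    return errors
```

Returning a list rather than raising on the first problem lets a caller report every invalid setting at once.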

Additional resources

The following resources provide more information about using Sarvam with LiveKit Agents.