Overview
This plugin allows you to use Sarvam as a TTS provider for your voice agents.
Installation
Install the plugin for Python (first command) or Node.js (second command):
uv add "livekit-agents[sarvam]~=1.4"
pnpm add @livekit/agents-plugin-sarvam@1.x
Authentication
The Sarvam plugin requires a Sarvam API key.
Set SARVAM_API_KEY in your .env file.
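As a minimal sketch of a startup check, the hypothetical helper below fails early when SARVAM_API_KEY is missing. It assumes the key has already been loaded into the process environment (loading the .env file itself is not shown), and the function name is illustrative, not part of the plugin.

```python
import os


def require_sarvam_api_key(env=None):
    """Return the Sarvam API key, or raise if it is not set.

    Hypothetical helper for illustration only; not part of the plugin.
    """
    # Fall back to the real process environment when no mapping is passed in.
    env = os.environ if env is None else env
    key = env.get("SARVAM_API_KEY", "").strip()
    if not key:
        raise RuntimeError("SARVAM_API_KEY is not set; add it to your .env file")
    return key
```

Calling this once before creating the session surfaces a missing key as a clear error instead of a failed synthesis request later.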
Usage
Use Sarvam TTS within an AgentSession or as a standalone speech generator. For example, you can use this TTS in the Voice AI quickstart.
from livekit.agents import AgentSession
from livekit.plugins import sarvam

session = AgentSession(
    tts=sarvam.TTS(
        target_language_code="hi-IN",
        model="bulbul:v3",
        speaker="shubh",
        pace=1.0,
        temperature=0.6,
        output_audio_bitrate="128k",
        min_buffer_size=50,
        max_chunk_length=150,
    ),
    # ... llm, stt, etc.
)
import { voice } from '@livekit/agents';
import * as sarvam from '@livekit/agents-plugin-sarvam';

const session = new voice.AgentSession({
  tts: new sarvam.TTS({
    targetLanguageCode: "hi-IN",
    model: "bulbul:v3",
    speaker: "shubh",
    pace: 1.0,
    temperature: 0.6,
  }),
  // ... llm, stt, etc.
});
Parameters
This section describes some of the available parameters. See the plugin reference links in the Additional resources section for a complete list of all available parameters.
target_language_code (string, Required): BCP-47 language code for supported Indian languages. For example: hi-IN for Hindi, en-IN for Indian English. See the Sarvam documentation for a complete list of supported languages.
In Node.js this parameter is called targetLanguageCode.
model (string, Optional, Default: bulbul:v2): The Sarvam TTS model to use. Valid values are:
bulbul:v2
bulbul:v3-beta
bulbul:v3
speaker (string, Optional, Default: varies by model): Voice to use for synthesis. The default depends on the selected model:
anushka for bulbul:v2
shubh for bulbul:v3-beta and bulbul:v3
Speakers are validated for model compatibility.
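The model-to-default-speaker mapping described above can be written out as a small lookup. This is only an illustrative sketch: the table and function names are hypothetical, and the plugin resolves its default internally rather than through this code.

```python
# Default speaker per model, as listed in this section.
DEFAULT_SPEAKERS = {
    "bulbul:v2": "anushka",
    "bulbul:v3-beta": "shubh",
    "bulbul:v3": "shubh",
}


def default_speaker(model="bulbul:v2"):
    """Return the documented default speaker for a model.

    Hypothetical helper for illustration only; not part of the plugin.
    """
    try:
        return DEFAULT_SPEAKERS[model]
    except KeyError:
        raise ValueError(f"unknown Sarvam TTS model: {model!r}") from None
```

Because the default changes with the model, setting speaker explicitly keeps the voice stable if you later switch models, provided the chosen speaker is compatible with the new model.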
pitch (float, Optional, Default: 0.0): Voice pitch adjustment. Valid range: -20.0 to 20.0. Included in the synthesis payload for bulbul:v2.
pace (float, Optional, Default: 1.0): Speech rate multiplier. Valid range: 0.5 to 2.0.
temperature (float, Optional, Default: 0.6): Controls output randomness. Valid range: 0.01 to 1.0. Only valid if model is bulbul:v3 or bulbul:v3-beta. This value is ignored for bulbul:v2.
loudness (float, Optional, Default: 1.0): Volume multiplier. Valid range: 0.5 to 2.0. Included in the synthesis payload for bulbul:v2.
enable_preprocessing (boolean, Optional, Default: false): Controls whether English words and numeric entities (for example, numbers and dates) are normalized. Set to true for better handling of mixed-language text.
Only valid if model is bulbul:v2. This value is ignored for other models.
In Node.js this parameter is called enablePreprocessing.
output_audio_bitrate (string, Optional, Default: 128k): Output audio bitrate. Allowed values: 32k, 64k, 96k, 128k, 192k.
Only available in the Python plugin.
min_buffer_size (integer, Optional, Default: 50): Minimum character length that triggers buffer flushing for TTS model processing. Valid range: 30 to 200.
Only available in the Python plugin.
max_chunk_length (integer, Optional, Default: 150): Maximum length for sentence splitting (adjust based on content length). Valid range: 50 to 500.
Only available in the Python plugin.
speech_sample_rate (integer, Optional, Default: 22050): Output sample rate in Hz. Supported values: 8000, 16000, 22050, 24000.
In Node.js this parameter is called sampleRate.
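The valid ranges listed in this section can be checked before the options reach the TTS constructor. The sketch below is a hypothetical pre-flight check, not the plugin's own validation. It covers only the options that appear by name in the Python usage example, with ranges copied from the descriptions above.

```python
# Inclusive numeric ranges copied from the parameter descriptions.
NUMERIC_RANGES = {
    "pace": (0.5, 2.0),
    "temperature": (0.01, 1.0),
    "min_buffer_size": (30, 200),
    "max_chunk_length": (50, 500),
}
ALLOWED_BITRATES = {"32k", "64k", "96k", "128k", "192k"}


def check_tts_options(options):
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper for illustration only; not part of the plugin.
    """
    problems = []
    for name, (low, high) in NUMERIC_RANGES.items():
        if name in options and not low <= options[name] <= high:
            problems.append(f"{name}={options[name]!r} is outside {low} to {high}")
    bitrate = options.get("output_audio_bitrate")
    if bitrate is not None and bitrate not in ALLOWED_BITRATES:
        problems.append(f"output_audio_bitrate={bitrate!r} is not an allowed value")
    return problems


# The values from the Python usage example all fall inside the documented ranges.
options = {
    "pace": 1.0,
    "temperature": 0.6,
    "output_audio_bitrate": "128k",
    "min_buffer_size": 50,
    "max_chunk_length": 150,
}
print(check_tts_options(options))  # prints []
```

Returning a list of problems rather than raising on the first one makes it easier to report every misconfigured option at once, for example when the values come from user-supplied settings.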
Additional resources
The following resources provide more information about using Sarvam with LiveKit Agents.