# Sarvam TTS plugin guide

> How to use the Sarvam TTS plugin for LiveKit Agents.

Available in:
- [x] Node.js
- [x] Python

## Overview

Use the Sarvam TTS plugin to synthesize Indian-language and English speech in LiveKit Agents. It provides natural Indic voices, low-latency turn-taking, configurable speaking style, and production audio formats for browser, mobile, and telephony use cases.

For new voice agents, start with `bulbul:v3`, set `target_language_code` explicitly, and choose a speaker that is compatible with the selected model.

### Authentication

The Sarvam plugin requires a [Sarvam API key](https://dashboard.sarvam.ai/key-management).

Set `SARVAM_API_KEY` in your `.env` file:

```shell
SARVAM_API_KEY=<your-sarvam-api-key>
```
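
A missing key tends to surface only when the first synthesis request fails, so it can help to check for it at startup. The following is a minimal sketch, not part of the plugin: `require_sarvam_api_key` is a hypothetical helper, and it assumes the `.env` file has already been loaded into the process environment (for example with a tool such as `python-dotenv`).

```python
import os


def require_sarvam_api_key() -> str:
    """Return the Sarvam API key, or raise a clear error if it is missing.

    Hypothetical helper for illustration; the plugin does not provide it.
    """
    key = os.environ.get("SARVAM_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SARVAM_API_KEY is not set; add it to your .env file or environment."
        )
    return key
```

Calling this once before creating the agent session turns a late authentication failure into an immediate, readable error.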

### Installation

Install the plugin:

**Python**:

```shell
uv add "livekit-agents[sarvam]~=1.5"
```

---

**Node.js**:

```shell
pnpm add @livekit/agents-plugin-sarvam@1.x
```

### Usage

Use Sarvam TTS within an `AgentSession` or as a standalone speech generator. For example, you can use this TTS in the [Voice AI quickstart](https://docs.livekit.io/agents/start/voice-ai.md).

For most LiveKit voice agents, begin with the following settings. Explicit configuration makes voice quality, latency, and deployment behavior easier to reproduce across environments.

- `target_language_code` / `targetLanguageCode`: Set the language your agent should speak, for example `hi-IN` or `en-IN`.
- `model`: Use `bulbul:v3`.
- `speaker`: Use a speaker supported by the selected model. The default is `shubh` for `bulbul:v3`.
- `speech_sample_rate` / `sampleRate`: Use `22050` for general voice agent audio; use `8000` only when your downstream path requires narrowband telephony audio.
- `pace`: Start at `1.0`, then tune after listening to full agent turns.

**Python**:

```python
from livekit.agents import AgentSession
from livekit.plugins import sarvam

session = AgentSession(
    tts=sarvam.TTS(
        target_language_code="hi-IN",
        model="bulbul:v3",
        speaker="shubh",
        speech_sample_rate=22050,
        pace=1.0,
        output_audio_bitrate="128k",
        output_audio_codec="mp3",
        min_buffer_size=50,
        max_chunk_length=150,
        send_completion_event=True,
    ),
    # ... llm, stt, etc.
)
```

---

**Node.js**:

```typescript
import { voice } from '@livekit/agents';
import * as sarvam from '@livekit/agents-plugin-sarvam';

const session = new voice.AgentSession({
    tts: new sarvam.TTS({
        targetLanguageCode: "hi-IN",
        model: "bulbul:v3",
        speaker: "shubh",
        pace: 1.0,
        temperature: 0.6,
    }),
    // ... llm, stt, etc.
});
```

### Parameters

This section describes commonly used parameters. See the plugin reference links in the [Additional resources](#additional-resources) section for a complete list of all available parameters.

- **`target_language_code`** _(LanguageCode)_: The language for synthesized speech. In Node.js, this parameter is called `targetLanguageCode`.

Set this explicitly instead of relying on defaults. The text you send to TTS should match the selected target language and script for the most predictable output.

See [Sarvam's target-language documentation](https://docs.sarvam.ai/api-reference-docs/text-to-speech/convert#request.body.target_language_code) for the list of supported languages.

- **`model`** _(string)_ (optional) - Default: `bulbul:v3` in Python, `bulbul:v2` in Node.js: The Sarvam TTS model to use. Valid values are:

- `bulbul:v3`
- `bulbul:v2`

Use `bulbul:v3` for new voice agent builds unless you need a `bulbul:v2`-only option such as `pitch`, `loudness`, or `enable_preprocessing`.

Because the Node.js plugin defaults to `bulbul:v2`, set `model` explicitly in Node.js to use `bulbul:v3`.

- **`speaker`** _(string)_ (optional) - Default: varies by model: The voice to use for synthesis. Defaults depend on the selected model:

- `shubh` for `bulbul:v3`
- `anushka` for `bulbul:v2`

Speakers are validated for model compatibility. If synthesis fails after changing `model` or `speaker`, check that the speaker is supported by that model.

- **`pace`** _(float)_ (optional) - Default: `1.0`: Speech rate multiplier. Valid range: `0.3` to `3.0`.

- **`temperature`** _(float)_ (optional) - Default: `0.6`: Controls output randomness. Valid range: `0.01` to `2.0`. Only sent if `model` is `bulbul:v3` or `bulbul:v3-beta`; ignored for `bulbul:v2`.

- **`pitch`** _(float)_ (optional) - Default: `0.0`: Voice pitch adjustment. Accepted range: `-0.75` to `0.75`. The Python plugin clamps out-of-range values to the nearest boundary and logs a warning. Included in the synthesis payload for `bulbul:v2`.

- **`dict_id`** _(string)_ (optional): Custom pronunciation dictionary ID. Only available for the `bulbul:v3` model. Create and manage dictionaries using the [Pronunciation Dictionary API](https://docs.sarvam.ai/api-reference-docs/pronunciation-dictionary/create).

In Node.js this parameter is called `dictId`.

- **`loudness`** _(float)_ (optional) - Default: `1.0`: Volume multiplier. Valid range: `0.5` to `2.0`. Included in synthesis payload for `bulbul:v2`.

- **`enable_preprocessing`** _(boolean)_ (optional) - Default: `false`: Whether to normalize English words and numeric entities, such as numbers and dates, before synthesis.

This option is only valid if `model` is `bulbul:v2` and is ignored for other models.

In Node.js this parameter is called `enablePreprocessing`.

- **`speech_sample_rate`** _(int)_ (optional) - Default: `22050`: Output sample rate in Hz. Supported values: `8000`, `16000`, `22050`, `24000`, `32000`, `44100`, and `48000`.

In Node.js this parameter is called `sampleRate`.

- **`output_audio_bitrate`** _(string)_ (optional) - Default: `128k`: Output audio bitrate. Allowed values: `32k`, `64k`, `96k`, `128k`, `192k`.

Available in Python only.

- **`output_audio_codec`** _(string)_ (optional) - Default: `mp3`: Output audio codec. Allowed values are `aac`, `alaw`, `flac`, `linear16`, `mp3`, `mulaw`, `opus`, and `wav`. The Python plugin decodes `mulaw` and `alaw` to 16-bit PCM before emitting audio frames.

Available in Python only.

- **`min_buffer_size`** _(integer)_ (optional) - Default: `50`: Minimum number of characters that must be buffered before the text is flushed to the TTS model. Valid range: `30` to `200`.

- **`max_chunk_length`** _(integer)_ (optional) - Default: `150`: Maximum length for sentence splitting. Valid range: `50` to `500`.

- **`enable_cached_responses`** _(boolean)_ (optional): Enables Sarvam's cached responses beta option. Only sent when `model` is `bulbul:v2`.

- **`send_completion_event`** _(boolean)_ (optional) - Default: `true`: Controls whether the WebSocket URL used for streaming synthesis asks Sarvam to send explicit completion events.
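
The ranges and allowed values listed above can be checked before a configuration reaches the plugin. The following is a sketch under stated assumptions: `check_tts_options` is a hypothetical helper rather than a plugin API, and its bounds are copied from this page, so the plugin reference remains the authority if the two ever differ.

```python
# Bounds and allowed values copied from the parameter list in this guide.
RANGES = {
    "pace": (0.3, 3.0),
    "temperature": (0.01, 2.0),
    "pitch": (-0.75, 0.75),
    "loudness": (0.5, 2.0),
    "min_buffer_size": (30, 200),
    "max_chunk_length": (50, 500),
}
CHOICES = {
    "speech_sample_rate": {8000, 16000, 22050, 24000, 32000, 44100, 48000},
    "output_audio_bitrate": {"32k", "64k", "96k", "128k", "192k"},
    "output_audio_codec": {
        "aac", "alaw", "flac", "linear16", "mp3", "mulaw", "opus", "wav",
    },
}


def check_tts_options(options: dict) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged.

    Hypothetical helper for illustration; options that are absent are skipped.
    """
    problems = []
    for name, (low, high) in RANGES.items():
        value = options.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name}={value} is outside {low} to {high}")
    for name, allowed in CHOICES.items():
        value = options.get(name)
        if value is not None and value not in allowed:
            problems.append(f"{name}={value!r} is not a supported value")
    return problems
```

Passing the same dictionary to the check and then to `sarvam.TTS(**options)` keeps the validated values and the constructed values identical.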

### Troubleshooting

Common issues and solutions for the Sarvam TTS plugin.

#### Unsupported speaker or model

If the plugin rejects your configuration, check the `model` and `speaker` combination. Speaker availability depends on the selected model, and some parameters are model-specific.
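
Model-specific parameters can be flagged before a session starts. The sketch below encodes only what this guide states: `pitch`, `loudness`, `enable_preprocessing`, and `enable_cached_responses` apply to `bulbul:v2`, while `temperature` and `dict_id` apply to `bulbul:v3`. `ignored_options` is a hypothetical helper, not a plugin function, and it does not check speakers because this page does not list which speakers each model supports.

```python
# Option groupings taken from the parameter descriptions in this guide.
V2_ONLY = {"pitch", "loudness", "enable_preprocessing", "enable_cached_responses"}
V3_ONLY = {"temperature", "dict_id"}


def ignored_options(model: str, options: dict) -> set[str]:
    """Return the option names that this guide says `model` does not use.

    Hypothetical helper for illustration; unknown models return an empty set.
    """
    if model == "bulbul:v2":
        return {name for name in options if name in V3_ONLY}
    if model.startswith("bulbul:v3"):
        return {name for name in options if name in V2_ONLY}
    return set()
```

A non-empty result is a hint that an option was set for the wrong model, which is a common cause of settings that appear to have no effect.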

#### Audio starts too slowly

For streaming voice agents, review chunking and buffering first:

- Reduce `min_buffer_size` gradually if the agent waits too long before speaking.
- Reduce `max_chunk_length` if long LLM responses are delaying synthesis.
- Keep punctuation in the generated text so the TTS system can split speech naturally.
- Avoid changing several latency-related settings at once.
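
To see why punctuation and chunk length matter, the following simplified illustration splits text at sentence punctuation and then caps each piece at a maximum length. It is not the splitter used by Sarvam or by the plugin: `split_for_tts` is a hypothetical function, and measuring length in characters is an assumption made for this example.

```python
import re


def split_for_tts(text: str, max_chunk_length: int = 150) -> list[str]:
    """Split text at sentence punctuation, then cap each piece by length.

    Illustrative only; this is not the plugin's or Sarvam's chunking logic.
    """
    # Break after ., !, ?, or the Devanagari danda when whitespace follows.
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks = []
    for sentence in sentences:
        current = ""
        for word in sentence.split():
            candidate = f"{current} {word}".strip()
            if current and len(candidate) > max_chunk_length:
                # Adding this word would exceed the cap, so close the chunk.
                chunks.append(current)
                current = word
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks
```

With punctuation, boundaries fall at sentence ends. Without it, text is cut only where the length cap forces a break, which tends to land mid-phrase.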

#### Speech sounds rushed, slow, or unnatural

Start with `pace=1.0` and `temperature=0.6`, then tune one setting at a time. If the agent speaks long paragraphs, consider splitting the LLM response into shorter, conversational sentences before it reaches TTS.

#### Output format does not match your media path

Check `speech_sample_rate`, `output_audio_codec`, and `output_audio_bitrate`. Browser playback, mobile playback, and telephony paths often need different formats. For phone calls, confirm whether your provider expects `8000` Hz audio, `mulaw`, `alaw`, or linear PCM.
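
For a phone-call path, the format settings can be kept together so they change as a unit. This is a minimal sketch: `telephony_tts_kwargs` is a hypothetical helper, and treating `linear16` as the linear PCM option is an assumption based on the codec list in this guide. Confirm the expected format with your telephony provider.

```python
# Codecs this guide mentions for phone calls; linear16 is assumed to be linear PCM.
TELEPHONY_CODECS = {"mulaw", "alaw", "linear16"}


def telephony_tts_kwargs(codec: str) -> dict:
    """Build keyword arguments for narrowband (8000 Hz) telephony output.

    Hypothetical helper for illustration; it only assembles a dictionary.
    """
    if codec not in TELEPHONY_CODECS:
        raise ValueError(
            f"expected one of {sorted(TELEPHONY_CODECS)}, got {codec!r}"
        )
    return {"speech_sample_rate": 8000, "output_audio_codec": codec}
```

In Python, the result can be unpacked into the constructor, for example `sarvam.TTS(target_language_code="hi-IN", model="bulbul:v3", **telephony_tts_kwargs("mulaw"))`.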

#### Pronunciations are inconsistent

For `bulbul:v3`, use `dict_id` when you need consistent pronunciations for names, brands, product terms, acronyms, or domain-specific words, provided you have an existing Sarvam TTS pronunciation dictionary.

## Additional resources

The following resources provide more information about using Sarvam with LiveKit Agents.

- **[Sarvam docs](https://docs.sarvam.ai/)**: Sarvam's full docs site.

- **[Sarvam TTS API reference](https://docs.sarvam.ai/api-reference-docs/text-to-speech/convert)**: Sarvam's text-to-speech API documentation.

- **[Voice AI quickstart](https://docs.livekit.io/agents/start/voice-ai.md)**: Get started with LiveKit Agents and Sarvam.

- **[Sarvam STT](https://docs.livekit.io/agents/models/stt/sarvam.md)**: Guide to the Sarvam STT plugin with LiveKit Agents.

---

This document was rendered at 2026-06-07T11:35:50.572Z.
For the latest version of this document, see [https://docs.livekit.io/agents/models/tts/sarvam.md](https://docs.livekit.io/agents/models/tts/sarvam.md).

To explore all LiveKit documentation, see [llms.txt](https://docs.livekit.io/llms.txt).