
# Deepgram STT

> How to use Deepgram STT with LiveKit Agents.

## Overview

Deepgram speech-to-text is available in LiveKit Agents through [LiveKit Inference](https://docs.livekit.io/agents/models/inference.md) and the [Deepgram plugin](#plugin). With LiveKit Inference, your agent runs on LiveKit's infrastructure to minimize latency. No separate provider API key is required, and usage and rate limits are managed through LiveKit Cloud. Use the plugin instead if you want to manage your own billing and rate limits. Pricing for LiveKit Inference is available on the [pricing page](https://livekit.io/pricing/inference#stt).

## LiveKit Inference

Use [LiveKit Inference](https://docs.livekit.io/agents/models/inference.md) to access Deepgram STT without a separate Deepgram API key.

| Model name | Model ID | Languages |
| -------- | -------- | --------- |
| Flux | `deepgram/flux-general-en` | `en` |
| Flux (Multilingual) | `deepgram/flux-general-multi` | `en`, `es`, `fr`, `de`, `hi`, `ru`, `pt`, `ja`, `it`, `nl` |
| Nova-2 | `deepgram/nova-2` | `multi`, `bg`, `ca`, `zh`, `zh-CN`, `zh-Hans`, `zh-TW`, `zh-Hant`, `zh-HK`, `cs`, `da`, `da-DK`, `nl`, `nl-BE`, `en`, `en-US`, `en-AU`, `en-GB`, `en-NZ`, `en-IN`, `et`, `fi`, `fr`, `fr-CA`, `de`, `de-CH`, `el`, `hi`, `hu`, `id`, `it`, `ja`, `ko`, `ko-KR`, `lv`, `lt`, `ms`, `no`, `pl`, `pt`, `pt-BR`, `pt-PT`, `ro`, `ru`, `sk`, `es`, `es-419`, `sv`, `sv-SE`, `th`, `th-TH`, `tr`, `uk`, `vi` |
| Nova-2 Conversational AI | `deepgram/nova-2-conversationalai` | `en`, `en-US` |
| Nova-2 Medical | `deepgram/nova-2-medical` | `en`, `en-US` |
| Nova-2 Phone Call | `deepgram/nova-2-phonecall` | `en`, `en-US` |
| Nova-3 (Monolingual) | `deepgram/nova-3` | `ar`, `ar-AE`, `ar-SA`, `ar-QA`, `ar-KW`, `ar-SY`, `ar-LB`, `ar-PS`, `ar-JO`, `ar-EG`, `ar-SD`, `ar-TD`, `ar-MA`, `ar-DZ`, `ar-TN`, `ar-IQ`, `ar-IR`, `be`, `bn`, `bs`, `bg`, `ca`, `hr`, `cs`, `da`, `da-DK`, `nl`, `nl-BE`, `en`, `en-US`, `en-AU`, `en-GB`, `en-IN`, `en-NZ`, `et`, `fi`, `fr`, `fr-CA`, `de`, `de-CH`, `el`, `hi`, `hu`, `id`, `it`, `ja`, `kn`, `ko`, `ko-KR`, `lv`, `lt`, `mk`, `ms`, `mr`, `no`, `pl`, `pt`, `pt-BR`, `pt-PT`, `ro`, `ru`, `sr`, `sk`, `sl`, `es`, `es-419`, `sv`, `sv-SE`, `tl`, `ta`, `te`, `tr`, `uk`, `vi` |
| Nova-3 Medical | `deepgram/nova-3-medical` | `en`, `en-US`, `en-AU`, `en-CA`, `en-GB`, `en-IE`, `en-IN`, `en-NZ` |
| Nova-3 (Multilingual) | `deepgram/nova-3-multi` | `multi` |

### Usage

To use Deepgram through LiveKit Inference, create an `STT` instance from the `inference` module:

**Python**:

```python
from livekit.agents import AgentSession, inference

session = AgentSession(
    stt=inference.STT(
        model="deepgram/flux-general",
        language="en"
    ),
    # ... llm, tts, vad, turn_handling, etc.
)

```

---

**Node.js**:

```typescript
import { AgentSession, inference } from '@livekit/agents';

const session = new AgentSession({
    stt: new inference.STT({
        model: "deepgram/flux-general",
        language: "en"
    }),
    // ... llm, tts, vad, turnHandling, etc.
});

```

### Parameters

- **`model`** _(string)_: The model to use for the STT. See [model IDs](#inference) for available models.

- **`language`** _(LanguageCode)_ (optional): [Language code](https://docs.livekit.io/agents/models/stt.md#language-codes) for the transcription. If not set, the provider default applies. Set it to `multi` with supported models for multilingual transcription.

- **`extra_kwargs`** _(dict)_ (optional): Additional parameters to pass to the Deepgram STT API; in Node.js this parameter is called `modelOptions`. Supported fields depend on the selected model. See [model parameters](#model-parameters) for supported fields.

#### Model parameters

Pass the following parameters inside `extra_kwargs` (Python) or `modelOptions` (Node.js). Supported fields depend on the selected model.

**Nova models**:

| Parameter | Type | Default | Notes |
| --------- | ---- | ------- | ----- |
| filler_words | `bool` | `True` | Whether to include filler words (um, uh, etc.) in the transcript. |
| interim_results | `bool` | `True` | Whether to return in-progress transcription results before the final transcript. Disabling this reduces the number of messages but increases latency. |
| endpointing | `int` | `25` | Milliseconds of silence before a turn is considered complete. |
| punctuate | `bool` | `True` | Whether to add punctuation and capitalization to the transcript. |
| smart_format | `bool` |  | Whether to apply smart formatting to numbers, dates, currency, URLs, and other entities. |
| keywords | `list[tuple[str, float]]` |  | List of keyword/boost pairs to improve recognition of specific terms. Each entry is a `(keyword, boost_factor)` tuple. Supported by Nova-2 models. |
| keyterm | `str \| list[str]` |  | One or more terms to boost recognition accuracy for. Supported by Nova-3 models. |
| profanity_filter | `bool` |  | Whether to replace profanity in the transcript with asterisks. |
| numerals | `bool` |  | Whether to convert spoken numbers to numerical digits (for example, "four score" → "4 score"). |
| mip_opt_out | `bool` | `False` | Opt out of the [Deepgram Model Improvement Program](https://dpgr.am/deepgram-mip). Check Deepgram docs for pricing impact before setting to `True`. |
| vad_events | `bool` | `False` | Whether to emit voice activity detection events when speech starts and ends. |
| diarize | `bool` |  | Whether to identify and label individual speakers in the transcript. |
| dictation | `bool` |  | Whether to convert spoken punctuation commands (for example, "period", "comma") into punctuation marks. |
| detect_language | `bool` |  | Whether to automatically detect the spoken language. Detection results are included in the transcript response. |
| no_delay | `bool` | `True` | Whether to return transcription results as quickly as possible without waiting for additional audio context. |
| utterance_end | `bool` |  | Whether to emit an event when an utterance ends based on silence. Requires `interim_results: True`. |
| redact | `str \| list[str]` |  | Redact sensitive information from the transcript. Accepted values include `"pci"` (credit card numbers), `"numbers"`, and `"ssn"`. |
| replace | `str \| list[str]` |  | Swap terms in the transcript. Each entry uses the format `"find:replace"` (for example, `"LiveKit:Livekit"`). |
| search | `str \| list[str]` |  | One or more terms to search for in the transcript. Matches are returned with their position and confidence in the response. |
| tag | `str \| list[str]` |  | Label requests for identification in Deepgram usage reports. |
| channels | `int` |  | Number of independent audio channels in the submitted audio. Use when processing multi-channel recordings. |
| version | `str` |  | Version of the model to use (for example, `"latest"` or a specific date string). |
| callback | `str` |  | URL to call when transcription is complete. Primarily applicable to non-streaming (batch) requests. |
| callback_method | `str` |  | HTTP method to use when calling `callback` (for example, `"post"` or `"put"`). |
| extra | `str` |  | Additional URL-encoded query parameters to forward to the Deepgram API. |
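
As a sketch of how these fields are passed (parameter names are taken from the table above; the exact values shown here are illustrative, not recommendations):

```python
from livekit.agents import AgentSession, inference

session = AgentSession(
    stt=inference.STT(
        model="deepgram/nova-2",
        language="en",
        extra_kwargs={
            "smart_format": True,               # format numbers, dates, URLs, etc.
            "keywords": [("LiveKit", 1.5)],     # (keyword, boost_factor) pairs
        },
    ),
    # ... llm, tts, vad, turn_handling, etc.
)
```

In Node.js, pass the same fields through `modelOptions` instead.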

---

**Flux models**:

| Parameter | Type | Default | Notes |
| --------- | ---- | ------- | ----- |
| eager_eot_threshold | `float` | `0.5` | End-of-turn confidence required to fire an eager end-of-turn event. Valid range: `0.3`–`0.9`. |
| eot_threshold | `float` |  | End-of-turn confidence required to finish a turn. Valid range: `0.5`–`0.9`. |
| eot_timeout_ms | `int` |  | A turn is finished after this many milliseconds of silence, regardless of EOT confidence. |
| keyterm | `str \| list[str]` |  | One or more terms to boost recognition accuracy for. |
| mip_opt_out | `bool` | `False` | Opt out of the [Deepgram Model Improvement Program](https://dpgr.am/deepgram-mip). Check Deepgram docs for pricing impact before setting to `True`. |
| detect_language | `bool` |  | Whether to automatically detect the spoken language. |
| tag | `str \| list[str]` |  | Label requests for identification in Deepgram usage reports. |

### Multilingual transcription

Deepgram Nova-3 and Nova-2 models support multilingual transcription. In this mode, the model automatically detects the language of each segment of speech and can accurately transcribe multiple languages in the same audio stream.

Multilingual transcription is billed at a different rate than monolingual transcription. Refer to the [pricing page](https://livekit.io/pricing/inference#stt) for more information.

To enable multilingual transcription on supported models, set the language to `multi`.
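
For example, a minimal sketch using the multilingual Nova-3 model from the table above:

```python
from livekit.agents import AgentSession, inference

session = AgentSession(
    stt=inference.STT(
        model="deepgram/nova-3-multi",
        language="multi",  # enables per-segment language detection
    ),
    # ... llm, tts, vad, turn_handling, etc.
)
```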

### String descriptors

As a shortcut, you can also pass a [model ID](#inference) string directly to the `stt` argument in your `AgentSession`:

**Python**:

```python
from livekit.agents import AgentSession

session = AgentSession(
    stt="deepgram/flux-general:en",
    # ... llm, tts, vad, turn_handling, etc.
)

```

---

**Node.js**:

```typescript
import { AgentSession } from '@livekit/agents';

const session = new AgentSession({
    stt: "deepgram/flux-general:en",
    // ... llm, tts, vad, turnHandling, etc.
});

```

### Colocation of model and agent

LiveKit Inference includes an integrated deployment of Deepgram models in Mumbai, India. Co-locating STT with your agent avoids the round trip to external API endpoints, significantly reducing latency and producing more responsive, natural-feeling conversations for voice agents serving users in India and surrounding regions.

#### Automatic routing

LiveKit Inference automatically routes requests to the regional deployment when your configuration matches one of the supported models and languages below. No code changes or configuration are required. For other configurations, requests are routed to Deepgram's API.

#### Supported configurations

| Model | Supported languages |
| ----- | ------------------- |
| `deepgram/nova-3-general` | English (`en`), Hindi (`hi`), Multilingual (`multi`) |
| `deepgram/nova-2-general` | English (`en`), Hindi (`hi`) |
| `deepgram/flux-general` | English (`en`) |

For example, to use Hindi transcription with Nova-3:

**Python**:

```python
from livekit.agents import AgentSession

session = AgentSession(
    stt="deepgram/nova-3-general:hi",
    # ... llm, tts, etc.
)

```

---

**Node.js**:

```typescript
import { AgentSession } from '@livekit/agents';

const session = new AgentSession({
    stt: "deepgram/nova-3-general:hi",
    // ... llm, tts, etc.
});

```

### Turn detection

Deepgram Flux includes a custom phrase endpointing model that uses both acoustic and semantic cues. To use this model for [turn detection](https://docs.livekit.io/agents/logic/turns.md), set `turn_detection="stt"` in the turn handling options. You should also provide a VAD plugin for responsive interruption handling.

```python
session = AgentSession(
    turn_handling=TurnHandlingOptions(
        turn_detection="stt",
    ),
    stt=inference.STT(
        model="deepgram/flux-general",
        language="en"
    ),
    vad=silero.VAD.load(),  # Recommended for responsive interruption handling
    # ... llm, tts, etc.
)

```

## Plugin

LiveKit's plugin support for Deepgram lets you connect directly to Deepgram's API with your own API key.

Available in:
- [x] Node.js
- [x] Python

### Installation

Install the plugin from PyPI or npm:

**Python**:

```shell
uv add "livekit-agents[deepgram]~=1.4"

```

---

**Node.js**:

```shell
pnpm add @livekit/agents-plugin-deepgram@1.x

```

### Authentication

The Deepgram plugin requires a [Deepgram API key](https://console.deepgram.com/).

Set `DEEPGRAM_API_KEY` in your `.env` file.
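
For example, in `.env` (substitute your own key):

```shell
DEEPGRAM_API_KEY=<your-deepgram-api-key>
```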

### Nova-3 and other models

Use the **`STT`** class for Nova-3 and other Deepgram models. It connects to Deepgram's `/listen/v1` websocket API for realtime streaming STT.

#### Usage

**Python**:

```python
from livekit.agents import AgentSession
from livekit.plugins import deepgram

session = AgentSession(
    stt=deepgram.STT(
        model="nova-3",
        language="en",
    ),
    # ... llm, tts, etc.
)

```

---

**Node.js**:

```typescript
import { voice } from '@livekit/agents';
import * as deepgram from '@livekit/agents-plugin-deepgram';

const session = new voice.AgentSession({
    stt: new deepgram.STT({
        model: "nova-3",
        language: "en",
    }),
    // ... llm, tts, etc.
});

```

#### Parameter reference

This section describes the key parameters for the Deepgram STT plugin. See the [plugin reference](https://docs.livekit.io/reference/python/livekit/plugins/deepgram/index.html.md#livekit.plugins.deepgram.STT) for a complete list of all available parameters.

- **`model`** _(string)_ (optional) - Default: `nova-3`: The Deepgram model to use for speech recognition. Use `STTv2` for the Flux model. See the [Model Options](https://developers.deepgram.com/docs/model) page for available models.

- **`keyterm`** _(str | list[str])_ (optional) - Default: `[]`: One or more terms to boost recognition accuracy for. Supported by Nova-3 models.

- **`enable_diarization`** _(bool)_ (optional) - Default: `false`: Set to `True` to enable [speaker diarization](#speaker-diarization).

#### Speaker diarization

You can enable [speaker diarization](https://developers.deepgram.com/docs/diarization) so the STT assigns a speaker identifier to each word or segment. When enabled, transcript events include a `speaker_id`, and the STT reports `capabilities.diarization = True`.

With diarization enabled, you can wrap the Deepgram STT with [`MultiSpeakerAdapter`](https://docs.livekit.io/agents/models/stt.md#speaker-diarization) for primary speaker detection and transcript formatting.

Enable speaker diarization by setting `enable_diarization=True` in the `STT` constructor:

```python
stt = deepgram.STT(
    model="nova-3",
    language="en",
    enable_diarization=True,
)

```

### Deepgram Flux

Use the **`STTv2`** class for the Flux model. It connects to Deepgram's `/listen/v2` websocket API, which is designed for turn-based conversational audio. Currently, the only available model is Flux in English.

#### Usage

Use `STTv2` in an `AgentSession` or as a standalone transcription service. For example, you can use this STT in the [Voice AI quickstart](https://docs.livekit.io/agents/start/voice-ai.md).

**Python**:

```python
from livekit.agents import AgentSession
from livekit.plugins import deepgram

session = AgentSession(
    stt=deepgram.STTv2(
        model="flux-general-en",
        eager_eot_threshold=0.4,
    ),
    # ... llm, tts, etc.
)

```

---

**Node.js**:

```typescript
import { voice } from '@livekit/agents';
import * as deepgram from '@livekit/agents-plugin-deepgram';

const session = new voice.AgentSession({
    stt: new deepgram.STTv2({
        model: "flux-general-en",
        eagerEotThreshold: 0.4,
    }),
    // ... llm, tts, etc.
});

```

#### Parameter reference

STTv2 exposes parameters specific to Deepgram's v2 API.

- **`model`** _(string)_ (optional) - Default: `flux-general-en`: Defines the AI model used to process submitted audio. Currently, only the Flux model is available (`flux-general-en`). Use `STT` for the Nova-3 or Nova-2 models.

- **`eager_eot_threshold`** _(float)_ (optional): End-of-turn confidence required to fire an eager end-of-turn event. Valid range: 0.3–0.9.

- **`eot_threshold`** _(float)_ (optional): End-of-turn confidence required to finish a turn. Valid range: 0.5–0.9.

- **`eot_timeout_ms`** _(number)_ (optional): A turn is finished after this many milliseconds of silence following speech, regardless of EOT confidence.

- **`keyterm`** _(str | list[str])_ (optional) - Default: `[]`: Keyterm prompting can improve recognition of specialized terminology. Pass multiple keyterms to boost recognition of each.

- **`mip_opt_out`** _(boolean)_ (optional): Opts out requests from the [Deepgram Model Improvement Program](https://dpgr.am/deepgram-mip). Check Deepgram docs for pricing impact before setting to true.

- **`tags`** _(string)_ (optional): Label your requests for identification during usage reporting.

For the full list of STTv2 parameters, see the plugin reference in [Additional resources](#additional-resources).

#### Turn detection

Deepgram Flux includes a custom phrase endpointing model that uses both acoustic and semantic cues. To use this model for [turn detection](https://docs.livekit.io/agents/logic/turns.md), set `turn_detection="stt"` in the turn handling options. You should also provide a VAD plugin for responsive interruption handling.

**Python**:

```python
session = AgentSession(
    turn_handling=TurnHandlingOptions(
        turn_detection="stt",
    ),
    stt=deepgram.STTv2(
        model="flux-general-en",
        eager_eot_threshold=0.4,
    ),
    vad=silero.VAD.load(),  # Recommended for responsive interruption handling
    # ... llm, tts, etc.
)

```

---

**Node.js**:

```typescript
import { voice } from '@livekit/agents';
import * as deepgram from '@livekit/agents-plugin-deepgram';
import * as silero from '@livekit/agents-plugin-silero';

const session = new voice.AgentSession({
    stt: new deepgram.STTv2({
        model: "flux-general-en",
        eagerEotThreshold: 0.4,
    }),
    vad: await silero.VAD.load(),  // Recommended for responsive interruption handling
    turnHandling: {
        turnDetection: "stt",
    },
    // ... llm, tts, etc.
});

```

## Additional resources

The following resources provide more information about using Deepgram with LiveKit Agents.

- **[Deepgram docs](https://developers.deepgram.com/docs)**: Deepgram's full docs site.

- **[Voice AI quickstart](https://docs.livekit.io/agents/start/voice-ai.md)**: Get started with LiveKit Agents and Deepgram.

- **[Deepgram TTS](https://docs.livekit.io/agents/models/tts/deepgram.md)**: Guide to the Deepgram TTS plugin with LiveKit Agents.

---

This document was rendered at 2026-04-09T17:15:36.707Z.
For the latest version of this document, see [https://docs.livekit.io/agents/models/stt/deepgram.md](https://docs.livekit.io/agents/models/stt/deepgram.md).

To explore all LiveKit documentation, see [llms.txt](https://docs.livekit.io/llms.txt).