
Deepgram STT plugin guide

How to use the Deepgram STT plugin for LiveKit Agents.

Available in Python and Node.js.

Overview

This plugin allows you to use Deepgram as an STT provider for your voice agents. It exposes two classes that target different Deepgram APIs:

  • STT — Supports Deepgram's Nova family of models for realtime STT.
  • STTv2 — Supports Deepgram's newest Flux model for conversational audio.
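
Which class you need depends on the model name, following the rule in the Nova-3 and Flux sections of this guide. The Python sketch that follows encodes that rule. `stt_class_for_model` is a hypothetical helper, not part of the plugin, and it assumes every Flux model name starts with `flux` (the only Flux model documented here is `flux-general-en`).

```python
# Hypothetical helper (not part of livekit-plugins-deepgram): maps a Deepgram
# model name to the plugin class and streaming endpoint described in this guide.
def stt_class_for_model(model: str) -> tuple[str, str]:
    """Return (plugin class name, Deepgram streaming path) for a model name."""
    # Assumption: Flux model names start with "flux", e.g. "flux-general-en".
    if model.startswith("flux"):
        # Flux is served only by STTv2 over the /listen/v2 API.
        return ("STTv2", "/listen/v2")
    # Nova-3, Nova-2, and other models use STT over the /listen/v1 API.
    return ("STT", "/listen/v1")


print(stt_class_for_model("nova-3"))           # ('STT', '/listen/v1')
print(stt_class_for_model("flux-general-en"))  # ('STTv2', '/listen/v2')
```
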
LiveKit Inference

Deepgram STT is also available in LiveKit Inference, with billing and integration handled automatically. See the LiveKit Inference documentation for more information.

Installation and authentication

Install the plugin from PyPI or npm, then set your API key.

Python:

uv add "livekit-agents[deepgram]~=1.4"

Node.js:

pnpm add @livekit/agents-plugin-deepgram@1.x

The plugin requires a Deepgram API key. Set DEEPGRAM_API_KEY in your .env file.
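
To fail fast when the key is missing, you can check for it before creating the session. Many projects load `.env` with python-dotenv's `load_dotenv()`; the following sketch uses only the standard library instead. `read_env_file` and `require_deepgram_key` are hypothetical helpers, and the parser is deliberately minimal: it handles only simple `KEY=VALUE` lines.

```python
import os
from pathlib import Path


def read_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file (hypothetical helper)."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an '=' sign.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_deepgram_key(path: str = ".env") -> str:
    """Return DEEPGRAM_API_KEY from the environment or .env, or raise early."""
    key = os.environ.get("DEEPGRAM_API_KEY") or read_env_file(path).get("DEEPGRAM_API_KEY")
    if not key:
        raise RuntimeError("DEEPGRAM_API_KEY is not set; add it to your .env file.")
    return key
```

Checking the environment first lets a key exported by your deployment platform take precedence over the local `.env` file.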

Nova-3 and other models

Use the STT class for Nova-3 and other Deepgram models. It connects to Deepgram's /listen/v1 WebSocket API for realtime streaming STT.

Usage

Python:

from livekit.agents import AgentSession
from livekit.plugins import deepgram

session = AgentSession(
    stt=deepgram.STT(
        model="nova-3",
        language="en",
    ),
    # ... llm, tts, etc.
)

Node.js:

import { voice } from '@livekit/agents';
import * as deepgram from '@livekit/agents-plugin-deepgram';

const session = new voice.AgentSession({
  stt: new deepgram.STT({
    model: "nova-3",
    language: "en",
  }),
  // ... llm, tts, etc.
});

Parameter reference

model (string, optional, default: nova-3)

The Deepgram model to use for speech recognition. Use STTv2 for the Flux model.

keyterms (list[string], optional, default: [])

List of key terms to improve recognition accuracy. Supported by Nova-3 models.

For more parameters and details, see the plugin reference in Additional resources.

Deepgram Flux

Use the STTv2 class for the Flux model. It connects to Deepgram's /listen/v2 WebSocket API, which is designed for turn-based conversational audio. Currently, the only available model is Flux in English (flux-general-en).

Usage

Use STTv2 in an AgentSession or as a standalone transcription service. For example, you can use this STT in the Voice AI quickstart.

Python:

from livekit.agents import AgentSession
from livekit.plugins import deepgram

session = AgentSession(
    stt=deepgram.STTv2(
        model="flux-general-en",
        eager_eot_threshold=0.4,
    ),
    # ... llm, tts, etc.
)

Node.js:

import { voice } from '@livekit/agents';
import * as deepgram from '@livekit/agents-plugin-deepgram';

const session = new voice.AgentSession({
  stt: new deepgram.STTv2({
    model: "flux-general-en",
    eagerEotThreshold: 0.4,
  }),
  // ... llm, tts, etc.
});

Parameter reference

STTv2 exposes parameters specific to Deepgram's v2 API.

model (string, optional, default: flux-general-en)

The model used to process submitted audio. Currently, flux-general-en is the only available model. Use STT for Nova-3, Nova-2, and other models.

eager_eot_threshold (float, optional)

End-of-turn confidence required to fire an eager end-of-turn event. Valid range: 0.3–0.9.

eot_threshold (float, optional)

End-of-turn confidence required to finish a turn. Valid range: 0.5–0.9.

eot_timeout_ms (number, optional)

Maximum time, in milliseconds, after speech stops before the turn is finished, regardless of end-of-turn confidence.

keyterms (list[string], optional, default: [])

Keyterm prompting can improve recognition of specialized terminology. Pass multiple keyterms to boost recognition of each.

mip_opt_out (boolean, optional)

Opts requests out of the Deepgram Model Improvement Program. Check the Deepgram docs for the pricing impact before setting this to true.

tags (string, optional)

Label your requests for identification during usage reporting.

For the full list of STTv2 parameters, see the plugin reference in Additional resources.
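
If you load the end-of-turn thresholds from configuration, you may want to check them against the documented valid ranges before constructing STTv2. The following sketch does that. `check_flux_thresholds` is a hypothetical helper, and its extra rule that `eager_eot_threshold` should not exceed `eot_threshold` is an assumption (an eager event is meant to fire before the final end-of-turn), not something this reference states.

```python
from typing import List, Optional

# Valid ranges as stated in the STTv2 parameter reference.
EAGER_EOT_RANGE = (0.3, 0.9)
EOT_RANGE = (0.5, 0.9)


def check_flux_thresholds(eager_eot_threshold: Optional[float] = None,
                          eot_threshold: Optional[float] = None) -> List[str]:
    """Return a list of problems; an empty list means the values look valid."""
    problems: List[str] = []
    if eager_eot_threshold is not None:
        low, high = EAGER_EOT_RANGE
        if not low <= eager_eot_threshold <= high:
            problems.append(
                f"eager_eot_threshold={eager_eot_threshold} is outside {low}-{high}")
    if eot_threshold is not None:
        low, high = EOT_RANGE
        if not low <= eot_threshold <= high:
            problems.append(f"eot_threshold={eot_threshold} is outside {low}-{high}")
    # Assumption: the eager event should not require more confidence than the final one.
    if (eager_eot_threshold is not None and eot_threshold is not None
            and eager_eot_threshold > eot_threshold):
        problems.append("eager_eot_threshold is greater than eot_threshold")
    return problems


print(check_flux_thresholds(eager_eot_threshold=0.4, eot_threshold=0.7))  # []
print(check_flux_thresholds(eager_eot_threshold=0.2))
```
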

Turn detection

Deepgram Flux includes a custom phrase endpointing model that uses both acoustic and semantic cues. To use this model for turn detection, set turn_detection="stt" in the AgentSession constructor. You should also provide a VAD plugin for responsive interruption handling.

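Python:

from livekit.agents import AgentSession
from livekit.plugins import deepgram, silero

session = AgentSession(
    turn_detection="stt",
    stt=deepgram.STTv2(
        model="flux-general-en",
        eager_eot_threshold=0.4,
    ),
    vad=silero.VAD.load(),  # Recommended for responsive interruption handling
    # ... llm, tts, etc.
)

Node.js:

import { voice } from '@livekit/agents';
import * as deepgram from '@livekit/agents-plugin-deepgram';
import * as silero from '@livekit/agents-plugin-silero';

const session = new voice.AgentSession({
  turnDetection: "stt",
  stt: new deepgram.STTv2({
    model: "flux-general-en",
    eagerEotThreshold: 0.4,
  }),
  vad: await silero.VAD.load(), // Recommended for responsive interruption handling
  // ... llm, tts, etc.
});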
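
Per the parameter reference, the three end-of-turn settings interact roughly as follows: an eager end-of-turn event fires once confidence reaches eager_eot_threshold, and the turn finishes when confidence reaches eot_threshold or when eot_timeout_ms has elapsed since speech stopped. The following Python sketch is a simplified model of that logic for reasoning about threshold choices, not Deepgram's implementation. The `Frame` updates, the event names, and the default values (0.7 and 5000 ms) are illustrative assumptions, and the sketch ignores speech that resumes after an eager event.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Frame:
    """Hypothetical model update: ms since speech stopped and EOT confidence."""
    ms_since_speech: int
    eot_confidence: float


def simulate_turn(frames: Iterable[Frame],
                  eot_threshold: float = 0.7,
                  eot_timeout_ms: int = 5000,
                  eager_eot_threshold: Optional[float] = None) -> list[tuple[str, int]]:
    """Return (event name, time) pairs for one turn under the simplified model."""
    events: list[tuple[str, int]] = []
    eager_sent = False
    for f in frames:
        # Fire the eager event once, the first time confidence reaches its threshold.
        if (eager_eot_threshold is not None and not eager_sent
                and f.eot_confidence >= eager_eot_threshold):
            events.append(("eager_end_of_turn", f.ms_since_speech))
            eager_sent = True
        # The turn ends on high confidence or when the timeout elapses.
        if f.eot_confidence >= eot_threshold or f.ms_since_speech >= eot_timeout_ms:
            events.append(("end_of_turn", f.ms_since_speech))
            break
    return events


updates = [Frame(100, 0.2), Frame(300, 0.45), Frame(600, 0.65), Frame(900, 0.75)]
print(simulate_turn(updates, eager_eot_threshold=0.4))
# [('eager_end_of_turn', 300), ('end_of_turn', 900)]
```

In this model, a lower eager_eot_threshold gives the LLM a head start (600 ms here) at the cost of more speculative responses that may need to be discarded.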

Additional resources

The following resources provide more information about using Deepgram with LiveKit Agents.