Overview
Deepgram speech-to-text is available in LiveKit Agents through LiveKit Inference and the Deepgram plugin. Pricing for LiveKit Inference is available on the pricing page.
| Model name | Model ID | Languages |
|---|---|---|
| Flux | deepgram/flux-general | en |
| Nova-3 | deepgram/nova-3 or deepgram/nova-3-general | ar, ar-AE, ar-SA, ar-QA, ar-KW, ar-SY, ar-LB, ar-PS, ar-JO, ar-EG, ar-SD, ar-TD, ar-MA, ar-DZ, ar-TN, ar-IQ, ar-IR, en, en-US, en-AU, en-GB, en-IN, en-NZ, de, nl, sv, sv-SE, da, da-DK, es, es-419, fr, fr-CA, pt, pt-BR, pt-PT, multi |
| Nova-3 Medical | deepgram/nova-3-medical | en, en-US, en-AU, en-CA, en-GB, en-IE, en-IN, en-NZ |
| Nova-2 | deepgram/nova-2 or deepgram/nova-2-general | multi, bg, ca, zh, zh-CN, zh-Hans, zh-TW, zh-Hant, zh-HK, cs, da, da-DK, nl, en, en-US, en-AU, en-GB, en-NZ, en-IN, et, fi, nl-BE, fr, fr-CA, de, de-CH, el, hi, hu, id, it, ja, ko, ko-KR, lv, lt, ms, no, pl, pt, pt-BR, pt-PT, ro, ru, sk, es, es-419, sv, sv-SE, th, th-TH, tr, uk, vi |
| Nova-2 Medical | deepgram/nova-2-medical | en, en-US |
| Nova-2 Conversational AI | deepgram/nova-2-conversationalai | en, en-US |
| Nova-2 Phonecall | deepgram/nova-2-phonecall | en, en-US |
LiveKit Inference
Use LiveKit Inference to access Deepgram STT without a separate Deepgram API key.
Usage
To use Deepgram, use the STT class from the inference module:
```python
from livekit.agents import AgentSession, inference

session = AgentSession(
    stt=inference.STT(
        model="deepgram/flux-general",
        language="en",
    ),
    # ... llm, tts, vad, turn_detection, etc.
)
```
```typescript
import { AgentSession, inference } from '@livekit/agents';

const session = new AgentSession({
  stt: new inference.STT({
    model: "deepgram/flux-general",
    language: "en",
  }),
  // ... llm, tts, vad, turn_detection, etc.
});
```
Parameters
model (string, required): The model to use for STT. See the Model Options page for available models.
language (string, optional): Language code for the transcription. If not set, the provider default applies. Set it to multi with supported models for multilingual transcription.
extra_kwargs (dict, optional): Additional parameters to pass to the Deepgram STT API. Supported fields depend on the selected model. See the provider's documentation for more information. In Node.js, this parameter is called modelOptions.
Multilingual transcription
Deepgram Nova-3 and Nova-2 models support multilingual transcription. In this mode, the model automatically detects the language of each segment of speech and can accurately transcribe multiple languages in the same audio stream.
Multilingual transcription is billed at a different rate than monolingual transcription. Refer to the pricing page for more information.
To enable multilingual transcription on supported models, set the language to multi.
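To catch an unsupported combination before a session starts, you could add a small pre-flight check like the one below. This is a minimal sketch and is not part of LiveKit Agents. `check_language` and `MULTILINGUAL_MODELS` are hypothetical names. The model set is copied from the language column of the model table on this page, so it must be kept in sync if that table changes.

```python
# Model IDs whose language list in the model table includes "multi".
# Hypothetical constant; copied by hand from the table, not queried from LiveKit.
MULTILINGUAL_MODELS = frozenset({
    "deepgram/nova-3",
    "deepgram/nova-3-general",
    "deepgram/nova-2",
    "deepgram/nova-2-general",
})


def check_language(model: str, language: str) -> str:
    """Hypothetical helper: reject language="multi" for models that lack it."""
    if language == "multi" and model not in MULTILINGUAL_MODELS:
        raise ValueError(f"{model} does not support multilingual transcription")
    return language


check_language("deepgram/nova-3", "multi")  # accepted
```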
String descriptors
As a shortcut, you can also pass a model descriptor string directly to the stt argument in your AgentSession:
```python
from livekit.agents import AgentSession

session = AgentSession(
    stt="deepgram/flux-general:en",
    # ... llm, tts, vad, turn_detection, etc.
)
```
```typescript
import { AgentSession } from '@livekit/agents';

const session = new AgentSession({
  stt: "deepgram/flux-general:en",
  // ... llm, tts, vad, turn_detection, etc.
});
```
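The descriptor packs the model ID and language code, which you would otherwise pass separately to inference.STT, into one string. The sketch below shows how that string splits back into its two parts. It assumes the `<model ID>:<language>` shape seen in the examples on this page. `parse_stt_descriptor` is a hypothetical helper for illustration, not the parser LiveKit Agents uses internally.

```python
from typing import Optional, Tuple


def parse_stt_descriptor(descriptor: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: split "provider/model:language" into its parts."""
    # Split on the last colon; model IDs in the table contain no colons.
    model, sep, language = descriptor.rpartition(":")
    if not sep:
        # No language suffix: the whole string is the model ID.
        return descriptor, None
    return model, language


print(parse_stt_descriptor("deepgram/flux-general:en"))  # ('deepgram/flux-general', 'en')
```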
Colocation of model and agent
LiveKit Inference includes an integrated deployment of Deepgram models in Mumbai, India. This deployment lowers latency for voice agents serving users in India and surrounding regions. Because the STT model is co-located with the agent, requests skip the round trip to Deepgram's external API endpoints. This shortens response times and makes conversations feel more responsive and natural.
Automatic routing
LiveKit Inference automatically routes requests to the regional deployment when your configuration matches one of the supported models and languages below. No code changes or configuration are required. For other configurations, requests are routed to Deepgram's API.
Supported configurations
| Model | Supported languages |
|---|---|
| deepgram/nova-3-general | English (en), Hindi (hi), Multilingual (multi) |
| deepgram/nova-2-general | English (en), Hindi (hi) |
| deepgram/flux-general | English (en) |
For example, to use Hindi transcription with Nova-3:
```python
from livekit.agents import AgentSession

session = AgentSession(
    stt="deepgram/nova-3-general:hi",
    # ... llm, tts, etc.
)
```
```typescript
import { AgentSession } from '@livekit/agents';

const session = new AgentSession({
  stt: "deepgram/nova-3-general:hi",
  // ... llm, tts, etc.
});
```
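Routing depends only on the model and language pair, so the supported-configurations table can be expressed as a simple lookup. This hypothetical sketch helps you reason about which of your configurations qualify. LiveKit Inference performs the actual routing server-side. The sketch assumes exact matches on the model IDs listed in the table, so it does not treat aliases such as deepgram/nova-3 as equivalent to deepgram/nova-3-general.

```python
# Copied from the supported-configurations table; hypothetical constant.
REGIONAL_CONFIGS = {
    "deepgram/nova-3-general": {"en", "hi", "multi"},
    "deepgram/nova-2-general": {"en", "hi"},
    "deepgram/flux-general": {"en"},
}


def uses_regional_deployment(model: str, language: str) -> bool:
    """Hypothetical helper: True if the pair appears in the table."""
    return language in REGIONAL_CONFIGS.get(model, set())


print(uses_regional_deployment("deepgram/nova-3-general", "hi"))  # True
```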
Turn detection
Deepgram Flux includes a custom phrase endpointing model that uses both acoustic and semantic cues. To use this model for turn detection, set turn_detection="stt" in the AgentSession constructor. You should also provide a VAD plugin for responsive interruption handling.
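```python
from livekit.agents import AgentSession, inference
from livekit.plugins import silero

session = AgentSession(
    turn_detection="stt",
    stt=inference.STT(
        model="deepgram/flux-general",
        language="en",
    ),
    vad=silero.VAD.load(),  # Recommended for responsive interruption handling
    # ... llm, tts, etc.
)
```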
Plugin
Use the Deepgram plugin to connect directly to Deepgram's API with your own API key.
Installation
Install the plugin from PyPI or npm:
uv add "livekit-agents[deepgram]~=1.4"
pnpm add @livekit/agents-plugin-deepgram@1.x
Authentication
The Deepgram plugin requires a Deepgram API key.
Set DEEPGRAM_API_KEY in your .env file.
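Without the key, the plugin cannot authenticate with Deepgram, so it can help to fail fast at startup. The following is a minimal sketch that uses only the standard library. `require_deepgram_key` is a hypothetical helper. The sketch assumes your `.env` file has already been loaded into the process environment, for example with python-dotenv's `load_dotenv()`.

```python
import os
from collections.abc import Mapping


def require_deepgram_key(env: Mapping = os.environ) -> str:
    """Hypothetical helper: return DEEPGRAM_API_KEY or fail with a clear error."""
    key = env.get("DEEPGRAM_API_KEY", "").strip()
    if not key:
        raise RuntimeError("DEEPGRAM_API_KEY is not set; add it to your .env file")
    return key
```

Passing the environment as an argument keeps the check easy to test with a plain dictionary.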
Nova-3 and other models
Use the STT class for Nova-3 and other Deepgram models. It connects to Deepgram's /v1/listen WebSocket API for realtime streaming STT.
Usage
```python
from livekit.agents import AgentSession
from livekit.plugins import deepgram

session = AgentSession(
    stt=deepgram.STT(
        model="nova-3",
        language="en",
    ),
    # ... llm, tts, etc.
)
```
```typescript
import { voice } from '@livekit/agents';
import * as deepgram from '@livekit/agents-plugin-deepgram';

const session = new voice.AgentSession({
  stt: new deepgram.STT({
    model: "nova-3",
    language: "en",
  }),
  // ... llm, tts, etc.
});
```
Parameter reference
This section describes the key parameters for the Deepgram STT plugin. See the plugin reference for a complete list of all available parameters.
model (string, optional, default: nova-3): The Deepgram model to use for speech recognition. For the Flux model, use STTv2 instead.
keyterms (list[string], optional, default: []): List of key terms to improve recognition accuracy. Supported by Nova-3 models.
enable_diarization (bool, optional, default: False): Set to True to enable speaker diarization.
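The sketch below gives a rough picture of how these options correspond to Deepgram's streaming API. It builds a v1 listen URL with `model`, `language`, repeated `keyterm`, and `diarize` query parameters. It is illustrative only. The plugin constructs and authenticates its own request, and `build_listen_url` is a hypothetical helper. The query-parameter names come from Deepgram's public API, not from the plugin's source.

```python
from urllib.parse import urlencode

# Deepgram's v1 streaming endpoint, used by the Nova models.
DEEPGRAM_LISTEN_V1 = "wss://api.deepgram.com/v1/listen"


def build_listen_url(model="nova-3", language="en", keyterms=(), diarize=False):
    """Hypothetical helper: map plugin-style options to Deepgram query parameters."""
    params = [("model", model), ("language", language)]
    # One keyterm parameter per term, so each term is boosted individually.
    params += [("keyterm", term) for term in keyterms]
    if diarize:
        params.append(("diarize", "true"))
    return f"{DEEPGRAM_LISTEN_V1}?{urlencode(params)}"


print(build_listen_url(keyterms=["LiveKit"], diarize=True))
# wss://api.deepgram.com/v1/listen?model=nova-3&language=en&keyterm=LiveKit&diarize=true
```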
Speaker diarization
You can enable speaker diarization so the STT assigns a speaker identifier to each word or segment. When enabled, transcript events include a speaker_id, and the STT reports capabilities.diarization = True.
With diarization enabled, you can wrap the Deepgram STT with MultiSpeakerAdapter for primary speaker detection and transcript formatting.
Enable speaker diarization by setting enable_diarization=True in the STT constructor:
```python
from livekit.plugins import deepgram

stt = deepgram.STT(
    model="nova-3",
    language="en",
    enable_diarization=True,
)
```
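When words carry a speaker_id, a common formatting step is to merge consecutive words from the same speaker into turns. The sketch below does that with plain dataclasses. It is not MultiSpeakerAdapter. `Word`, `group_by_speaker`, and `primary_speaker` are hypothetical. Treating the speaker with the most words as the primary speaker is an assumed heuristic, not the adapter's documented behavior.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    text: str
    speaker_id: str  # hypothetical shape; real transcript events may differ


def group_by_speaker(words: List[Word]) -> List[Tuple[str, str]]:
    """Merge consecutive words from the same speaker into (speaker_id, text) turns."""
    turns: List[Tuple[str, str]] = []
    for word in words:
        if turns and turns[-1][0] == word.speaker_id:
            speaker, text = turns[-1]
            turns[-1] = (speaker, f"{text} {word.text}")
        else:
            turns.append((word.speaker_id, word.text))
    return turns


def primary_speaker(words: List[Word]) -> Optional[str]:
    """Assumed heuristic: the speaker who produced the most words."""
    counts = Counter(word.speaker_id for word in words)
    return counts.most_common(1)[0][0] if counts else None


words = [Word("hi", "S0"), Word("there", "S0"), Word("hello", "S1"), Word("again", "S0")]
print(group_by_speaker(words))  # [('S0', 'hi there'), ('S1', 'hello'), ('S0', 'again')]
print(primary_speaker(words))   # S0
```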
Deepgram Flux
Use the STTv2 class for the Flux model. It connects to Deepgram's /v2/listen WebSocket API, which is designed for turn-based conversational audio. Currently, the only available model is Flux in English.
Usage
Use STTv2 in an AgentSession or as a standalone transcription service. For example, you can use this STT in the Voice AI quickstart.
```python
from livekit.agents import AgentSession
from livekit.plugins import deepgram

session = AgentSession(
    stt=deepgram.STTv2(
        model="flux-general-en",
        eager_eot_threshold=0.4,
    ),
    # ... llm, tts, etc.
)
```
```typescript
import { voice } from '@livekit/agents';
import * as deepgram from '@livekit/agents-plugin-deepgram';

const session = new voice.AgentSession({
  stt: new deepgram.STTv2({
    model: "flux-general-en",
    eagerEotThreshold: 0.4,
  }),
  // ... llm, tts, etc.
});
```
Parameter reference
STTv2 exposes parameters specific to Deepgram's v2 API.
model (string, optional, default: flux-general-en): The AI model used to process submitted audio. Currently, only the Flux model (flux-general-en) is available. Use STT for the Nova-3 or Nova-2 models.
eager_eot_threshold (float, optional): End-of-turn confidence required to fire an eager end-of-turn event. Valid range: 0.3–0.9.
eot_threshold (float, optional): End-of-turn confidence required to finish a turn. Valid range: 0.5–0.9.
eot_timeout_ms (number, optional): Time in milliseconds after speech ends at which a turn is finished, regardless of end-of-turn confidence.
keyterms (list[string], optional, default: []): Keyterm prompting can improve recognition of specialized terminology. Pass multiple keyterms to boost recognition of each.
mip_opt_out (boolean, optional): Opts requests out of the Deepgram Model Improvement Program. Check the Deepgram docs for the pricing impact before setting it to true.
tags (string, optional): Labels your requests for identification during usage reporting.
For the full list of STTv2 parameters, see the plugin reference in Additional resources.
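Out-of-range values are easy to pass by accident, so you might validate Flux thresholds before constructing STTv2. The sketch below checks the valid ranges listed in this parameter reference. `validate_flux_thresholds` is hypothetical. Its final check, that the eager threshold should not exceed the end-of-turn threshold, is an assumption about sensible configuration rather than a documented constraint.

```python
from typing import Optional, Tuple

# Valid ranges from the STTv2 parameter reference (inclusive).
EAGER_EOT_RANGE = (0.3, 0.9)
EOT_RANGE = (0.5, 0.9)


def _in_range(value: float, bounds: Tuple[float, float]) -> bool:
    low, high = bounds
    return low <= value <= high


def validate_flux_thresholds(
    eager_eot_threshold: Optional[float] = None,
    eot_threshold: Optional[float] = None,
) -> None:
    """Hypothetical helper: raise ValueError for out-of-range Flux thresholds."""
    if eager_eot_threshold is not None and not _in_range(eager_eot_threshold, EAGER_EOT_RANGE):
        raise ValueError(f"eager_eot_threshold must be within {EAGER_EOT_RANGE}")
    if eot_threshold is not None and not _in_range(eot_threshold, EOT_RANGE):
        raise ValueError(f"eot_threshold must be within {EOT_RANGE}")
    # Assumption: an eager threshold above the final threshold would always be
    # preempted by the end-of-turn event, so treat it as a misconfiguration.
    if (
        eager_eot_threshold is not None
        and eot_threshold is not None
        and eager_eot_threshold > eot_threshold
    ):
        raise ValueError("eager_eot_threshold should not exceed eot_threshold")


validate_flux_thresholds(eager_eot_threshold=0.4, eot_threshold=0.7)  # passes silently
```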
Turn detection
Deepgram Flux includes a custom phrase endpointing model that uses both acoustic and semantic cues. To use this model for turn detection, set turn_detection="stt" in the AgentSession constructor. You should also provide a VAD plugin for responsive interruption handling.
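```python
from livekit.agents import AgentSession
from livekit.plugins import deepgram, silero

session = AgentSession(
    turn_detection="stt",
    stt=deepgram.STTv2(
        model="flux-general-en",
        eager_eot_threshold=0.4,
    ),
    vad=silero.VAD.load(),  # Recommended for responsive interruption handling
    # ... llm, tts, etc.
)
```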
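```typescript
import { voice } from '@livekit/agents';
import * as deepgram from '@livekit/agents-plugin-deepgram';
import * as silero from '@livekit/agents-plugin-silero';

const session = new voice.AgentSession({
  turnDetection: "stt",
  stt: new deepgram.STTv2({
    model: "flux-general-en",
    eagerEotThreshold: 0.4,
  }),
  vad: await silero.VAD.load(), // Recommended for responsive interruption handling
  // ... llm, tts, etc.
});
```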
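The eager and final thresholds interact over time as end-of-turn confidence rises after the user stops speaking. The toy simulation below illustrates that interaction under stated assumptions. It is not Deepgram's endpointing model. The event names and `turn_events` helper are hypothetical, the confidence samples are invented, and the threshold values passed in are illustrative rather than Deepgram defaults. An eager event marks the point where an agent could start preparing a response speculatively. The end-of-turn event confirms the turn, either through confidence or through the timeout.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    ms_since_speech: int   # time elapsed since the user last spoke
    eot_confidence: float  # invented end-of-turn confidence at that moment


def turn_events(samples, *, eager_eot_threshold, eot_threshold, eot_timeout_ms):
    """Toy model: return (event_name, ms_since_speech) pairs for one turn."""
    events = []
    eager_fired = False
    for s in samples:
        # The turn finishes on high confidence or once the timeout elapses.
        if s.eot_confidence >= eot_threshold or s.ms_since_speech >= eot_timeout_ms:
            events.append(("end_of_turn", s.ms_since_speech))
            break
        # The eager event fires at most once, at the first sample above its threshold.
        if not eager_fired and s.eot_confidence >= eager_eot_threshold:
            events.append(("eager_end_of_turn", s.ms_since_speech))
            eager_fired = True
    return events


samples = [Sample(100, 0.2), Sample(200, 0.45), Sample(300, 0.6), Sample(400, 0.75)]
print(turn_events(samples, eager_eot_threshold=0.4, eot_threshold=0.7, eot_timeout_ms=5000))
# [('eager_end_of_turn', 200), ('end_of_turn', 400)]
```

Lowering the eager threshold lets speculative work start sooner. It also raises the chance that the user keeps talking after the eager event fires.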
Additional resources
The following resources provide more information about using Deepgram with LiveKit Agents.