Overview
LiveKit Inference offers transcription powered by Deepgram. Pricing information is available on the pricing page.
| Model name | Model ID | Languages |
|---|---|---|
| Flux | deepgram/flux-general | en |
| Nova-3 | deepgram/nova-3 or deepgram/nova-3-general | en, en-US, en-AU, en-GB, en-IN, de, nl, sv, sv-SE, da, da-DK, es, es-419, fr, fr-CA, pt, pt-BR, pt-PT, multi |
| Nova-3 Medical | deepgram/nova-3-medical | en, en-US, en-AU, en-CA, en-GB, en-IE, en-IN, en-NZ |
| Nova-2 | deepgram/nova-2 or deepgram/nova-2-general | multi, bg, ca, zh, zh-CN, zh-Hans, zh-TW, zh-Hant, zh-HK, cs, da, da-DK, nl, en, en-US, en-AU, en-GB, en-NZ, en-IN, et, fi, nl-BE, fr, fr-CA, de, de-CH, el, hi, hu, id, it, ja, ko, ko-KR, lv, lt, ms, no, pl, pt, pt-BR, pt-PT, ro, ru, sk, es, es-419, sv, sv-SE, th, th-TH, tr, uk, vi |
| Nova-2 Medical | deepgram/nova-2-medical | en, en-US |
| Nova-2 Conversational AI | deepgram/nova-2-conversationalai | en, en-US |
| Nova-2 Phonecall | deepgram/nova-2-phonecall | en, en-US |
Usage
To use Deepgram, pass a descriptor with the model and language to the stt argument in your AgentSession:
```python
from livekit.agents import AgentSession

session = AgentSession(
    stt="deepgram/flux-general:en",
    # ... llm, tts, vad, turn_detection, etc.
)
```
```typescript
import { AgentSession } from '@livekit/agents';

const session = new AgentSession({
  stt: "deepgram/flux-general:en",
  // ... llm, tts, vad, turn_detection, etc.
});
```
Multilingual transcription
Deepgram Nova-3 and Nova-2 models support multilingual transcription. In this mode, the model automatically detects the language of each segment of speech and can accurately transcribe multiple languages in the same audio stream.
Multilingual transcription is billed at a different rate than monolingual transcription. Refer to the pricing page for more information.
To enable multilingual transcription on supported models, set the language to multi.
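For example, using the same shorthand descriptor syntax as above with Nova-3 (a minimal sketch; the surrounding session options are elided):

```python
from livekit.agents import AgentSession

# Nova-3 with automatic per-segment language detection
session = AgentSession(
    stt="deepgram/nova-3:multi",
    # ... llm, tts, vad, turn_detection, etc.
)
```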
Parameters
To customize additional parameters, including the language to use, use the STT class from the inference module:
```python
from livekit.agents import AgentSession, inference

session = AgentSession(
    stt=inference.STT(
        model="deepgram/flux-general",
        language="en",
    ),
    # ... llm, tts, vad, turn_detection, etc.
)
```
```typescript
import { AgentSession, inference } from '@livekit/agents';

const session = new AgentSession({
  stt: new inference.STT({
    model: "deepgram/flux-general",
    language: "en",
  }),
  // ... llm, tts, vad, turn_detection, etc.
});
```
- `model` (string, Required): The model to use for STT. See the Model Options page for available models.
- `language` (string, Optional): Language code for the transcription. If not set, the provider default applies. Set it to `multi` with supported models for multilingual transcription.
- Additional parameters (dict, Optional): Additional parameters to pass to the Deepgram STT API. Supported fields depend on the selected model. See the provider's documentation for more information. In Node.js, this parameter is called `modelOptions`.
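As a sketch of passing provider-specific fields, assuming the Python keyword is `extra_kwargs` (the Node.js equivalent is `modelOptions`) and that the selected model accepts Deepgram's `keyterm` field:

```python
from livekit.agents import AgentSession, inference

session = AgentSession(
    stt=inference.STT(
        model="deepgram/nova-3",
        language="en",
        # Assumed parameter name: forwards extra fields to the Deepgram STT API
        extra_kwargs={"keyterm": ["LiveKit"]},
    ),
    # ... llm, tts, vad, turn_detection, etc.
)
```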
Integrated regional deployment
LiveKit Inference includes an integrated deployment of Deepgram models in Mumbai, India, delivering significantly lower latency for voice agents serving users in India and surrounding regions. By reducing the round-trip to external API endpoints, this regional deployment improves STT response times, resulting in more responsive and natural-feeling conversations.
Automatic routing
LiveKit Inference automatically routes requests to the regional deployment when your configuration matches one of the supported models and languages below. No code changes or configuration are required. For other configurations, requests are routed to Deepgram's API.
Supported configurations
| Model | Supported languages |
|---|---|
| deepgram/nova-3-general | English (en), Hindi (hi), Multilingual (multi) |
| deepgram/nova-2-general | English (en), Hindi (hi) |
| deepgram/flux-general | English (en) |
For example, to use Hindi transcription with Nova-3:
```python
from livekit.agents import AgentSession

session = AgentSession(
    stt="deepgram/nova-3-general:hi",
    # ... llm, tts, etc.
)
```
```typescript
import { AgentSession } from '@livekit/agents';

const session = new AgentSession({
  stt: "deepgram/nova-3-general:hi",
  // ... llm, tts, etc.
});
```
Turn detection
Deepgram Flux includes a custom phrase endpointing model that uses both acoustic and semantic cues. To use this model for turn detection, set turn_detection="stt" in the AgentSession constructor. You should also provide a VAD plugin for responsive interruption handling.
```python
from livekit.agents import AgentSession, inference
from livekit.plugins import silero

session = AgentSession(
    turn_detection="stt",
    stt=inference.STT(
        model="deepgram/flux-general",
        language="en",
    ),
    vad=silero.VAD.load(),  # Recommended for responsive interruption handling
    # ... llm, tts, etc.
)
```
Additional resources
The following links provide more information about Deepgram in LiveKit Inference.