Deepgram STT integration guide

How to use the Deepgram STT plugin for LiveKit Agents.

Overview

Deepgram provides speech recognition and AI-driven audio processing. Its customizable speech models let you fine-tune transcription performance for your specific use case. With LiveKit's Deepgram integration and the Agents framework, you can build AI agents that provide high-accuracy transcriptions.

Quick reference

This section includes a basic usage example and some reference material. For links to more detailed documentation, see Additional resources.

Installation

Install the plugin from PyPI:

pip install "livekit-agents[deepgram]~=1.0"

Authentication

The Deepgram plugin requires a Deepgram API key.

Set DEEPGRAM_API_KEY in your .env file.
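
A missing key can be easier to diagnose at startup than at the first transcription request. The following is a minimal sketch of such a check. The require_api_key helper is hypothetical and not part of the plugin, and it assumes the variables from your .env file have already been loaded into the process environment.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Deepgram API key, or raise a clear error if it is unset.

    Hypothetical helper for illustration; not part of the Deepgram plugin.
    """
    # Fall back to the real process environment when no mapping is passed.
    env = os.environ if env is None else env
    key = env.get("DEEPGRAM_API_KEY", "").strip()
    if not key:
        raise RuntimeError("DEEPGRAM_API_KEY is not set; add it to your .env file")
    return key
```

Calling require_api_key() once before creating the session turns a missing or empty key into an immediate, descriptive error.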

Usage

Use Deepgram STT in an AgentSession or as a standalone transcription service. For example, you can use this STT in the Voice AI quickstart.

from livekit.agents import AgentSession
from livekit.plugins import deepgram

session = AgentSession(
    # Configure Deepgram as the speech-to-text provider for this session.
    stt=deepgram.STT(
        model="nova-2-general",
        interim_results=True,
        smart_format=True,
        punctuate=True,
        filler_words=True,
        profanity_filter=False,
        keywords=[("LiveKit", 1.5)],
        language="en-US",
    ),
    # ... llm, tts, etc.
)
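
If several agents share the same STT settings, it may be convenient to keep the options in one place. The sketch below is one possible approach, not a plugin feature: stt_options is a hypothetical helper, and DOCUMENTED_DEFAULTS covers only the parameters and default values listed in the Parameters section of this guide.

```python
import copy
from typing import Any, Dict

# Defaults as listed in the Parameters section of this guide.
# The plugin accepts more parameters than are shown here.
DOCUMENTED_DEFAULTS: Dict[str, Any] = {
    "model": "nova-2-general",
    "interim_results": True,
    "smart_format": True,
    "punctuate": True,
    "filler_words": True,
    "profanity_filter": False,
    "keywords": [],
    "language": "en",
}


def stt_options(**overrides: Any) -> Dict[str, Any]:
    """Merge overrides into the documented defaults (hypothetical helper)."""
    unknown = set(overrides) - set(DOCUMENTED_DEFAULTS)
    if unknown:
        # Catch typos such as "smart_fromat" before they reach the plugin.
        raise ValueError(f"Unknown STT option(s): {sorted(unknown)}")
    # Deep-copy so callers never mutate the shared defaults (e.g. the list).
    options = copy.deepcopy(DOCUMENTED_DEFAULTS)
    options.update(overrides)
    return options
```

The resulting dictionary can then be unpacked into the constructor, for example deepgram.STT(**stt_options(language="en-US")).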

Parameters

This section describes some of the available parameters. See the plugin reference for a complete list of all available parameters.

model (string, optional, default: nova-2-general)

ID of the model to use for inference. To learn more, see supported models.

interim_results (bool, optional, default: true)

Enable preliminary results before the final transcription is available.
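
One way to consume interim results is to show the latest interim text as a provisional line and replace it once the final transcript arrives. The sketch below illustrates that pattern. TranscriptEvent and TranscriptBuffer are hypothetical types made up for this example and do not mirror the plugin's actual event classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TranscriptEvent:
    """Hypothetical event shape, used only for this illustration."""
    text: str
    is_final: bool


@dataclass
class TranscriptBuffer:
    """Keeps finalized segments plus the most recent interim guess."""
    committed: List[str] = field(default_factory=list)
    pending: str = ""

    def handle(self, event: TranscriptEvent) -> None:
        if event.is_final:
            # A final result supersedes whatever interim text was shown.
            self.committed.append(event.text)
            self.pending = ""
        else:
            # Each interim result replaces the previous one.
            self.pending = event.text

    def display(self) -> str:
        parts = self.committed + ([self.pending] if self.pending else [])
        return " ".join(parts)
```

Only the committed segments should be treated as the transcript of record; the pending text can still change.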

smart_format (bool, optional, default: true)

Enable smart formatting to improve the readability of transcriptions.

punctuate (bool, optional, default: true)

Enable punctuation in transcriptions.

filler_words (bool, optional, default: true)

Include filler words such as "uh" and "um" in transcriptions, which can improve turn detection.

profanity_filter (bool, optional, default: false)

Replace recognized profanity with asterisks in transcriptions.

keywords (list[tuple[string, float]], optional, default: [])

A list of (keyword, intensifier) pairs to boost or suppress in transcriptions. A positive intensifier boosts the keyword; a negative intensifier suppresses it.
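
Deepgram's API expresses each keyword boost as a keyword:intensifier string. As a rough illustration of how the tuples described above correspond to that form, the following sketch validates and converts a keyword list. The format_keywords helper is hypothetical and is not the plugin's internal implementation.

```python
from typing import List, Sequence, Tuple


def format_keywords(keywords: Sequence[Tuple[str, float]]) -> List[str]:
    """Convert (keyword, intensifier) pairs to "keyword:intensifier" strings.

    Hypothetical helper for illustration only.
    """
    formatted = []
    for word, intensifier in keywords:
        if not word.strip():
            raise ValueError("keyword must not be empty")
        # Positive intensifiers boost a keyword; negative ones suppress it.
        formatted.append(f"{word}:{intensifier}")
    return formatted
```

For example, the pair ("LiveKit", 1.5) from the usage example becomes the string LiveKit:1.5.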

language (string, optional, default: en)

Language of the input audio, given as an ISO 639-1 code (for example, en), optionally with a region subtag (for example, en-US).

Additional resources

The following resources provide more information about using Deepgram with LiveKit Agents.