NVIDIA Riva STT plugin guide

How to use the NVIDIA Riva STT plugin for LiveKit Agents.

Available in: Python

Overview

This plugin allows you to use NVIDIA Riva as an STT provider for your voice agents.

Installation

Install the plugin from PyPI:

uv add "livekit-agents[nvidia]~=1.4"

Authentication

The NVIDIA Riva plugin supports two authentication methods:

  1. NVIDIA API Key: Set NVIDIA_API_KEY in your .env file to use NVIDIA's cloud services.
  2. Self-Hosted NVIDIA Riva Server: Deploy your own NVIDIA Riva server and configure the plugin to communicate with it using the server parameter.
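The choice between the two methods can be made at startup from the environment. The following is a minimal sketch, not part of the plugin: the helper riva_stt_kwargs and the RIVA_SERVER variable are hypothetical names used only for illustration, while NVIDIA_API_KEY and the server parameter come from this guide. It assumes the plugin picks up NVIDIA_API_KEY from the environment on its own.

```python
import os

# Cloud endpoint, taken from the documented default of the `server` parameter.
NVIDIA_CLOUD_SERVER = "grpc.nvcf.nvidia.com:443"


def riva_stt_kwargs(env=None):
    """Build keyword arguments for nvidia.STT from environment variables.

    RIVA_SERVER is a hypothetical variable name chosen for this sketch.
    NVIDIA_API_KEY is the variable named in the Authentication section.
    """
    env = os.environ if env is None else env

    server = env.get("RIVA_SERVER")
    if server:
        # Self-hosted: point the plugin at your own Riva server.
        return {"server": server}

    if not env.get("NVIDIA_API_KEY"):
        raise RuntimeError("Set NVIDIA_API_KEY (cloud) or RIVA_SERVER (self-hosted)")

    # Cloud: use NVIDIA's hosted endpoint; the key itself is assumed to be
    # read from the environment by the plugin, so it is not passed here.
    return {"server": NVIDIA_CLOUD_SERVER}
```

The resulting dictionary can then be unpacked into the constructor, for example nvidia.STT(language_code="en-US", **riva_stt_kwargs()).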

Usage

Use NVIDIA Riva STT in an AgentSession or as a standalone transcription service. For example, you can use this STT in the Voice AI quickstart.

    from livekit.agents import AgentSession
    from livekit.plugins import nvidia

    session = AgentSession(
        stt=nvidia.STT(
            language_code="en-US",
        ),
        # ... llm, tts, etc.
    )

Parameters

This section describes some of the available parameters. See the plugin reference for a complete list of all available parameters.

language_code (string, default: en-US)

Language code for speech recognition. See NVIDIA Riva documentation for a complete list of supported languages.

model (string, default: parakeet-1.1b-en-US-asr-streaming-silero-vad-sortformer)

The NVIDIA Riva ASR model to use. The default model supports streaming with VAD and speaker diarization. See NVIDIA Riva documentation for available models.

server (string, default: grpc.nvcf.nvidia.com:443)

The address of your NVIDIA Riva server. Defaults to NVIDIA's cloud service. Set this to a local address when using a self-hosted Riva NIM service.

enable_diarization (bool, default: False)

Set to True to enable speaker diarization.

max_speaker_count (int, default: 0)

Maximum number of speakers to detect. Set to 0 for automatic detection.

Speaker diarization

You can enable speaker diarization so the STT assigns a speaker identifier to each transcript event. When enabled, the STT reports capabilities.diarization = True.

With diarization enabled, you can wrap the NVIDIA Riva STT with MultiSpeakerAdapter for primary speaker detection and transcript formatting.

Enable speaker diarization by setting enable_diarization=True in the STT constructor:

    from livekit.plugins import nvidia

    stt = nvidia.STT(
        language_code="en-US",
        enable_diarization=True,
        max_speaker_count=4,  # 0 lets the model detect the speaker count automatically
    )
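To illustrate what a per-event speaker identifier makes possible, the sketch below groups consecutive segments from the same speaker into labeled lines. It is not the plugin's or MultiSpeakerAdapter's implementation: the Segment class is a hypothetical stand-in for a final transcript event, and the speaker_id field name is an assumption made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """Hypothetical stand-in for a final transcript event."""

    text: str
    speaker_id: Optional[str] = None  # assumed field name, None if unassigned


def format_transcript(segments):
    """Merge consecutive segments from the same speaker into labeled lines."""
    lines = []
    last_label = None  # no line has been emitted yet
    for seg in segments:
        label = seg.speaker_id or "unknown"
        if lines and label == last_label:
            # Same speaker is still talking: extend the current line.
            lines[-1] += " " + seg.text
        else:
            # Speaker changed (or first segment): start a new labeled line.
            lines.append(f"[{label}] {seg.text}")
            last_label = label
    return lines
```

Segments without a speaker identifier fall back to the label "unknown" rather than being dropped, so the transcript stays complete even when diarization cannot attribute a segment.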

Additional resources

The following resources provide more information about using NVIDIA Riva with LiveKit Agents.