NVIDIA Riva STT plugin guide | LiveKit Documentation

Available in Python

Overview

This plugin allows you to use NVIDIA Riva as an STT provider for your voice agents.

Installation

Install the plugin from PyPI:

uv add "livekit-agents[nvidia]~=1.5"

Authentication

The NVIDIA Riva plugin supports two authentication methods:

NVIDIA API Key: Set NVIDIA_API_KEY in your .env file to use NVIDIA's cloud services.
Self-Hosted NVIDIA Riva Server: Deploy your own NVIDIA Riva server and configure the plugin to communicate with it using the server parameter.
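
As a rough sketch of how an application might choose between the two methods at startup, the helper below returns keyword arguments for nvidia.STT. The RIVA_SERVER environment variable and the riva_auth_kwargs helper are hypothetical names invented for this example; only NVIDIA_API_KEY and the server parameter come from this guide. A self-hosted deployment may need further settings that are not covered here.

```python
import os
from typing import Mapping


def riva_auth_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Return keyword arguments for nvidia.STT based on the environment.

    RIVA_SERVER is a hypothetical variable used only in this sketch.
    """
    server = env.get("RIVA_SERVER", "").strip()
    if server:
        # Self-hosted: point the plugin at your own Riva server.
        return {"server": server}
    if env.get("NVIDIA_API_KEY"):
        # Cloud: the documented default server applies, so no override is needed.
        return {}
    raise RuntimeError("Set NVIDIA_API_KEY or RIVA_SERVER before starting the agent")


# Intended use (requires the plugin to be installed):
#   stt = nvidia.STT(language_code="en-US", **riva_auth_kwargs())
```

Failing early with a clear error is a design choice of this sketch, so that a missing credential shows up at startup and not on the first transcription request.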

Usage

Use NVIDIA Riva STT in an AgentSession or as a standalone transcription service. For example, you can use this STT in the Voice AI quickstart.

from livekit.agents import AgentSession
from livekit.plugins import nvidia

session = AgentSession(
    # Use NVIDIA Riva for speech-to-text
    stt=nvidia.STT(
        language_code="en-US",
    ),
    # ... llm, tts, etc.
)

Parameters

This section describes some of the available parameters. See the plugin reference for a complete list of all available parameters.

language_code (string, default: en-US)

Language code for speech recognition. See NVIDIA Riva documentation for a complete list of supported languages.

model (string, default: parakeet-1.1b-en-US-asr-streaming-silero-vad-sortformer)

The NVIDIA Riva ASR model to use. The default model supports streaming with VAD and speaker diarization. See NVIDIA Riva documentation for available models.

server (string, default: grpc.nvcf.nvidia.com:443)

The address of your NVIDIA Riva server. Defaults to NVIDIA's cloud service. Set this to a local address when using a self-hosted Riva NIM service.

enable_diarization (bool, default: False)

Set to True to enable speaker diarization.

max_speaker_count (int, default: 0)

Maximum number of speakers to detect. Set to 0 for automatic detection.
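
One way to keep these settings in a single place is a small dataclass that mirrors the documented defaults. This is only a sketch: RivaSTTSettings and to_kwargs are hypothetical names, and the check that rejects negative speaker counts is an assumption of this example, not documented plugin behavior.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RivaSTTSettings:
    """Hypothetical container mirroring the parameters documented above."""

    language_code: str = "en-US"
    model: str = "parakeet-1.1b-en-US-asr-streaming-silero-vad-sortformer"
    server: str = "grpc.nvcf.nvidia.com:443"
    enable_diarization: bool = False
    max_speaker_count: int = 0  # 0 means automatic detection

    def __post_init__(self) -> None:
        # Assumption of this sketch: negative counts are never meaningful.
        if self.max_speaker_count < 0:
            raise ValueError("max_speaker_count must be 0 (automatic) or positive")

    def to_kwargs(self) -> dict:
        """Return the settings as keyword arguments."""
        return asdict(self)


# Override only what differs from the defaults, e.g. for a self-hosted server.
settings = RivaSTTSettings(
    server="localhost:50051",
    enable_diarization=True,
    max_speaker_count=2,
)

# Intended use (requires the plugin to be installed):
#   stt = nvidia.STT(**settings.to_kwargs())
```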

Speaker diarization

You can enable speaker diarization so the STT assigns a speaker identifier to each transcript event. When enabled, the STT reports capabilities.diarization = True.

With diarization enabled, you can wrap the NVIDIA Riva STT with MultiSpeakerAdapter for primary speaker detection and transcript formatting.

Enable speaker diarization by setting enable_diarization=True in the STT constructor:

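from livekit.plugins import nvidia

stt = nvidia.STT(
    language_code="en-US",
    enable_diarization=True,  # attach a speaker identifier to each transcript event
    max_speaker_count=4,      # detect at most four speakers
)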
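
To illustrate what speaker-labelled output makes possible, the sketch below formats a transcript by speaker and picks a primary speaker by total talk time. It does not use the plugin or MultiSpeakerAdapter and makes no claim about how either works internally: the Segment class, its field names, and the talk-time heuristic are all assumptions chosen for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """Hypothetical stand-in for a final transcript event with a speaker label."""

    speaker_id: str
    text: str
    start: float  # seconds
    end: float  # seconds


def primary_speaker(segments: list) -> Optional[str]:
    """Return the speaker with the most total talk time, or None if empty."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.speaker_id] += seg.end - seg.start
    return max(totals, key=totals.get) if totals else None


def format_transcript(segments: list) -> str:
    """Render one line per segment, ordered by start time."""
    ordered = sorted(segments, key=lambda s: s.start)
    return "\n".join(f"[{s.speaker_id}] {s.text}" for s in ordered)


segments = [
    Segment("S0", "Hi there", 0.0, 1.0),
    Segment("S1", "Hello", 1.0, 1.5),
    Segment("S0", "How are you?", 1.5, 3.0),
]
print(primary_speaker(segments))
print(format_transcript(segments))
```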

Additional resources

The following resources provide more information about using NVIDIA Riva with LiveKit Agents.

Python package

The livekit-plugins-nvidia package on PyPI.

Plugin reference

Reference for the NVIDIA Riva STT plugin.

GitHub repo

View the source or contribute to the LiveKit NVIDIA Riva STT plugin.

NVIDIA Riva docs

NVIDIA Riva's official documentation and product page.

Voice AI quickstart

Get started with LiveKit Agents and NVIDIA Riva.

Example implementation

Example code showing how to use the NVIDIA Riva plugin with LiveKit Agents.

NVIDIA Riva TTS

Guide to the NVIDIA Riva TTS plugin with LiveKit Agents.