Soniox STT plugin guide

How to use the Soniox STT plugin for LiveKit Agents.

Available in: Python

Overview

This plugin allows you to use Soniox as an STT provider for your voice agents.

Installation

Install the plugin from PyPI:

uv add "livekit-agents[soniox]~=1.4"

Authentication

The Soniox plugin requires an API key from the Soniox console.

Set SONIOX_API_KEY in your .env file.
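
As a minimal sketch, the check below fails fast when the key is missing. It assumes the variable has already been loaded into the process environment (for example, from your .env file by whatever loader your project uses). The helper name require_soniox_api_key is hypothetical and is not part of the plugin.

```python
import os


def require_soniox_api_key(env=None):
    """Return SONIOX_API_KEY from the environment, or raise if unset or blank.

    Hypothetical helper for illustration; not part of the Soniox plugin.
    """
    # Allow a plain dict to be passed in so the check is easy to test.
    env = os.environ if env is None else env
    key = env.get("SONIOX_API_KEY", "").strip()
    if not key:
        raise RuntimeError("SONIOX_API_KEY is not set; add it to your .env file")
    return key
```

Calling this once at startup surfaces a missing key before the agent tries to open a transcription stream.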

Usage

Use Soniox STT in an AgentSession or as a standalone transcription service. For example, you can use this STT in the Voice AI quickstart.

Set STT options for Soniox using the params argument:

from livekit.agents import AgentSession
from livekit.plugins import soniox

session = AgentSession(
    stt=soniox.STT(
        # STT options are passed through the params argument.
        params=soniox.STTOptions(
            model="stt-rt-v4",
            language_hints=["en"],
        )
    ),
    # ... llm, tts, etc.
)

Speaker diarization

You can enable speaker diarization so the STT assigns a speaker identifier to each word or segment. When enabled, each token includes a speaker field and the STT reports capabilities.diarization=True.

The following example enables speaker diarization:

from livekit.agents import AgentSession
from livekit.plugins import soniox

session = AgentSession(
    stt=soniox.STT(
        params=soniox.STTOptions(
            model="stt-rt-v4",
            language_hints=["en"],
            # Attach a speaker identifier to each token.
            enable_speaker_diarization=True,
        )
    ),
    # ... llm, tts, etc.
)

You can use MultiSpeakerAdapter to detect the primary speaker and format the transcripts by speaker. To learn more, see Speaker diarization and primary speaker detection.
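
To illustrate what formatting transcripts by speaker means, here is a plain-Python sketch that groups consecutive tokens by their speaker field. It does not use MultiSpeakerAdapter, and the token shape (a dict with "text" and "speaker" keys) is an assumption made for illustration rather than the plugin's actual token type.

```python
from itertools import groupby


def format_by_speaker(tokens):
    """Join consecutive tokens from the same speaker into labeled lines.

    `tokens` is a list of dicts with "text" and "speaker" keys. This shape is
    an assumption for illustration, not the plugin's actual token type.
    """
    lines = []
    # groupby only merges adjacent items, so speaker turns stay in order.
    for speaker, group in groupby(tokens, key=lambda t: t["speaker"]):
        text = "".join(t["text"] for t in group).strip()
        lines.append(f"Speaker {speaker}: {text}")
    return lines


tokens = [
    {"text": "Hello", "speaker": "1"},
    {"text": " there.", "speaker": "1"},
    {"text": " Hi!", "speaker": "2"},
]
print("\n".join(format_by_speaker(tokens)))
```

In a real agent, the adapter mentioned above is likely the better choice because it also handles primary speaker detection.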

Realtime translation

To use realtime translation, pass a TranslationConfig to STTOptions. Soniox supports two translation modes: one-way and two-way.

One-way translation

To translate from any detected language into a single target language, set type to "one_way" and specify the target_language. For example, to translate any spoken language into English:

from livekit.agents import AgentSession
from livekit.plugins import soniox

session = AgentSession(
    stt=soniox.STT(
        params=soniox.STTOptions(
            model="stt-rt-v4",
            # Translate any detected language into English.
            translation=soniox.TranslationConfig(
                type="one_way",
                target_language="en",
            ),
        )
    ),
    # ... llm, tts, etc.
)

Two-way translation

To translate back and forth between two languages, set type to "two_way" and specify language_a and language_b. For example, to translate between English and Spanish:

from livekit.agents import AgentSession
from livekit.plugins import soniox

session = AgentSession(
    stt=soniox.STT(
        params=soniox.STTOptions(
            model="stt-rt-v4",
            # Translate in both directions between English and Spanish.
            translation=soniox.TranslationConfig(
                type="two_way",
                language_a="en",
                language_b="es",
            ),
        )
    ),
    # ... llm, tts, etc.
)

When translation is active, the SpeechData object in each SpeechEvent contains the translated text in the text field. The original spoken language and transcription are available in the source_languages and source_texts fields.
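
The sketch below shows one way to present a translated result next to its original. It uses a stand-in dataclass with only the three fields named above, not the real SpeechData class, and it assumes that source_languages and source_texts are parallel lists. Check that assumption against the plugin reference before relying on it.

```python
from dataclasses import dataclass, field


@dataclass
class SpeechDataStandIn:
    """Stand-in with only the three fields named above (not the real class)."""

    text: str
    source_languages: list = field(default_factory=list)
    source_texts: list = field(default_factory=list)


def describe(data):
    """Return the translated text, followed by each original language and text."""
    # Assumption: the two source_* lists line up index by index.
    originals = ", ".join(
        f"[{lang}] {src}"
        for lang, src in zip(data.source_languages, data.source_texts)
    )
    return f"{data.text} (original: {originals})" if originals else data.text


print(describe(SpeechDataStandIn("Hello", ["es"], ["Hola"])))
```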

Parameters

The soniox.STT constructor takes an STTOptions object as the params argument. This section describes some of the available options. See the STTOptions reference for a complete list.

model (string, default: stt-rt-v4)

The Soniox STT model to use. See the Soniox documentation for a complete list of supported models.

context (string, default: None)

Free-form text that provides additional context or vocabulary to bias transcription towards domain-specific terms.

enable_language_identification (boolean, default: true)

When true, Soniox attempts to identify the language of the input audio.

enable_speaker_diarization (boolean, default: false)

Set to True to enable speaker diarization.

translation (TranslationConfig, default: None)

Enables realtime translation. See the Realtime translation section for details and examples.
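
Because the context option is free-form text, you may want to build it from a list of domain terms. The helper below is a hypothetical sketch: the name build_context and the "Vocabulary:" layout are illustrative choices, not a format that Soniox requires.

```python
def build_context(terms, note=""):
    """Build a free-form context string from domain terms (hypothetical helper)."""
    # Drop blanks and duplicates while keeping the original order.
    unique = list(dict.fromkeys(t.strip() for t in terms if t.strip()))
    vocab = "Vocabulary: " + ", ".join(unique) if unique else ""
    return " ".join(part for part in (note.strip(), vocab) if part)


context = build_context(
    ["LiveKit", "Soniox", "diarization"],
    note="Voice agent support call.",
)
# Pass the result as the documented `context` option, for example:
# soniox.STTOptions(model="stt-rt-v4", context=context)
```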

Additional resources

The following resources provide more information about using Soniox with LiveKit Agents.