Overview
This plugin allows you to use Soniox as an STT provider for your voice agents.
Installation
Install the plugin from PyPI:
uv add "livekit-agents[soniox]~=1.4"
Authentication
The Soniox plugin requires an API key from the Soniox console.
Set SONIOX_API_KEY in your .env file.
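A missing key often only shows up once the agent tries to reach the STT service, so it can help to check for it at startup. The following is a minimal sketch using only the standard library. The helper name `require_soniox_api_key` is hypothetical and not part of the plugin, and the sketch assumes your .env file has already been loaded into the process environment (for example by python-dotenv or your process manager).

```python
import os
from collections.abc import Mapping


def require_soniox_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Soniox API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; it is not part of the Soniox plugin.
    """
    key = env.get("SONIOX_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SONIOX_API_KEY is not set. Add it to your .env file or export it "
            "before starting the agent."
        )
    return key
```

Calling `require_soniox_api_key()` early in your entrypoint turns a missing key into an immediate, readable error.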
Usage
Use Soniox STT in an AgentSession or as a standalone transcription service. For example, you can use this STT in the Voice AI quickstart.
Set STT options for Soniox using the params argument:
from livekit.agents import AgentSession
from livekit.plugins import soniox

session = AgentSession(
    stt=soniox.STT(
        params=soniox.STTOptions(
            model="stt-rt-v4",
            language_hints=["en"],
        )
    ),
    # ... llm, tts, etc.
)
Speaker diarization
You can enable speaker diarization so the STT assigns a speaker identifier to each word or segment. When enabled, each token includes a speaker field and the STT reports capabilities.diarization=True.
The following example enables speaker diarization:
from livekit.agents import AgentSession
from livekit.plugins import soniox

session = AgentSession(
    stt=soniox.STT(
        params=soniox.STTOptions(
            model="stt-rt-v4",
            language_hints=["en"],
            enable_speaker_diarization=True,
        )
    ),
    # ... llm, tts, etc.
)
You can use MultiSpeakerAdapter to detect the primary speaker and format the transcripts by speaker. To learn more, see Speaker diarization and primary speaker detection.
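To illustrate what a per-token speaker field makes possible, the sketch below merges consecutive tokens from the same speaker into turns. It does not use the plugin: `Token` is a hypothetical stand-in that models only a `text` and a `speaker` value, and it assumes tokens carry their own leading whitespace. The real token type may differ.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Token:
    """Stand-in for a diarized token; only `text` and `speaker` are modeled."""

    text: str
    speaker: str


def group_by_speaker(tokens: list[Token]) -> list[tuple[str, str]]:
    """Merge consecutive tokens from the same speaker into (speaker, text) turns."""
    turns = []
    for speaker, run in groupby(tokens, key=lambda t: t.speaker):
        turns.append((speaker, "".join(t.text for t in run).strip()))
    return turns


# Example input (assumed shape): speaker 1 talks, speaker 2 replies, speaker 1 resumes.
tokens = [
    Token("Hello", "1"),
    Token(" there", "1"),
    Token("Hi", "2"),
    Token(" again", "1"),
]
for speaker, text in group_by_speaker(tokens):
    print(f"Speaker {speaker}: {text}")
```

Grouping only consecutive tokens preserves the order of the conversation, so a speaker who talks twice produces two separate turns.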
Realtime translation
To use realtime translation, pass a TranslationConfig to STTOptions. Soniox supports two translation modes: one-way and two-way.
One-way translation
To translate from any detected language into a single target language, set type to "one_way" and specify the target_language. For example, to translate any spoken language into English:
from livekit.agents import AgentSession
from livekit.plugins import soniox

session = AgentSession(
    stt=soniox.STT(
        params=soniox.STTOptions(
            model="stt-rt-v4",
            translation=soniox.TranslationConfig(
                type="one_way",
                target_language="en",
            ),
        )
    ),
    # ... llm, tts, etc.
)
Two-way translation
To translate back and forth between two languages, set type to "two_way" and specify language_a and language_b. For example, to translate between English and Spanish:
from livekit.agents import AgentSession
from livekit.plugins import soniox

session = AgentSession(
    stt=soniox.STT(
        params=soniox.STTOptions(
            model="stt-rt-v4",
            translation=soniox.TranslationConfig(
                type="two_way",
                language_a="en",
                language_b="es",
            ),
        )
    ),
    # ... llm, tts, etc.
)
When translation is active, the SpeechData object in each SpeechEvent contains the translated text in the text field. The original spoken language and transcription are available in the source_languages and source_texts fields.
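As a rough illustration of how the translated and original text might be shown together, the sketch below formats one result for display. `TranslatedSpeech` is a hypothetical stand-in that mirrors only the three field names mentioned above; it is not the real SpeechData class. The sketch also assumes that source_languages and source_texts are parallel lists of strings, which you should verify against the plugin reference.

```python
from dataclasses import dataclass, field


@dataclass
class TranslatedSpeech:
    """Stand-in for the fields described above; not the real SpeechData class."""

    text: str
    # Assumption: these two lists line up index by index.
    source_languages: list[str] = field(default_factory=list)
    source_texts: list[str] = field(default_factory=list)


def format_translation(data: TranslatedSpeech) -> str:
    """Render the translated text with the original utterance(s) alongside it."""
    originals = ", ".join(
        f"[{lang}] {src}"
        for lang, src in zip(data.source_languages, data.source_texts)
    )
    return f"{data.text} (original: {originals})" if originals else data.text


data = TranslatedSpeech(
    text="Good morning",
    source_languages=["es"],
    source_texts=["Buenos días"],
)
print(format_translation(data))  # Good morning (original: [es] Buenos días)
```

When no source information is present, the helper falls back to the translated text alone.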
Parameters
The soniox.STT constructor takes an STTOptions object as the params argument. This section describes some of the available options. See the STTOptions reference for a complete list.
model (string, default: stt-rt-v4): The Soniox STT model to use. See documentation for a complete list of supported models.
context (string, default: None): Free-form text that provides additional context or vocabulary to bias transcription towards domain-specific terms.
enable_language_identification (boolean, default: true): When true, Soniox attempts to identify the language of the input audio.
enable_speaker_diarization (boolean, default: false): Set to True to enable speaker diarization.
translation (TranslationConfig, default: None): Enable realtime translation. See Realtime translation for details and examples.
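Because the context option is free-form text, one simple approach is to assemble it from a short domain description and a vocabulary list. The helper below is a hypothetical sketch, not part of the plugin, and the phrasing it produces is only one possibility. This page does not specify which wording biases transcription most effectively.

```python
def build_context(domain: str, terms: list[str]) -> str:
    """Build a free-form context string from a domain label and vocabulary list.

    Hypothetical helper: blank entries are dropped and duplicates removed,
    keeping the first occurrence of each term in its original order.
    """
    unique_terms = list(dict.fromkeys(t.strip() for t in terms if t.strip()))
    if not unique_terms:
        return domain
    return f"{domain}. Key terms: {', '.join(unique_terms)}."


context = build_context("Cardiology consultation", ["stent", "angioplasty", "stent"])
print(context)  # Cardiology consultation. Key terms: stent, angioplasty.
```

The resulting string can then be passed as the context value when constructing STTOptions.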
Additional resources
The following resources provide more information about using Soniox with LiveKit Agents.