Overview
This plugin allows you to use Speechmatics as an STT provider for your voice agents.
Installation
Install the plugin from PyPI:
uv add "livekit-agents[speechmatics,silero]~=1.5"
The default turn detection mode (EXTERNAL) uses an external VAD, so this command includes the silero extra to install Silero VAD. For details and other modes, see Turn detection.
Authentication
The Speechmatics plugin requires an API key.
Set SPEECHMATICS_API_KEY in your .env file.
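A missing key is easier to diagnose at startup than partway through a session. The following is a minimal sketch of a fail-fast check, not part of the plugin: require_env is a hypothetical helper, and it assumes your .env file has already been loaded into the process environment by whatever loader your project uses.

```python
import os
from typing import Mapping


def require_env(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the stripped value of an environment variable, or raise a clear error."""
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment"
        )
    return value
```

Calling require_env("SPEECHMATICS_API_KEY") before creating the session surfaces a missing key immediately.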
Usage
Use Speechmatics STT in an AgentSession or as a standalone transcription service. For example, you can use this STT in the Voice AI quickstart.
    from livekit.agents import AgentSession
    from livekit.plugins import speechmatics

    session = AgentSession(
        stt=speechmatics.STT(),
        # ... llm, tts, etc.
    )
Speaker diarization
You can enable speaker diarization to identify individual speakers and their speech. When enabled, the transcription output can include this information through the speaker_id and text attributes.
See the following for example configurations and outputs:
- <{speaker_id}>{text}</{speaker_id}>: <S1>Hello</S1>.
- [Speaker {speaker_id}] {text}: [Speaker S1] Hello.
    stt = speechmatics.STT(
        enable_diarization=True,
        speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
    )
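To preview what a speaker_active_format template produces before passing it to the plugin, you can fill in the placeholders yourself. The sketch below is an illustration only: render is a hypothetical helper, and it assumes the {speaker_id} and {text} placeholders behave like Python str.format fields, which might not match the plugin's internals exactly.

```python
# Illustration only: assumes str.format-style substitution of the
# {speaker_id} and {text} placeholders; the plugin's internals may differ.
def render(template: str, speaker_id: str, text: str) -> str:
    """Fill a speaker format template with a speaker ID and transcript text."""
    return template.format(speaker_id=speaker_id, text=text)


xml_style = "<{speaker_id}>{text}</{speaker_id}>"
label_style = "[Speaker {speaker_id}] {text}"

print(render(xml_style, "S1", "Hello"))    # <S1>Hello</S1>
print(render(label_style, "S1", "Hello"))  # [Speaker S1] Hello
```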
Use the MultiSpeakerAdapter to detect the primary speaker and format the transcripts by speaker. See the Speaker diarization and primary speaker detection section for more details.
Turn detection
The turn_detection_mode parameter sets which component detects the end of a turn. It supports four modes:
| Mode | Detection method |
|---|---|
| EXTERNAL (default) | An external VAD. Works with LiveKit's turn detection. |
| ADAPTIVE | Speechmatics' own voice activity detection and the pace of speech. |
| SMART_TURN | A Speechmatics machine learning model. |
| FIXED | A fixed period of silence, set by the end_of_utterance_silence_trigger parameter. |
The default EXTERNAL mode needs no extra configuration, as shown in the Usage example.
In EXTERNAL mode, the plugin loads Silero VAD automatically. To use a different VAD, pass it as the vad parameter. To provide the end-of-turn signal yourself, pass vad=None.
To have Speechmatics detect the end of a turn instead, set turn_detection_mode to ADAPTIVE, SMART_TURN, or FIXED, and set turn_detection="stt" in the turn handling options:
    from livekit.agents import AgentSession, TurnHandlingOptions
    from livekit.plugins import speechmatics, silero

    session = AgentSession(
        stt=speechmatics.STT(
            turn_detection_mode=speechmatics.TurnDetectionMode.ADAPTIVE,
        ),
        turn_handling=TurnHandlingOptions(
            turn_detection="stt",
        ),
        vad=silero.VAD.load(),  # Recommended for responsive interruption handling
        # ... llm, tts, etc.
    )
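For intuition about FIXED mode, the idea of ending a turn after a fixed period of silence can be sketched offline over word timestamps. This is a conceptual toy, not the Speechmatics implementation: split_turns and the (text, start, end) word tuples are hypothetical, and a real-time service makes this decision on live audio rather than on a finished list.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (text, start_seconds, end_seconds)


def split_turns(words: List[Word], silence_trigger: float) -> List[str]:
    """Group words into turns, closing a turn when the silence before the
    next word is at least silence_trigger seconds."""
    turns: List[str] = []
    current: List[str] = []
    prev_end = None
    for text, start, end in words:
        if prev_end is not None and start - prev_end >= silence_trigger:
            turns.append(" ".join(current))
            current = []
        current.append(text)
        prev_end = end
    if current:
        turns.append(" ".join(current))
    return turns


words = [
    ("hello", 0.0, 0.4),
    ("there", 0.5, 0.9),
    ("how", 2.0, 2.2),
    ("are", 2.3, 2.5),
    ("you", 2.6, 2.9),
]
print(split_turns(words, silence_trigger=0.8))
# ['hello there', 'how are you']
```

In this toy model, a shorter trigger ends turns sooner but can split a speaker who pauses mid-sentence, and a longer trigger does the opposite.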
Parameters
This section describes the key parameters for the Speechmatics STT plugin. See the plugin reference for a complete list of all available parameters.
operating_point (string, default: enhanced): Operating point to use for the transcription. This parameter balances accuracy, speed, and resource usage. To learn more, see Operating points.
language (LanguageCode, default: en): Language code for the input audio. All languages are global, meaning that regardless of which language you select, the system can recognize different dialects and accents. For the full list, see Supported Languages.
include_partials (bool, default: true): Include partial transcripts in the output. Partial transcripts are preliminary transcriptions that update as more context becomes available, until the higher-accuracy final transcript is returned. Partials are returned faster but without post-processing such as formatting. When enabled, the STT service emits INTERIM_TRANSCRIPT events.
enable_diarization (bool, default: true): Enable speaker diarization. When enabled, spoken words are attributed to unique speakers. You can use the speaker_sensitivity parameter to adjust the sensitivity of diarization. To learn more, see Diarization.
max_delay (number, default: 1.0): The maximum delay in seconds between the end of a spoken word and returning the final transcript results. Lower values can reduce accuracy.
end_of_utterance_silence_trigger (float): The duration of silence, in seconds, that triggers the end of an utterance. The delay is used to wait for any further transcribed words before emitting FINAL_TRANSCRIPT events.
turn_detection_mode (TurnDetectionMode, default: TurnDetectionMode.EXTERNAL): Sets which component detects the end of a turn. The default mode uses an external VAD and works with LiveKit's turn detection. See Turn detection for the available modes and examples.
speaker_active_format (string): Formatter for speaker identification in transcription output. The following attributes are available:
- {speaker_id}: The ID of the speaker.
- {text}: The text spoken by the speaker.
By default, if speaker diarization is enabled and this parameter is not set, the transcription output is not formatted for speaker identification.
You might need to add instructions to the language model's system prompt so that it can interpret this formatting. To learn more, see Speaker diarization.
speaker_sensitivity (float, default: 0.5): Sensitivity of speaker detection. Valid values are between 0 and 1. Higher values increase sensitivity and can help when two or more speakers have similar voices. To learn more, see Speaker sensitivity.
The enable_diarization parameter must be set to True for this parameter to take effect.
prefer_current_speaker (bool, default: false): When speaker diarization is enabled and this is set to True, it reduces the likelihood of switching between similar sounding speakers. To learn more, see Prefer current speaker.
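Because speaker_sensitivity only takes effect when enable_diarization is True and must fall between 0 and 1, it can help to validate these settings before constructing the STT. The following is a minimal sketch under stated assumptions: diarization_kwargs is a hypothetical helper, not part of the plugin, and treating the 0 to 1 range as inclusive is an assumption.

```python
from typing import Any, Dict, Optional


def diarization_kwargs(
    enable_diarization: bool,
    speaker_sensitivity: Optional[float] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for the diarization settings, rejecting
    combinations that the parameter descriptions say have no effect."""
    kwargs: Dict[str, Any] = {"enable_diarization": enable_diarization}
    if speaker_sensitivity is not None:
        if not enable_diarization:
            raise ValueError("speaker_sensitivity requires enable_diarization=True")
        # Assumption: the documented range of 0 to 1 is inclusive.
        if not 0.0 <= speaker_sensitivity <= 1.0:
            raise ValueError("speaker_sensitivity must be between 0 and 1")
        kwargs["speaker_sensitivity"] = speaker_sensitivity
    return kwargs


# Intended use (requires the plugin to be installed):
# stt = speechmatics.STT(**diarization_kwargs(True, speaker_sensitivity=0.7))
```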
Additional resources
The following resources provide more information about using Speechmatics with LiveKit Agents.
Python package
The livekit-plugins-speechmatics package on PyPI.
Plugin reference
Reference for the Speechmatics STT plugin.
GitHub repo
View the source or contribute to the LiveKit Speechmatics STT plugin.
Speechmatics docs
Speechmatics STT docs.
Voice AI quickstart
Get started with LiveKit Agents and Speechmatics STT.