Overview
Speechmatics provides enterprise-grade speech-to-text APIs. Their advanced speech models deliver highly accurate transcriptions across diverse languages, dialects, and accents. You can use the LiveKit Speechmatics plugin with the Agents framework to build voice AI agents that provide reliable, realtime transcriptions.
Quick reference
This section includes a basic usage example and some reference material. For links to more detailed documentation, see Additional resources.
Installation
Install the plugin from PyPI:
pip install "livekit-agents[speechmatics]~=1.2"
Authentication
The Speechmatics plugin requires an API key. Set SPEECHMATICS_API_KEY in your .env file.
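For example, with a placeholder value in place of your own key:

SPEECHMATICS_API_KEY=<your-api-key>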
Usage
Use Speechmatics STT in an AgentSession or as a standalone transcription service. For example, you can use this STT in the Voice AI quickstart.
from livekit.plugins import speechmatics

session = AgentSession(
    stt=speechmatics.STT(),
    # ... llm, tts, etc.
)
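For standalone transcription, the following is a minimal streaming sketch. It assumes an async iterable of audio frames (audio_frames is a placeholder) and uses the standard LiveKit Agents STT streaming interface; adapt it to your audio source.

import asyncio

from livekit.agents import stt as agents_stt
from livekit.plugins import speechmatics

async def transcribe(audio_frames):
    stt = speechmatics.STT()
    stream = stt.stream()  # open a streaming recognition session

    async def push_audio():
        # Feed audio frames to the recognizer, then signal end of input.
        async for frame in audio_frames:
            stream.push_frame(frame)
        stream.end_input()

    push_task = asyncio.create_task(push_audio())

    # Consume transcription events as they arrive.
    async for event in stream:
        if event.type == agents_stt.SpeechEventType.FINAL_TRANSCRIPT:
            print(event.alternatives[0].text)

    await push_task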
Speaker diarization
You can enable speaker diarization to identify individual speakers and their speech. When enabled, the transcription output can include this information through the speaker_id and text attributes.
See the following for example configurations and outputs:
<{speaker_id}>{text}</{speaker_id}>: <S1>Hello</S1>
[Speaker {speaker_id}] {text}: [Speaker S1] Hello
stt = speechmatics.STT(
    enable_diarization=True,
    speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
)
Inform the LLM of the speaker identification format by including it in your agent instructions. For an example, see the following:
Speechmatics STT speaker diarization
An example of using Speechmatics to identify speakers in a multi-speaker conversation.
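As a hedged sketch, instructions along these lines tell the LLM how to read the diarized format (the exact wording is illustrative, not prescriptive):

from livekit.agents import Agent

agent = Agent(
    instructions=(
        # The format string here matches the speaker_active_format set above.
        "You are a helpful assistant. User speech is transcribed with "
        "speaker labels in the form <S1>text</S1>, where S1, S2, and so on "
        "identify distinct speakers. Use the labels to track who said what."
    ),
)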
Parameters
This section describes the key parameters for the Speechmatics STT plugin. See the plugin reference for a complete list of all available parameters.
operating_point
Operating point to use for the transcription. This parameter balances accuracy, speed, and resource usage. To learn more, see Operating points.
language
ISO 639-1 language code. All languages are global: regardless of which language you select, the system can recognize its dialects and accents. For the full list, see Supported Languages.
enable_partials
Enable partial transcripts. Partial transcripts let you receive preliminary transcriptions that update as more context becomes available, until the higher-accuracy final transcript is returned. Partials are returned faster but without post-processing such as formatting. When enabled, the STT service emits INTERIM_TRANSCRIPT events.
enable_diarization
Enable speaker diarization. When enabled, spoken words are attributed to unique speakers. Use the speaker_sensitivity parameter to adjust the sensitivity of diarization. To learn more, see Diarization.
max_delay
The maximum delay in seconds between the end of a spoken word and returning the final transcript results. Lower values reduce latency but can reduce accuracy.
end_of_utterance_silence_trigger
The maximum delay in seconds of silence after the end of turn before the STT service returns the final transcript.
end_of_utterance_mode
The delay mode to use for triggering end of turn. Valid values are:
EndOfUtteranceMode.FIXED: Delay is fixed to the value of end_of_utterance_silence_trigger.
EndOfUtteranceMode.ADAPTIVE: Delay is adjusted based on what the most recent speaker has said, including rate of speech and speaking patterns (for example, pauses).
EndOfUtteranceMode.NONE: Disables end of turn detection and uses a fallback timer.
To use LiveKit's end of turn detector model, set this parameter to EndOfUtteranceMode.NONE.
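The following is a minimal sketch of that setup. It assumes the LiveKit turn detector plugin is installed and that EndOfUtteranceMode is importable from the plugin's types module; check the plugin reference for the exact import path in your version.

from livekit.agents import AgentSession
from livekit.plugins import speechmatics
from livekit.plugins.speechmatics.types import EndOfUtteranceMode  # import path is an assumption
from livekit.plugins.turn_detector.multilingual import MultilingualModel

session = AgentSession(
    # Disable Speechmatics end of turn detection...
    stt=speechmatics.STT(end_of_utterance_mode=EndOfUtteranceMode.NONE),
    # ...and let LiveKit's turn detector model decide end of turn.
    turn_detection=MultilingualModel(),
    # ... llm, tts, etc.
)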
speaker_active_format
Formatter for speaker identification in transcription output. The following attributes are available:
{speaker_id}: The ID of the speaker.
{text}: The text spoken by the speaker.
By default, if speaker diarization is enabled and this parameter is not set, the transcription output is not formatted for speaker identification. Your agent instructions might need to explain the format so the LLM can interpret it. To learn more, see Speaker diarization.
speaker_sensitivity
Sensitivity of speaker detection. Valid values are between 0 and 1. Higher values increase sensitivity and can help when two or more speakers have similar voices. To learn more, see Speaker sensitivity.
The enable_diarization parameter must be set to True for this parameter to take effect.
prefer_current_speaker
When speaker diarization is enabled and this is set to True, the service is less likely to switch between similar sounding speakers. To learn more, see Prefer current speaker.
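Putting these together, the following is a hedged configuration sketch. The parameter names follow the descriptions above; verify them, and the EndOfUtteranceMode import path, against the plugin reference for your installed version.

from livekit.plugins import speechmatics
from livekit.plugins.speechmatics.types import EndOfUtteranceMode  # import path is an assumption

stt = speechmatics.STT(
    language="en",                       # ISO 639-1 language code
    enable_partials=True,                # emit INTERIM_TRANSCRIPT events
    enable_diarization=True,             # attribute words to unique speakers
    speaker_sensitivity=0.7,             # 0 to 1; higher separates similar voices
    prefer_current_speaker=True,         # reduce switching between similar voices
    end_of_utterance_mode=EndOfUtteranceMode.ADAPTIVE,
    speaker_active_format="[Speaker {speaker_id}] {text}",
)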
Additional resources
The following resources provide more information about using Speechmatics with LiveKit Agents.
Python package
The livekit-plugins-speechmatics package on PyPI.
Plugin reference
Reference for the Speechmatics STT plugin.
GitHub repo
View the source or contribute to the LiveKit Speechmatics STT plugin.
Speechmatics docs
Speechmatics STT docs.
Voice AI quickstart
Get started with LiveKit Agents and Speechmatics STT.