Speechmatics STT integration guide

How to use the Speechmatics STT plugin for LiveKit Agents.

Overview

Speechmatics provides enterprise-grade speech-to-text APIs. Their advanced speech models deliver highly accurate transcriptions across diverse languages, dialects, and accents. You can use the LiveKit Speechmatics plugin with the Agents framework to build voice AI agents that provide reliable, real-time transcriptions.

Quick reference

This section includes a basic usage example and some reference material. For links to more detailed documentation, see Additional resources.

Installation

Install the plugin from PyPI:

pip install "livekit-agents[speechmatics]~=1.0"

Authentication

The Speechmatics plugin requires an API key.

Set SPEECHMATICS_API_KEY in your .env file.
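
As a minimal sketch of a fail-fast check, the snippet below verifies that SPEECHMATICS_API_KEY is present before the agent starts. The helper name require_api_key is hypothetical and not part of the plugin. It assumes your .env file has already been loaded into the process environment by whatever tool you use.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Speechmatics API key, or raise a clear error if it is missing.

    Hypothetical helper for illustration; not part of the LiveKit plugin.
    """
    # Treat an empty or whitespace-only value the same as a missing one.
    key = env.get("SPEECHMATICS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SPEECHMATICS_API_KEY is not set; add it to your .env file "
            "or export it in the environment."
        )
    return key
```

Calling require_api_key() at startup surfaces a missing key immediately instead of at the first transcription request.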

Usage

Use Speechmatics STT in an AgentSession or as a standalone transcription service. For example, you can use this STT in the Voice AI quickstart.

from livekit.agents import AgentSession
from livekit.plugins import speechmatics

session = AgentSession(
    stt=speechmatics.STT(
        transcription_config=speechmatics.types.TranscriptionConfig(
            operating_point="enhanced",
            enable_partials=True,
            language="en",
            output_locale="en-US",
            diarization="speaker",
            enable_entities=True,
            # Custom vocabulary; "sounds_like" lists alternative pronunciations.
            additional_vocab=[
                {"content": "financial crisis"},
                {
                    "content": "gnocchi",
                    "sounds_like": ["nyohki", "nokey", "nochi"],
                },
                {
                    "content": "CEO",
                    "sounds_like": ["C.E.O."],
                },
            ],
            max_delay=0.7,
            max_delay_mode="flexible",
        ),
        audio_settings=speechmatics.types.AudioSettings(
            encoding="pcm_s16le",
            sample_rate=16000,
        ),
    ),
    # ... llm, tts, etc.
)

Parameters

This section describes some of the available parameters. See the plugin reference for a complete list of all available parameters.

operating_point (string, optional, default: enhanced)

Operating point to use for transcription, chosen according to the accuracy and complexity you need. To learn more, see Accuracy Reference.

enable_partials (bool, optional, default: True)

Partial transcripts are preliminary results that are updated as more context becomes available, until the higher-accuracy final transcript is returned. Partials arrive faster but without post-processing such as formatting.
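
The relationship between partials and finals can be sketched as a small buffer: each partial replaces the previous one, and a final commits the text and clears the partial. This is an illustration only. The TranscriptBuffer class and the (text, is_final) event shape are assumptions made for the example, not the plugin's actual event types.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TranscriptBuffer:
    """Tracks committed final text plus the latest, still-changing partial.

    Hypothetical helper for illustration; not part of the LiveKit plugin.
    """

    finals: List[str] = field(default_factory=list)
    partial: str = ""

    def update(self, text: str, is_final: bool) -> None:
        if is_final:
            # A final supersedes whatever partial was being displayed.
            self.finals.append(text)
            self.partial = ""
        else:
            # Each partial replaces the previous one rather than appending.
            self.partial = text

    def display(self) -> str:
        # Show committed finals followed by the current partial, if any.
        parts = self.finals + ([self.partial] if self.partial else [])
        return " ".join(parts)


buf = TranscriptBuffer()
buf.update("hello wor", is_final=False)
buf.update("Hello world.", is_final=True)
buf.update("how are", is_final=False)
print(buf.display())
```

A user interface built this way can show low-latency text immediately while only treating finals as the authoritative transcript.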

language (string, optional, default: en)

ISO 639-1 language code. Each language is global and handles different dialects and accents. For the full list, see Supported Languages.

output_locale (string, optional, default: en-US)

RFC-5646 language code for transcription output. For supported locales, see Output Locale.

diarization (string, optional, default: None)

Set this to speaker to label each detected speaker in the transcribed output, for example S1, S2. For more information, see Speaker Diarization.

additional_vocab (list[dict{"content": str, "sounds_like": list[str]}], optional, default: None)

Add custom words for each transcription job. To learn more, see Custom Dictionary.
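
As a rough sketch, the entries for additional_vocab can be generated from a simple mapping of words to optional pronunciations. The build_additional_vocab helper below is hypothetical and not part of the plugin. It only produces dictionaries in the same shape as the usage example ("content" plus an optional "sounds_like" list).

```python
from typing import Dict, Iterable, List, Optional, Union

VocabEntry = Dict[str, Union[str, List[str]]]


def build_additional_vocab(
    words: Dict[str, Optional[Iterable[str]]],
) -> List[VocabEntry]:
    """Convert {word: pronunciations or None} into additional_vocab entries.

    Hypothetical helper for illustration; not part of the LiveKit plugin.
    """
    entries: List[VocabEntry] = []
    for content, sounds_like in words.items():
        entry: VocabEntry = {"content": content}
        # Only include "sounds_like" when pronunciations were supplied.
        pronunciations = list(sounds_like or [])
        if pronunciations:
            entry["sounds_like"] = pronunciations
        entries.append(entry)
    return entries


vocab = build_additional_vocab(
    {
        "financial crisis": None,
        "gnocchi": ["nyohki", "nokey", "nochi"],
        "CEO": ["C.E.O."],
    }
)
```

The resulting list can then be passed as the additional_vocab argument of TranscriptionConfig, as in the usage example.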

enable_entities (bool, optional, default: False)

Outputs the written form of entities such as phone numbers, email addresses, and currency amounts in the transcript. To learn more about the supported entities, see Entities.

max_delay (number, optional, default: 0.7)

The maximum delay, in seconds, between the end of a spoken word and the return of the final transcript for it.

max_delay_mode (string, optional, default: flexible)

If set to flexible, the final transcript can be delayed until proper numeral formatting is complete. To learn more, see Numeral Formatting.

Additional resources

The following resources provide more information about using Speechmatics with LiveKit Agents.

Voice AI quickstart

Get started with LiveKit Agents and Speechmatics STT.