AssemblyAI STT

How to use AssemblyAI STT with LiveKit Agents.

Overview

AssemblyAI speech-to-text is available in LiveKit Agents through LiveKit Inference and the AssemblyAI plugin. Pricing for LiveKit Inference is available on the pricing page.

Model name | Model ID | Languages
Universal-Streaming | assemblyai/universal-streaming | en, en-US
Universal-Streaming-Multilingual | assemblyai/universal-streaming-multilingual | en, en-US, en-GB, en-AU, en-CA, en-IN, en-NZ, es, es-ES, es-MX, es-AR, es-CO, es-CL, es-PE, es-VE, es-EC, es-GT, es-CU, es-BO, es-DO, es-HN, es-PY, es-SV, es-NI, es-CR, es-PA, es-UY, es-PR, fr, fr-FR, fr-CA, fr-BE, fr-CH, de, de-DE, de-AT, de-CH, it, it-IT, it-CH, pt, pt-BR, pt-PT
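If your application chooses a model at runtime, a small lookup over the table above keeps that choice in one place. The sketch below is a hypothetical helper (pick_model is not part of LiveKit Agents). It prefers the English-only model for English, following the plugin guidance that universal-streaming-multilingual is for non-English languages, and raises an error for language codes that neither model lists.

```python
# Hypothetical helper (not part of LiveKit Agents): map a language code to a
# LiveKit Inference model ID using the language lists in the table above.
ENGLISH_ONLY = {"en", "en-US"}

MULTILINGUAL = set(
    "en en-US en-GB en-AU en-CA en-IN en-NZ "
    "es es-ES es-MX es-AR es-CO es-CL es-PE es-VE es-EC es-GT es-CU es-BO "
    "es-DO es-HN es-PY es-SV es-NI es-CR es-PA es-UY es-PR "
    "fr fr-FR fr-CA fr-BE fr-CH de de-DE de-AT de-CH it it-IT it-CH "
    "pt pt-BR pt-PT".split()
)


def pick_model(language: str) -> str:
    """Return a model ID that lists `language`, preferring the English-only model."""
    if language in ENGLISH_ONLY:
        return "assemblyai/universal-streaming"
    if language in MULTILINGUAL:
        return "assemblyai/universal-streaming-multilingual"
    raise ValueError(f"no AssemblyAI model in the table lists {language!r}")


print(pick_model("en-US"))  # assemblyai/universal-streaming
print(pick_model("fr-CA"))  # assemblyai/universal-streaming-multilingual
```

The result can be passed as the model argument to inference.STT alongside the same language code.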

LiveKit Inference

Use LiveKit Inference to access AssemblyAI STT without a separate AssemblyAI API key.

Usage

To use AssemblyAI, use the STT class from the inference module:

Python:

from livekit.agents import AgentSession, inference

session = AgentSession(
    stt=inference.STT(
        model="assemblyai/universal-streaming",
        language="en",
    ),
    # ... llm, tts, vad, turn_detection, etc.
)

Node.js:

import { AgentSession, inference } from '@livekit/agents';

const session = new AgentSession({
  stt: new inference.STT({
    model: "assemblyai/universal-streaming",
    language: "en",
  }),
  // ... llm, tts, vad, turnDetection, etc.
});

Parameters

model (string, required)

The model to use for the STT.

language (string, optional)

Language code for the transcription. If not set, the provider default applies.

extra_kwargs (dict, optional)

Additional parameters to pass to the AssemblyAI Universal Streaming API, including format_turns, end_of_turn_confidence_threshold, min_end_of_turn_silence_when_confident, max_turn_silence, and keyterms_prompt. See the provider's documentation for more information.

In Node.js this parameter is called modelOptions.

String descriptors

As a shortcut, you can pass a model descriptor string directly to the stt argument of AgentSession. The descriptor is the model ID followed by a colon and the language code:

Python:

from livekit.agents import AgentSession

session = AgentSession(
    stt="assemblyai/universal-streaming:en",
    # ... llm, tts, vad, turn_detection, etc.
)

Node.js:

import { AgentSession } from '@livekit/agents';

const session = new AgentSession({
  stt: "assemblyai/universal-streaming:en",
  // ... llm, tts, vad, turnDetection, etc.
});
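To make the descriptor format concrete, the following sketch splits a descriptor into provider, model, and language. It is a hypothetical parser written for illustration, not the one LiveKit Agents uses, and it treats the :language suffix as optional, which is an assumption rather than documented behavior.

```python
from typing import NamedTuple, Optional


class Descriptor(NamedTuple):
    provider: str
    model: str
    language: Optional[str]


def parse_descriptor(descriptor: str) -> Descriptor:
    """Split 'provider/model[:language]' into its parts (hypothetical helper)."""
    # The language code, if present, follows the first ':'.
    model_id, _, language = descriptor.partition(":")
    # The provider is everything before the first '/'.
    provider, slash, model = model_id.partition("/")
    if not slash or not provider or not model:
        raise ValueError(f"expected 'provider/model[:language]', got {descriptor!r}")
    # An absent or empty suffix is reported as None.
    return Descriptor(provider, model, language or None)


print(parse_descriptor("assemblyai/universal-streaming:en"))
# Descriptor(provider='assemblyai', model='universal-streaming', language='en')
```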

Turn detection

AssemblyAI includes a custom phrase endpointing model that uses both audio and linguistic information to detect turn boundaries. To use this model for turn detection, set turn_detection="stt" in the AgentSession constructor. You should also provide a VAD plugin for responsive interruption handling.

from livekit.agents import AgentSession, inference
from livekit.plugins import silero

session = AgentSession(
    turn_detection="stt",
    stt=inference.STT(
        model="assemblyai/universal-streaming",
        language="en",
    ),
    vad=silero.VAD.load(),  # Recommended for responsive interruption handling
    # ... llm, tts, etc.
)

Plugin

Use the AssemblyAI plugin to connect directly to AssemblyAI's API with your own API key.

Available in: Python

Installation

Install the plugin from PyPI:

uv add "livekit-agents[assemblyai]~=1.4"

Authentication

The AssemblyAI plugin requires an AssemblyAI API key.

Set ASSEMBLYAI_API_KEY in your .env file.
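A missing key usually surfaces only when the first connection to AssemblyAI fails. The sketch below checks for it at startup instead. It assumes the .env file has already been loaded into the process environment (for example, with python-dotenv's load_dotenv()), and require_assemblyai_key is a hypothetical helper, not part of the plugin.

```python
import os
from typing import Mapping, Optional


def require_assemblyai_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return ASSEMBLYAI_API_KEY, failing fast with a clear message if it is unset."""
    # Default to the real process environment; tests can pass a plain dict.
    env = os.environ if env is None else env
    key = env.get("ASSEMBLYAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ASSEMBLYAI_API_KEY is not set; add it to your .env file")
    return key
```

Calling require_assemblyai_key() once before building the AgentSession turns a missing or blank key into an immediate, readable error.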

Usage

Use AssemblyAI STT in an AgentSession or as a standalone transcription service. For example, you can use this STT in the Voice AI quickstart.

from livekit.agents import AgentSession
from livekit.plugins import assemblyai

session = AgentSession(
    stt=assemblyai.STT(),
    # ... vad, llm, tts, etc.
)

Parameters

This section describes some of the available parameters. See the plugin reference for a complete list of all available parameters.

model (string, optional; default: universal-streaming)

STT model to use. Accepted options are universal-streaming and universal-streaming-multilingual. Use universal-streaming-multilingual for non-English languages.

language (string, optional; default: en)

The language of the audio. For a full list of supported languages, see the Supported languages page.

end_of_turn_confidence_threshold (float, optional; default: 0.4)

The confidence threshold to use when determining if the end of a turn has been reached.

min_end_of_turn_silence_when_confident (int, optional; default: 400)

The minimum duration of silence, in milliseconds, required to detect the end of a turn when the model is confident.

max_turn_silence (int, optional; default: 1280)

The maximum duration of silence, in milliseconds, allowed in a turn before the end of turn is triggered.
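The three turn parameters work together. The sketch below is a simplified model of the behavior the descriptions above suggest, not AssemblyAI's implementation: a turn ends once the end-of-turn confidence reaches the threshold and at least min_end_of_turn_silence_when_confident milliseconds of silence have passed, or once silence reaches max_turn_silence regardless of confidence. TurnSettings and is_end_of_turn are hypothetical names, and the inclusive comparisons at the boundaries are an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnSettings:
    # Defaults mirror the plugin defaults listed above.
    end_of_turn_confidence_threshold: float = 0.4
    min_end_of_turn_silence_when_confident: int = 400  # milliseconds
    max_turn_silence: int = 1280  # milliseconds


def is_end_of_turn(confidence: float, silence_ms: int,
                   settings: TurnSettings = TurnSettings()) -> bool:
    """Simplified, hypothetical model of when a turn ends."""
    # Confident path: the model thinks the speaker is done and a short silence has passed.
    if (
        confidence >= settings.end_of_turn_confidence_threshold
        and silence_ms >= settings.min_end_of_turn_silence_when_confident
    ):
        return True
    # Fallback path: a long enough silence ends the turn regardless of confidence.
    return silence_ms >= settings.max_turn_silence


print(is_end_of_turn(confidence=0.8, silence_ms=500))   # True: confident, 500 >= 400
print(is_end_of_turn(confidence=0.2, silence_ms=500))   # False: not confident, 500 < 1280
print(is_end_of_turn(confidence=0.2, silence_ms=1500))  # True: 1500 >= 1280
```

In this model, raising end_of_turn_confidence_threshold makes the confident path rarer, so more turns end only after max_turn_silence; lowering min_end_of_turn_silence_when_confident makes confident turns end sooner.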

Turn detection

AssemblyAI includes a custom phrase endpointing model that uses both audio and linguistic information to detect turn boundaries. To use this model for turn detection, set turn_detection="stt" in the AgentSession constructor. You should also provide a VAD plugin for responsive interruption handling.

from livekit.agents import AgentSession
from livekit.plugins import assemblyai, silero

session = AgentSession(
    turn_detection="stt",
    stt=assemblyai.STT(
        end_of_turn_confidence_threshold=0.4,
        min_end_of_turn_silence_when_confident=400,
        max_turn_silence=1280,
    ),
    vad=silero.VAD.load(),  # Recommended for responsive interruption handling
    # ... llm, tts, etc.
)

Additional resources

The following resources provide more information about using AssemblyAI with LiveKit Agents.