Overview
AssemblyAI speech-to-text is available in LiveKit Agents through LiveKit Inference and the AssemblyAI plugin. Pricing for LiveKit Inference is available on the pricing page.
| Model name | Model ID | Languages |
|---|---|---|
| Universal-Streaming | assemblyai/universal-streaming | en, en-US |
| Universal-Streaming-Multilingual | assemblyai/universal-streaming-multilingual | en, en-US, en-GB, en-AU, en-CA, en-IN, en-NZ, es, es-ES, es-MX, es-AR, es-CO, es-CL, es-PE, es-VE, es-EC, es-GT, es-CU, es-BO, es-DO, es-HN, es-PY, es-SV, es-NI, es-CR, es-PA, es-UY, es-PR, fr, fr-FR, fr-CA, fr-BE, fr-CH, de, de-DE, de-AT, de-CH, it, it-IT, it-CH, pt, pt-BR, pt-PT |
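If your app accepts a language code at runtime, you may want to derive the model ID from it. The following is a minimal sketch that only encodes the table above: it prefers Universal-Streaming when that model lists the code and falls back to Universal-Streaming-Multilingual otherwise. The function name `pick_assemblyai_model` and the two set names are hypothetical and not part of LiveKit.

```python
# Language codes copied from the model table; update them if the table changes.
UNIVERSAL_STREAMING_LANGS = {"en", "en-US"}
MULTILINGUAL_LANGS = {
    "en", "en-US", "en-GB", "en-AU", "en-CA", "en-IN", "en-NZ",
    "es", "es-ES", "es-MX", "es-AR", "es-CO", "es-CL", "es-PE", "es-VE",
    "es-EC", "es-GT", "es-CU", "es-BO", "es-DO", "es-HN", "es-PY", "es-SV",
    "es-NI", "es-CR", "es-PA", "es-UY", "es-PR",
    "fr", "fr-FR", "fr-CA", "fr-BE", "fr-CH",
    "de", "de-DE", "de-AT", "de-CH",
    "it", "it-IT", "it-CH",
    "pt", "pt-BR", "pt-PT",
}


def pick_assemblyai_model(language: str) -> str:
    """Return a model ID whose table row lists `language` (hypothetical helper)."""
    if language in UNIVERSAL_STREAMING_LANGS:
        return "assemblyai/universal-streaming"
    if language in MULTILINGUAL_LANGS:
        return "assemblyai/universal-streaming-multilingual"
    raise ValueError(f"{language!r} is not listed for either AssemblyAI model")
```

For example, `en-US` maps to `assemblyai/universal-streaming`, while `en-GB` and `fr-CA` map to the multilingual model because only that row lists them.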
LiveKit Inference
Use LiveKit Inference to access AssemblyAI STT without a separate AssemblyAI API key.
Usage
To use AssemblyAI, use the STT class from the inference module:
```python
from livekit.agents import AgentSession, inference

session = AgentSession(
    stt=inference.STT(
        model="assemblyai/universal-streaming",
        language="en",
    ),
    # ... llm, tts, vad, turn_detection, etc.
)
```
```typescript
import { AgentSession, inference } from '@livekit/agents';

const session = new AgentSession({
  stt: new inference.STT({
    model: "assemblyai/universal-streaming",
    language: "en",
  }),
  // ... llm, tts, vad, turn_detection, etc.
});
```
Parameters
`model` (string, required): The model to use for the STT.
`language` (string, optional): Language code for the transcription. If not set, the provider default applies.
`extra_kwargs` (dict, optional): Additional parameters to pass to the AssemblyAI Universal Streaming API, including `format_turns`, `end_of_turn_confidence_threshold`, `min_end_of_turn_silence_when_confident`, `max_turn_silence`, and `keyterms_prompt`. See the provider's documentation for more information. In Node.js this parameter is called `modelOptions`.
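The additional-parameters dictionary is easiest to keep tidy if unset options are left out entirely, so the provider defaults apply. The sketch below builds such a dictionary from the option names listed above. The helper `assemblyai_extra_kwargs` is hypothetical, and the check that the threshold lies between 0 and 1 is an assumption about a confidence value, not a documented constraint.

```python
from typing import Any, Dict, List, Optional


def assemblyai_extra_kwargs(
    *,
    end_of_turn_confidence_threshold: Optional[float] = None,
    min_end_of_turn_silence_when_confident: Optional[int] = None,
    max_turn_silence: Optional[int] = None,
    format_turns: Optional[bool] = None,
    keyterms_prompt: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Collect AssemblyAI options that were explicitly set (hypothetical helper)."""
    # Assumption: the threshold is a confidence value in the range [0, 1].
    if end_of_turn_confidence_threshold is not None and not (
        0.0 <= end_of_turn_confidence_threshold <= 1.0
    ):
        raise ValueError("end_of_turn_confidence_threshold must be between 0 and 1")
    options: Dict[str, Any] = {
        "end_of_turn_confidence_threshold": end_of_turn_confidence_threshold,
        "min_end_of_turn_silence_when_confident": min_end_of_turn_silence_when_confident,
        "max_turn_silence": max_turn_silence,
        "format_turns": format_turns,
        "keyterms_prompt": keyterms_prompt,
    }
    # Drop unset keys so AssemblyAI falls back to its own defaults for them.
    return {key: value for key, value in options.items() if value is not None}
```

The resulting dictionary is what you would pass as the additional-parameters argument of `inference.STT`, for example `assemblyai_extra_kwargs(keyterms_prompt=["LiveKit"])`.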
String descriptors
As a shortcut, you can also pass a model descriptor string directly to the stt argument in your AgentSession:
```python
from livekit.agents import AgentSession

session = AgentSession(
    stt="assemblyai/universal-streaming:en",
    # ... llm, tts, vad, turn_detection, etc.
)
```
```typescript
import { AgentSession } from '@livekit/agents';

const session = new AgentSession({
  stt: "assemblyai/universal-streaming:en",
  // ... llm, tts, vad, turn_detection, etc.
});
```
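Comparing the descriptor with the earlier `inference.STT` example, `assemblyai/universal-streaming:en` carries the same model ID and language code, joined by a colon. The sketch below splits a descriptor along those lines to make the correspondence explicit. It is an illustration only, not LiveKit's own parser; `parse_stt_descriptor` and `SttDescriptor` are hypothetical names, and treating a descriptor without a `:language` suffix as "no language set" is an assumption.

```python
from typing import NamedTuple, Optional


class SttDescriptor(NamedTuple):
    model: str  # provider/model ID, e.g. "assemblyai/universal-streaming"
    language: Optional[str]  # language code after ":", or None if absent


def parse_stt_descriptor(descriptor: str) -> SttDescriptor:
    """Split a "provider/model:language" string (hypothetical helper)."""
    model, _, language = descriptor.partition(":")
    if "/" not in model:
        raise ValueError(f"expected 'provider/model[:language]', got {descriptor!r}")
    # An empty suffix means no language was given in the descriptor.
    return SttDescriptor(model=model, language=language or None)
```

For example, `parse_stt_descriptor("assemblyai/universal-streaming-multilingual:fr-CA")` yields the multilingual model ID with language `fr-CA`.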
Turn detection
AssemblyAI includes a custom phrase endpointing model that uses both audio and linguistic information to detect turn boundaries. To use this model for turn detection, set turn_detection="stt" in the AgentSession constructor. You should also provide a VAD plugin for responsive interruption handling.
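```python
from livekit.agents import AgentSession, inference
from livekit.plugins import silero

session = AgentSession(
    turn_detection="stt",
    stt=inference.STT(
        model="assemblyai/universal-streaming",
        language="en",
    ),
    vad=silero.VAD.load(),  # Recommended for responsive interruption handling
    # ... llm, tts, etc.
)
```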
Plugin
Use the AssemblyAI plugin to connect directly to AssemblyAI's API with your own API key.
Installation
Install the plugin from PyPI:
```shell
uv add "livekit-agents[assemblyai]~=1.4"
```
Authentication
The AssemblyAI plugin requires an AssemblyAI API key.
Set ASSEMBLYAI_API_KEY in your .env file.
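A missing key is easier to diagnose if the agent checks for it at startup rather than failing on the first connection. The sketch below assumes the `.env` file has already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`); the function `require_assemblyai_api_key` is hypothetical and not part of the plugin.

```python
import os


def require_assemblyai_api_key() -> str:
    """Return ASSEMBLYAI_API_KEY from the environment, failing fast if it is missing."""
    key = os.environ.get("ASSEMBLYAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ASSEMBLYAI_API_KEY is not set; add it to your .env file and load it "
            "into the environment before starting the agent"
        )
    return key
```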
Usage
Use AssemblyAI STT in an AgentSession or as a standalone transcription service. For example, you can use this STT in the Voice AI quickstart.
```python
from livekit.agents import AgentSession
from livekit.plugins import assemblyai

session = AgentSession(
    stt=assemblyai.STT(),
    # ... vad, llm, tts, etc.
)
```
Parameters
This section describes some of the available parameters. See the plugin reference for a complete list of all available parameters.
`model` (string, optional; default: `universal-streaming`): The STT model to use. Accepted values are `universal-streaming` and `universal-streaming-multilingual`. Use `universal-streaming-multilingual` for non-English languages.
`language` (string, optional; default: `en`): The language of the audio. For a full list of supported languages, see the Supported languages page.
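`end_of_turn_confidence_threshold` (float, optional; default: 0.4): The confidence threshold used to decide whether the end of a turn has been reached.
`min_end_of_turn_silence_when_confident` (int, optional; default: 400): The minimum duration of silence, in milliseconds, required to detect the end of a turn when the model is confident.
`max_turn_silence` (int, optional; default: 1280): The maximum duration of silence, in milliseconds, allowed in a turn before the end of turn is triggered.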
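To reason about how the three turn parameters interact when tuning them, it can help to model the decision in a few lines. The sketch below is a simplified, hypothetical model, not AssemblyAI's implementation (their endpointing runs inside the service and may differ). It assumes the turn ends either when the model's end-of-turn confidence reaches the threshold and at least the minimum "confident" silence has elapsed, or, as a fallback, when silence reaches `max_turn_silence` regardless of confidence. `TurnSettings` and `is_end_of_turn` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnSettings:
    # Defaults copied from the parameter list above.
    end_of_turn_confidence_threshold: float = 0.4
    min_end_of_turn_silence_when_confident: int = 400  # milliseconds
    max_turn_silence: int = 1280  # milliseconds


def is_end_of_turn(confidence: float, silence_ms: int, settings: TurnSettings = TurnSettings()) -> bool:
    """Simplified, hypothetical model of the end-of-turn decision."""
    # Confident path: the model thinks the turn is over and a short silence has elapsed.
    if (
        confidence >= settings.end_of_turn_confidence_threshold
        and silence_ms >= settings.min_end_of_turn_silence_when_confident
    ):
        return True
    # Fallback path: a long enough silence ends the turn regardless of confidence.
    return silence_ms >= settings.max_turn_silence
```

Under this model, raising `end_of_turn_confidence_threshold` makes the fast path rarer, so more turns end only after the longer `max_turn_silence` fallback.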
Turn detection
AssemblyAI includes a custom phrase endpointing model that uses both audio and linguistic information to detect turn boundaries. To use this model for turn detection, set turn_detection="stt" in the AgentSession constructor. You should also provide a VAD plugin for responsive interruption handling.
```python
from livekit.agents import AgentSession
from livekit.plugins import assemblyai, silero

session = AgentSession(
    turn_detection="stt",
    stt=assemblyai.STT(
        end_of_turn_confidence_threshold=0.4,
        min_end_of_turn_silence_when_confident=400,
        max_turn_silence=1280,
    ),
    vad=silero.VAD.load(),  # Recommended for responsive interruption handling
    # ... llm, tts, etc.
)
```
Additional resources
The following resources provide more information about using AssemblyAI with LiveKit Agents.
Python package
The livekit-plugins-assemblyai package on PyPI.
Plugin reference
Reference for the AssemblyAI STT plugin.
GitHub repo
View the source or contribute to the LiveKit AssemblyAI STT plugin.
AssemblyAI docs
AssemblyAI's full docs for the Universal Streaming API.
Voice AI quickstart
Get started with LiveKit Agents and AssemblyAI.
AssemblyAI LiveKit guide
Guide to using AssemblyAI Universal Streaming STT with LiveKit.