Overview
This plugin allows you to use Amazon Transcribe as an STT provider for your voice agents.
Installation
Install the plugin from PyPI:
uv add "livekit-agents[aws]~=1.5"
Authentication
The Amazon Transcribe plugin requires an AWS API key (an access key ID and a secret access key).
Set the following environment variables in your .env file:
AWS_ACCESS_KEY_ID=<aws-access-key-id>
AWS_SECRET_ACCESS_KEY=<aws-secret-access-key>
AWS_DEFAULT_REGION=<aws-deployment-region>
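A missing variable usually shows up later as an authentication or region error. A minimal pre-flight check, sketched below, can report what is unset before the agent starts. The helper name is hypothetical, and it only inspects the process environment: if you keep these values in .env, load that file first (for example with python-dotenv's load_dotenv).

```python
import os

# The three variables listed above for the Amazon Transcribe plugin.
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_vars(env=None):
    """Return the required AWS variable names that are unset or empty.

    Hypothetical helper: pass a dict to test it, or omit `env` to read os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]
```

At startup you might call missing_aws_vars() and exit with a clear message if the returned list is non-empty.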
Usage
Use Amazon Transcribe STT in an AgentSession or as a standalone transcription service. For example, you can use this STT in the Voice AI quickstart.
from livekit.agents import AgentSession
from livekit.plugins import aws

session = AgentSession(
    stt=aws.STT(
        session_id="my-session-id",
        language="en-US",
        vocabulary_name="my-vocabulary",
        vocab_filter_name="my-vocab-filter",
        vocab_filter_method="mask",
    ),
    # ... llm, tts, etc.
)
Multilingual
For multilingual scenarios, the plugin can auto-detect the language at the start of the stream or detect switches between languages mid-stream. Pick the simplest mode that fits your use case. Auto-detection is more flexible than a fixed language but less reliable, especially with short utterances or specialized vocabulary.
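Choosing among a fixed language and the two detection modes comes down to which aws.STT arguments you set. The minimal sketch below maps a mode label to keyword arguments. The function name and the labels "fixed", "single", and "multi" are hypothetical, and the sketch does not enforce the 2 to 12 candidate limit.

```python
def transcribe_language_kwargs(mode, languages):
    """Return aws.STT keyword arguments for a language-handling mode.

    `mode` is a hypothetical label:
      "fixed"  -> one known language, passed as `language`
      "single" -> detect once at stream start (identify_language)
      "multi"  -> detect per segment (identify_multiple_languages)
    """
    if mode == "fixed":
        if len(languages) != 1:
            raise ValueError("fixed mode takes exactly one language code")
        return {"language": languages[0]}
    if mode == "single":
        return {"identify_language": True, "language_options": ",".join(languages)}
    if mode == "multi":
        return {"identify_multiple_languages": True, "language_options": ",".join(languages)}
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, aws.STT(**transcribe_language_kwargs("single", ["en-US", "es-US"])) sets identify_language=True and language_options="en-US,es-US".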
Single-language auto-detection
Detects the language once at the start of the session and uses it for all transcripts. Use this when the language is unknown at the start of the session but stable for its duration. Provide between 2 and 12 candidate language codes:
from livekit.agents import AgentSession
from livekit.plugins import aws

session = AgentSession(
    stt=aws.STT(
        identify_language=True,
        language_options="en-US,es-US,fr-FR",
    ),
    # ... llm, tts, etc.
)
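If the candidate languages come from configuration or user settings, it can help to build the language_options string in one place and check the 2 to 12 limit before the stream starts. A minimal sketch with a hypothetical helper name; it drops blanks and exact duplicates but does not check whether a code is supported by Amazon Transcribe.

```python
def build_language_options(codes):
    """Join candidate codes into the comma-separated language_options string.

    Strips whitespace, drops empty entries and exact duplicates (keeping the
    first occurrence), then enforces the 2-12 candidate limit.
    """
    unique = list(dict.fromkeys(code.strip() for code in codes if code.strip()))
    if not 2 <= len(unique) <= 12:
        raise ValueError(f"expected 2-12 distinct language codes, got {len(unique)}")
    return ",".join(unique)
```

The result can be passed directly, for example aws.STT(identify_language=True, language_options=build_language_options(["en-US", "es-US", "fr-FR"])).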
Mid-stream language switching
Detects language switches per segment of speech. Use this when speakers may switch languages mid-conversation, like multilingual customer support. Each final transcript reports the detected language:
from livekit.agents import AgentSession
from livekit.plugins import aws

session = AgentSession(
    stt=aws.STT(
        identify_multiple_languages=True,
        language_options="en-US,es-US,fr-FR,de-DE,ja-JP,ko-KR,zh-HK,pt-BR,hi-IN,vi-VN,pl-PL,ru-RU",
    ),
    # ... llm, tts, etc.
)
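Because each final transcript reports its detected language, consecutive finals can be grouped to see where a speaker switched languages, for example for logging or analytics. The sketch below uses plain (language, text) tuples as a stand-in for the plugin's transcript events; reading the detected language from a real event is not shown and may differ between LiveKit Agents versions. The helper name is hypothetical.

```python
from itertools import groupby


def language_runs(finals):
    """Collapse consecutive (language, text) finals into one entry per run.

    Each returned (language, joined_text) tuple starts where the detected
    language changed from the previous final transcript.
    """
    runs = []
    for language, group in groupby(finals, key=lambda item: item[0]):
        runs.append((language, " ".join(text for _, text in group)))
    return runs
```

For instance, finals tagged en-US, en-US, es-US, en-US produce three runs: English, Spanish, then English again.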
Multilingual turn detection
Pairs mid-stream switching with the multilingual turn detector so end-of-turn predictions adapt to the language of each segment.
from livekit.agents import AgentSession
from livekit.plugins import aws
from livekit.plugins.turn_detector.multilingual import MultilingualModel

session = AgentSession(
    stt=aws.STT(
        identify_multiple_languages=True,
        language_options="en-US,es-US,fr-FR",
    ),
    turn_detection=MultilingualModel(),
    # ... llm, tts, etc.
)
Parameters
This section describes some of the available parameters. See the plugin reference for a complete list of all available parameters.
speech_region (string, default: us-east-1, env: AWS_DEFAULT_REGION): The region of the AWS deployment. Required if the environment variable isn't set.
language (LanguageCode, default: en-US): Language code for the audio. Ignored when identify_language or identify_multiple_languages is enabled. For a full list of supported languages, see the Supported languages page.
identify_language (bool, default: False): Enable automatic detection of a single dominant language for the stream. Provide candidate languages with language_options. Cannot be combined with identify_multiple_languages.
identify_multiple_languages (bool, default: False): Detect language switches mid-stream so the agent can transcribe each segment in the spoken language. Provide candidate languages with language_options. Cannot be combined with identify_language.
language_options (string): Comma-separated list of expected language codes (between 2 and 12 codes). Required when identify_language or identify_multiple_languages is enabled.
preferred_language (string): A language code from language_options that biases detection toward this language. Only applies when identify_language or identify_multiple_languages is enabled.
vocabulary_names (string): Comma-separated list of per-language custom vocabulary names that map to the codes in language_options. Use when different candidate languages have different vocabularies. Only applies when identify_language or identify_multiple_languages is enabled.
vocabulary_filter_names (string): Comma-separated list of per-language custom vocabulary filter names that map to the codes in language_options. Only applies when identify_language or identify_multiple_languages is enabled.
vocabulary_name (string, default: None): Name of the custom vocabulary to use when processing your transcription. To learn more, see Custom vocabularies.
session_id (string): Name for your transcription session. If left empty, Amazon Transcribe generates an ID and returns it in the response.
vocab_filter_name (string, default: None): Name of the custom vocabulary filter to use when processing your transcription. To learn more, see Using custom vocabulary filters to delete, mask, or flag words.
vocab_filter_method (string, default: None): How words matched by the vocabulary filter are displayed: remove, mask, or tag. To learn more, see Using custom vocabulary filters to delete, mask, or flag words.
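Several of the parameters above depend on each other. A minimal sketch that checks those rules before constructing aws.STT follows; the function name is hypothetical, the rules are taken from the parameter descriptions above, and the allowed vocab_filter_method values are Amazon Transcribe's remove, mask, and tag. It is not a substitute for the plugin's or AWS's own validation.

```python
# Amazon Transcribe's vocabulary filter methods.
VALID_FILTER_METHODS = {"remove", "mask", "tag"}


def check_transcribe_options(
    identify_language=False,
    identify_multiple_languages=False,
    language_options=None,
    preferred_language=None,
    vocabulary_names=None,
    vocab_filter_method=None,
):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    detecting = identify_language or identify_multiple_languages
    if identify_language and identify_multiple_languages:
        problems.append("identify_language and identify_multiple_languages cannot be combined")
    codes = [c.strip() for c in language_options.split(",") if c.strip()] if language_options else []
    if detecting and not 2 <= len(codes) <= 12:
        problems.append("language_options needs 2-12 codes when language identification is enabled")
    if preferred_language and preferred_language not in codes:
        problems.append("preferred_language must be one of the language_options codes")
    if vocabulary_names and not detecting:
        problems.append("vocabulary_names only applies with language identification; use vocabulary_name")
    if vocab_filter_method is not None and vocab_filter_method not in VALID_FILTER_METHODS:
        problems.append(f"vocab_filter_method must be one of {sorted(VALID_FILTER_METHODS)}")
    return problems
```

For example, check_transcribe_options(identify_language=True, language_options="en-US,es-US", preferred_language="es-US") returns an empty list.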
Additional resources
The following resources provide more information about using Amazon Transcribe with LiveKit Agents.
Python package
The livekit-plugins-aws package on PyPI.
Plugin reference
Reference for the Amazon Transcribe STT plugin.
GitHub repo
View the source or contribute to the LiveKit Amazon Transcribe STT plugin.
AWS docs
Amazon Transcribe's full docs site.
Voice AI quickstart
Get started with LiveKit Agents and Amazon Transcribe.