Amazon Transcribe plugin guide

How to use the Amazon Transcribe STT plugin for LiveKit Agents.

Available in Python

Overview

This plugin lets you use Amazon Transcribe as an STT provider for your voice agents.

Installation

Install the plugin from PyPI:

uv add "livekit-agents[aws]~=1.5"

Authentication

The Amazon Transcribe plugin requires an AWS API key (an access key ID and a secret access key).

Set the following environment variables in your .env file:

AWS_ACCESS_KEY_ID=<aws-access-key-id>
AWS_SECRET_ACCESS_KEY=<aws-secret-access-key>
AWS_DEFAULT_REGION=<aws-deployment-region>
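A missing variable is easier to diagnose before the agent starts than after the first failed transcription request. The following is a minimal sketch of such a check. It assumes the .env file has already been loaded into the process environment, and `missing_aws_env` is a hypothetical helper, not part of the plugin.

```python
import os

# Variable names taken from the list above.
REQUIRED_AWS_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
)


def missing_aws_env(env=None):
    """Return the required AWS variable names that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]
```

Calling `missing_aws_env()` at startup and stopping when the returned list is non-empty gives a clear error message that names the missing variables.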

Usage

Use Amazon Transcribe STT in an AgentSession or as a standalone transcription service. For example, you can use this STT in the Voice AI quickstart.

from livekit.agents import AgentSession
from livekit.plugins import aws

session = AgentSession(
    stt=aws.STT(
        session_id="my-session-id",
        language="en-US",
        # Custom vocabulary and vocabulary filter configured in Amazon Transcribe
        vocabulary_name="my-vocabulary",
        vocab_filter_name="my-vocab-filter",
        vocab_filter_method="mask",
    ),
    # ... llm, tts, etc.
)

Multilingual

For multilingual scenarios, the plugin can auto-detect the language at the start of the stream or detect switches between languages mid-stream. Pick the simplest mode that fits your use case. Auto-detection is more flexible than a fixed language but less reliable, especially with short utterances or specialized vocabulary.

Single-language auto-detection

Detects the language once at the start of the session and uses it for all transcripts. Use this when the language is unknown at the start of the session but stable for its duration. Provide between 2 and 12 candidate language codes:

from livekit.agents import AgentSession
from livekit.plugins import aws

session = AgentSession(
    stt=aws.STT(
        # Detect one language at the start of the stream
        identify_language=True,
        language_options="en-US,es-US,fr-FR",
    ),
    # ... llm, tts, etc.
)
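Because language_options is a single comma-separated string, the limit of 2 to 12 codes is easy to violate by accident. The following is a minimal sketch that builds the string from a list and enforces that limit. `build_language_options` is a hypothetical helper, not part of the plugin, and dropping duplicate codes is a choice made here for illustration.

```python
def build_language_options(codes):
    """Join candidate language codes into a comma-separated string.

    Hypothetical helper for illustration. Strips whitespace, drops
    duplicates while keeping order, and raises ValueError unless
    between 2 and 12 codes remain.
    """
    cleaned = []
    for code in codes:
        code = code.strip()
        if code and code not in cleaned:
            cleaned.append(code)
    if not 2 <= len(cleaned) <= 12:
        raise ValueError(f"expected 2 to 12 language codes, got {len(cleaned)}")
    return ",".join(cleaned)
```

The result can be passed directly as the language_options argument, for example `build_language_options(["en-US", "es-US", "fr-FR"])`.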

Mid-stream language switching

Detects language switches per segment of speech. Use this when speakers may switch languages mid-conversation, like multilingual customer support. Each final transcript reports the detected language:

from livekit.agents import AgentSession
from livekit.plugins import aws

session = AgentSession(
    stt=aws.STT(
        # Detect language switches per segment of speech
        identify_multiple_languages=True,
        language_options="en-US,es-US,fr-FR,de-DE,ja-JP,ko-KR,zh-HK,pt-BR,hi-IN,vi-VN,pl-PL,ru-RU",
    ),
    # ... llm, tts, etc.
)

Multilingual turn detection

Pairs mid-stream switching with the multilingual turn detector so end-of-turn predictions adapt to the language of each segment.

from livekit.agents import AgentSession
from livekit.plugins import aws
from livekit.plugins.turn_detector.multilingual import MultilingualModel

session = AgentSession(
    stt=aws.STT(
        identify_multiple_languages=True,
        language_options="en-US,es-US,fr-FR",
    ),
    # End-of-turn predictions follow the language of each segment
    turn_detection=MultilingualModel(),
    # ... llm, tts, etc.
)

Parameters

This section describes some of the available parameters. See the plugin reference for a complete list of all available parameters.

speech_region (string, default: us-east-1, env: AWS_DEFAULT_REGION)

The region of the AWS deployment. Required if the environment variable isn't set.

language (LanguageCode, default: en-US)

Language code for the audio. Ignored when identify_language or identify_multiple_languages is enabled. For a full list of supported languages, see the Supported languages page.

identify_language (bool, default: False)

Enable automatic detection of a single dominant language for the stream. Provide candidate languages with language_options. Cannot be combined with identify_multiple_languages.

identify_multiple_languages (bool, default: False)

Detect language switches mid-stream so the agent can transcribe each segment in the spoken language. Provide candidate languages with language_options. Cannot be combined with identify_language.

language_options (string)

Comma-separated list of expected language codes (between 2 and 12 codes). Required when identify_language or identify_multiple_languages is enabled.

preferred_language (string)

A language code from language_options that biases detection toward this language. Only applies when identify_language or identify_multiple_languages is enabled.
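The constraints stated for the language identification parameters can be checked together before constructing the STT. The following is a minimal sketch that encodes only the rules described in this section. `check_language_settings` is a hypothetical helper, and whether the plugin itself performs the same checks is not assumed here.

```python
def check_language_settings(
    identify_language=False,
    identify_multiple_languages=False,
    language_options=None,
    preferred_language=None,
):
    """Return a list of problems found in the given settings.

    Hypothetical helper for illustration; an empty list means the
    settings satisfy the rules described in this section.
    """
    problems = []
    detecting = identify_language or identify_multiple_languages
    # Split the comma-separated string into individual codes.
    codes = [c.strip() for c in (language_options or "").split(",") if c.strip()]

    if identify_language and identify_multiple_languages:
        problems.append(
            "identify_language and identify_multiple_languages cannot be combined"
        )
    if detecting and not codes:
        problems.append("language_options is required when detection is enabled")
    if detecting and preferred_language is not None and preferred_language not in codes:
        problems.append("preferred_language must be one of language_options")
    return problems
```

Returning a list instead of raising on the first problem lets a caller report every misconfiguration at once.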

vocabulary_names (string)

Comma-separated list of per-language custom vocabulary names that map to the codes in language_options. Use when different candidate languages have different vocabularies. Only applies when identify_language or identify_multiple_languages is enabled.

vocabulary_filter_names (string)

Comma-separated list of per-language custom vocabulary filter names that map to the codes in language_options. Only applies when identify_language or identify_multiple_languages is enabled.

vocabulary_name (string, default: None)

Name of the custom vocabulary to use when processing your transcription. To learn more, see Custom vocabularies.

session_id (string)

Name for your transcription session. If left empty, Amazon Transcribe generates an ID and returns it in the response.

vocab_filter_name (string, default: None)

Name of the custom vocabulary filter to use when processing your transcription. To learn more, see Using custom vocabulary filters to delete, mask, or flag words.

vocab_filter_method (string, default: None)

Display method for the vocabulary filter. To learn more, see Using custom vocabulary filters to delete, mask, or flag words.
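Amazon Transcribe's streaming API accepts three vocabulary filter methods: remove, mask, and tag. The following is a minimal sketch that validates the value before it reaches the service. It assumes the plugin passes vocab_filter_method through unchanged, and `normalize_vocab_filter_method` is a hypothetical helper, not part of the plugin.

```python
# Methods accepted by Amazon Transcribe for vocabulary filtering:
#   remove: delete filtered words from the transcript
#   mask:   replace filtered words with ***
#   tag:    flag filtered words without changing them
VOCAB_FILTER_METHODS = ("remove", "mask", "tag")


def normalize_vocab_filter_method(method):
    """Return the method in lowercase, or None if no method is given.

    Hypothetical helper for illustration; raises ValueError for a
    value that is not one of VOCAB_FILTER_METHODS.
    """
    if method is None:
        return None
    normalized = method.strip().lower()
    if normalized not in VOCAB_FILTER_METHODS:
        raise ValueError(
            f"vocab_filter_method must be one of {VOCAB_FILTER_METHODS}, got {method!r}"
        )
    return normalized
```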

Additional resources

The following resources provide more information about using Amazon Transcribe with LiveKit Agents.

Voice AI quickstart

Get started with LiveKit Agents and Amazon Transcribe.