Inworld STT plugin guide | LiveKit Documentation

Available in Python and Node.js

Overview

This plugin allows you to use Inworld as an STT provider for your voice agents.

Installation

Install the plugin for your language.

Python:

uv add "livekit-agents[inworld]~=1.5"

Node.js:

pnpm add @livekit/agents-plugin-inworld@1.x

Authentication

The Inworld plugin requires a Base64-encoded Inworld API key.

Set INWORLD_API_KEY in your .env file.
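Because the key must be Base64-encoded, it can help to fail fast at startup when the variable is missing or malformed. The following is a minimal sketch, not part of the plugin: `looks_like_base64` and `check_inworld_key` are hypothetical helper names, and the sketch assumes the contents of your .env file have already been loaded into the process environment.

```python
import base64
import binascii
import os


def looks_like_base64(value: str) -> bool:
    """Return True if value is non-empty and decodes as strict Base64."""
    if not value:
        return False
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


def check_inworld_key(env=os.environ) -> str:
    """Hypothetical helper: return INWORLD_API_KEY or raise if it is unusable."""
    key = env.get("INWORLD_API_KEY", "").strip()
    if not looks_like_base64(key):
        raise RuntimeError("INWORLD_API_KEY is missing or is not valid Base64")
    return key
```

This check only confirms that the value has the shape of Base64. It does not confirm that Inworld accepts the key.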

Usage

Use Inworld STT in an AgentSession or as a standalone transcription service. For example, you can use this STT in the Voice AI quickstart.

Python:

from livekit.agents import AgentSession
from livekit.plugins import inworld

session = AgentSession(
    stt=inworld.STT(
        model="inworld/inworld-stt-1",
        language="en-US",
    ),
    # ... llm, tts, etc.
)

Node.js:

import { voice } from '@livekit/agents';
import * as inworld from '@livekit/agents-plugin-inworld';

const session = new voice.AgentSession({
    stt: new inworld.STT({
        model: "inworld/inworld-stt-1",
        language: "en-US",
    }),
    // ... llm, tts, etc.
});

Parameters

This section describes commonly used parameters. See the plugin reference links in the Additional resources section for a complete list of all available parameters.

model (string, default: inworld/inworld-stt-1)

The Inworld STT model to use. Inworld serves several models through the same API, including inworld/inworld-stt-1, assemblyai/universal-streaming-multilingual, and soniox/stt-rt-v4. See the Inworld STT docs for the current list of supported models.

language (LanguageCode, default: en-US)

Language code for the input audio. See the Inworld STT docs for supported languages.

sample_rate (integer, default: 16000)

Input audio sample rate in Hz.

num_channels (integer, default: 1)

Number of audio channels in the input stream.

enable_voice_profile (boolean, default: true)

Enables voice profiling, which detects speaker characteristics such as age, gender, emotion, and accent on each transcript.

voice_profile_top_n (integer, default: 1)

Number of top voice profile results to return per category. Applies only when enable_voice_profile is true.

vad_threshold (float)

Voice activity detection sensitivity. If unset, Inworld applies its own default.

min_end_of_turn_silence_when_confident (integer, default: 200)

Minimum silence, in milliseconds, required to end a turn when the model is confident the speaker has finished.

end_of_turn_confidence_threshold (float, default: 0.3)

Confidence threshold used to decide when a turn has ended.

Valid range: 0.0–1.0
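The two end-of-turn parameters have simple constraints (a silence duration in milliseconds and a threshold between 0.0 and 1.0), so they can be checked before the STT instance is constructed. The following is a minimal sketch: `build_turn_options` is a hypothetical helper, not part of the plugin, and the non-negative check on the silence value is an assumption rather than a documented rule.

```python
def build_turn_options(
    min_end_of_turn_silence_when_confident: int = 200,
    end_of_turn_confidence_threshold: float = 0.3,
) -> dict:
    """Hypothetical helper: validate end-of-turn settings and return them as kwargs."""
    # Assumption: a negative silence duration is never meaningful.
    if min_end_of_turn_silence_when_confident < 0:
        raise ValueError("min_end_of_turn_silence_when_confident must be >= 0 ms")
    # Documented valid range for the confidence threshold is 0.0 to 1.0.
    if not 0.0 <= end_of_turn_confidence_threshold <= 1.0:
        raise ValueError("end_of_turn_confidence_threshold must be within 0.0 and 1.0")
    return {
        "min_end_of_turn_silence_when_confident": min_end_of_turn_silence_when_confident,
        "end_of_turn_confidence_threshold": end_of_turn_confidence_threshold,
    }


# Assumed usage (requires the plugin to be installed; it presumes the
# Python constructor accepts these names as keyword arguments):
#   stt = inworld.STT(**build_turn_options(end_of_turn_confidence_threshold=0.5))
```

Validating locally surfaces a bad value at startup instead of at the first request to the provider.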

Voice profile

When voice profiling is enabled (the default), each transcript exposes the detected voice profile on the metadata field, with attributes such as age, emotion, pitch, vocal style, accent, and gender. In Python, read it from metadata["voice_profile"]. In Node.js, read it from metadata.voiceProfile. Use voice_profile_top_n to control how many results are returned per category, or set enable_voice_profile to false to turn profiling off.

For an example of reading metadata from transcript events, see Provider-specific metadata on the STT overview.
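Since the voice profile is only present when profiling is enabled, code that reads it should tolerate missing metadata. The following is a minimal Python sketch: `get_voice_profile` is a hypothetical helper, the metadata argument is treated as a plain mapping, and the profile value is returned as-is because its exact shape is not described here.

```python
from typing import Any, Mapping, Optional


def get_voice_profile(metadata: Optional[Mapping[str, Any]]) -> Optional[Any]:
    """Hypothetical helper: return metadata["voice_profile"] if present, else None."""
    if not metadata:
        # No metadata at all, for example when profiling is disabled.
        return None
    # The "voice_profile" key is the one named in this guide for Python.
    return metadata.get("voice_profile")
```

Returning None instead of raising lets the same handler work whether or not enable_voice_profile is turned on.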

Additional resources

The following resources provide more information about using Inworld with LiveKit Agents.

Python

Reference, GitHub, PyPI

Node.js

Reference, GitHub, npm

Inworld STT docs

Inworld's speech-to-text documentation.

Voice AI quickstart

Get started with LiveKit Agents and Inworld STT.

Inworld TTS

Guide to the Inworld TTS plugin with LiveKit Agents.
