Overview
This plugin lets you use Inworld as a speech-to-text (STT) provider for your voice agents.
Installation
Install the plugin:
uv add "livekit-agents[inworld]~=1.5"
pnpm add @livekit/agents-plugin-inworld@1.x
Authentication
The Inworld plugin requires a Base64-encoded Inworld API key.
Set INWORLD_API_KEY in your .env file.
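A quick startup check can catch a missing or malformed key before the agent tries to connect. The sketch below is illustrative only and is not part of the plugin: the helper names looks_like_base64 and check_inworld_key are hypothetical, and it assumes your .env file has already been loaded into the process environment by whatever mechanism your project uses.

```python
import base64
import binascii
import os


def looks_like_base64(value: str) -> bool:
    """Return True if value is non-empty and decodes as strict Base64."""
    if not value:
        return False
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


def check_inworld_key(env=os.environ) -> str:
    """Return the key stored in INWORLD_API_KEY, or raise a readable error.

    Hypothetical helper: it only checks the key's shape, not whether
    Inworld accepts it.
    """
    key = env.get("INWORLD_API_KEY", "").strip()
    if not key:
        raise RuntimeError("INWORLD_API_KEY is not set")
    if not looks_like_base64(key):
        raise RuntimeError("INWORLD_API_KEY does not look Base64-encoded")
    return key
```

Calling check_inworld_key() once at startup surfaces configuration problems early. A key that passes this check can still be rejected by Inworld, since only the encoding is verified.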
Usage
Use Inworld STT in an AgentSession or as a standalone transcription service. For example, you can use this STT in the Voice AI quickstart.
from livekit.agents import AgentSession
from livekit.plugins import inworld

session = AgentSession(
    stt=inworld.STT(
        model="inworld/inworld-stt-1",
        language="en-US",
    ),
    # ... llm, tts, etc.
)
import { voice } from '@livekit/agents';
import * as inworld from '@livekit/agents-plugin-inworld';

const session = new voice.AgentSession({
  stt: new inworld.STT({
    model: "inworld/inworld-stt-1",
    language: "en-US",
  }),
  // ... llm, tts, etc.
});
Parameters
This section describes commonly used parameters. See the plugin reference links in the Additional resources section for a complete list of all available parameters.
model (string, default: inworld/inworld-stt-1)
The Inworld STT model to use. Inworld serves several models through the same API, including inworld/inworld-stt-1, assemblyai/universal-streaming-multilingual, and soniox/stt-rt-v4. See the Inworld STT docs for the current list of supported models.

language (LanguageCode, default: en-US)
Language code for the input audio. See the Inworld STT docs for supported languages.

sample_rate (integer, default: 16000)
Input audio sample rate in Hz.

num_channels (integer, default: 1)
Number of audio channels in the input stream.

enable_voice_profile (boolean, default: true)
Enables voice profiling, which detects speaker characteristics such as age, gender, emotion, and accent on each transcript.

voice_profile_top_n (integer, default: 1)
Number of top voice profile results to return per category when enable_voice_profile is set.

vad_threshold (float)
Voice activity detection sensitivity. If unset, Inworld applies its own default.

min_end_of_turn_silence_when_confident (integer, default: 200)
Minimum silence, in milliseconds, required to end a turn when the model is confident the speaker has finished.

end_of_turn_confidence_threshold (float, default: 0.3)
Confidence threshold used to decide when a turn has ended. Valid range: 0.0–1.0
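To show how these parameters fit together, here is a minimal Python sketch that collects them into a dictionary, checks the one range documented above, and passes the result to the constructor. The helpers build_stt_options and make_stt are hypothetical and not part of the plugin. The sketch also assumes that inworld.STT accepts every parameter listed here as a keyword argument, as it does for model and language in the usage example. See the plugin reference for the authoritative signature.

```python
def build_stt_options(
    model="inworld/inworld-stt-1",
    language="en-US",
    sample_rate=16000,
    num_channels=1,
    enable_voice_profile=True,
    voice_profile_top_n=1,
    min_end_of_turn_silence_when_confident=200,  # milliseconds
    end_of_turn_confidence_threshold=0.3,
    vad_threshold=None,
) -> dict:
    """Hypothetical helper: gather STT options using the documented defaults."""
    if not 0.0 <= end_of_turn_confidence_threshold <= 1.0:
        raise ValueError("end_of_turn_confidence_threshold must be within 0.0-1.0")
    options = {
        "model": model,
        "language": language,
        "sample_rate": sample_rate,
        "num_channels": num_channels,
        "enable_voice_profile": enable_voice_profile,
        "voice_profile_top_n": voice_profile_top_n,
        "min_end_of_turn_silence_when_confident": min_end_of_turn_silence_when_confident,
        "end_of_turn_confidence_threshold": end_of_turn_confidence_threshold,
    }
    # Leave vad_threshold out when unset so Inworld applies its own default.
    if vad_threshold is not None:
        options["vad_threshold"] = vad_threshold
    return options


def make_stt(options: dict):
    """Create the STT instance (assumes these keyword arguments are accepted)."""
    # Imported lazily so this module loads even without the plugin installed.
    from livekit.plugins import inworld

    return inworld.STT(**options)
```

For example, make_stt(build_stt_options(enable_voice_profile=False, vad_threshold=0.5)) would request transcription without voice profiling and with an explicit VAD sensitivity. The value 0.5 is only a placeholder, since this section does not document a range for vad_threshold.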
Voice profile
When voice profiling is enabled (the default), each transcript exposes the detected voice profile on the metadata field, with attributes such as age, emotion, pitch, vocal style, accent, and gender. In Python, read it from metadata["voice_profile"]. In Node.js, read it from metadata.voiceProfile. Use voice_profile_top_n to control how many results are returned per category, or disable it with the enable_voice_profile parameter.
For an example of reading metadata from transcript events, see Provider-specific metadata on the STT overview.
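As a rough illustration of the Python access path described above, the following sketch reads the voice profile defensively from a metadata mapping. The helper get_voice_profile is hypothetical. How you obtain the metadata object from a transcript event is not covered here, and the structure of the profile value is an assumption, so the helper returns it untouched.

```python
from typing import Any, Mapping, Optional


def get_voice_profile(metadata: Optional[Mapping[str, Any]]) -> Optional[Any]:
    """Return metadata["voice_profile"] if present, otherwise None.

    Hypothetical helper: metadata is treated as a plain mapping and the
    profile is returned as-is because its exact shape is not documented here.
    """
    if not metadata:
        return None
    return metadata.get("voice_profile")


# Illustrative data only; real profiles may be structured differently.
sample_metadata = {"voice_profile": {"emotion": "calm"}}
profile = get_voice_profile(sample_metadata)
```

Returning None when the key is absent lets the same code path handle sessions where enable_voice_profile is disabled.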
Additional resources
The following resources provide more information about using Inworld with LiveKit Agents.