Inworld TTS

How to use Inworld TTS with LiveKit Agents.

Overview

Inworld text-to-speech is available in LiveKit Agents through LiveKit Inference and the Inworld plugin. Pricing for LiveKit Inference is available on the pricing page.

Model ID | Languages
inworld/inworld-tts-1.5-max | en, es, fr, ko, nl, zh, de, it, ja, pl, pt, ru, hi
inworld/inworld-tts-1.5-mini | en, es, fr, ko, nl, zh, de, it, ja, pl, pt, ru, hi
inworld/inworld-tts-1-max | en, es, fr, ko, nl, zh, de, it, ja, pl, pt, ru
inworld/inworld-tts-1 | en, es, fr, ko, nl, zh, de, it, ja, pl, pt, ru
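
The table above can be mirrored as a small lookup when an agent needs to pick a model for a given language. This is only a sketch: the names `SUPPORTED_LANGUAGES`, `supports_language`, and `models_for_language` are hypothetical helpers, not part of LiveKit Agents, and the data is copied from the table on this page.

```python
# Hypothetical lookup built from the model table on this page.
_BASE_LANGUAGES = ("en", "es", "fr", "ko", "nl", "zh", "de", "it", "ja", "pl", "pt", "ru")

SUPPORTED_LANGUAGES = {
    "inworld/inworld-tts-1.5-max": frozenset(_BASE_LANGUAGES + ("hi",)),
    "inworld/inworld-tts-1.5-mini": frozenset(_BASE_LANGUAGES + ("hi",)),
    "inworld/inworld-tts-1-max": frozenset(_BASE_LANGUAGES),
    "inworld/inworld-tts-1": frozenset(_BASE_LANGUAGES),
}


def supports_language(model: str, language: str) -> bool:
    """Return True if the table lists `language` for `model`."""
    return language in SUPPORTED_LANGUAGES.get(model, frozenset())


def models_for_language(language: str) -> list[str]:
    """Return the model IDs that list `language`, in table order."""
    return [m for m, langs in SUPPORTED_LANGUAGES.items() if language in langs]
```

For example, `models_for_language("hi")` returns only the two 1.5 models, since Hindi is not listed for the 1.x models.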

LiveKit Inference

Use LiveKit Inference to access Inworld TTS without a separate Inworld API key.

Usage

To use Inworld, pass a descriptor with the model and voice to the tts argument in your AgentSession:

Python:

from livekit.agents import AgentSession

session = AgentSession(
    tts="inworld/inworld-tts-1.5-max:Ashley",
    # ... llm, stt, vad, turn_detection, etc.
)

Node.js:

import { AgentSession } from '@livekit/agents';

const session = new AgentSession({
  tts: "inworld/inworld-tts-1.5-max:Ashley",
  // ... llm, stt, vad, turn_detection, etc.
});
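
The descriptor in the example above appears to follow a `model:voice` pattern. The sketch below shows one way to build and split such strings in your own code, for instance when the model and voice come from configuration. The helper names are hypothetical, and how LiveKit parses the descriptor internally is not stated here; splitting on the last colon is an assumption.

```python
from typing import NamedTuple


class TTSDescriptor(NamedTuple):
    """Hypothetical container for the two parts of a descriptor."""
    model: str
    voice: str


def build_descriptor(model: str, voice: str) -> str:
    """Join a model ID and a voice name into 'model:voice'."""
    return f"{model}:{voice}"


def parse_descriptor(descriptor: str) -> TTSDescriptor:
    """Split 'model:voice' on the last colon (an assumption, see above)."""
    model, sep, voice = descriptor.rpartition(":")
    if not sep or not model or not voice:
        raise ValueError(f"expected 'model:voice', got {descriptor!r}")
    return TTSDescriptor(model, voice)
```

The resulting string can then be passed as the `tts` argument in the same way as the literal descriptor shown above.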

Parameters

To customize additional parameters, use the TTS class from the inference module:

Python:

from livekit.agents import AgentSession, inference

session = AgentSession(
    tts=inference.TTS(
        model="inworld/inworld-tts-1.5-max",
        voice="Ashley",
        language="en",
    ),
    # ... llm, stt, vad, turn_detection, etc.
)

Node.js:

import { AgentSession, inference } from '@livekit/agents';

const session = new AgentSession({
  tts: new inference.TTS({
    model: "inworld/inworld-tts-1.5-max",
    voice: "Ashley",
    language: "en",
  }),
  // ... llm, stt, vad, turn_detection, etc.
});

model (string, required)

The model ID from the models list.

voice (string, required)

See voices for guidance on selecting a voice.

language (string, optional)

Language code for the input text. If not set, the model default applies.

extra_kwargs (dict, optional)

Additional parameters to pass to the Inworld TTS API. See the provider's documentation for more information.

In Node.js this parameter is called modelOptions.

Voices

LiveKit Inference supports all of the default voices available in the Inworld API. You can explore the available voices in the Inworld TTS Playground (free account required), and use the voice by copying its name into your LiveKit agent session.

Cloned voices unavailable

Cloned voices are not yet supported in LiveKit Inference. To use these voices, create your own Inworld account and use the Inworld plugin for LiveKit Agents instead.

The following is a small sample of the Inworld voices available in LiveKit Inference.

Ashley: Warm, natural American female 🇺🇸
Diego: Soothing, gentle Mexican male 🇲🇽
Edward: Fast-talking, emphatic American male 🇺🇸
Olivia: Upbeat, friendly British female 🇬🇧

Customizing pronunciation

Inworld TTS supports custom pronunciation. To learn more, see the Custom Pronunciation guide.

Plugin

Use the Inworld plugin to connect directly to Inworld's TTS API with your own API key.

Available in: Python | Node.js

Installation

Install the plugin from PyPI (Python) or npm (Node.js):

Python:

uv add "livekit-agents[inworld]~=1.4"

Node.js:

pnpm add @livekit/agents-plugin-inworld@1.x

Authentication

The Inworld plugin requires a Base64-encoded Inworld API key.

Set INWORLD_API_KEY in your .env file.
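
Because the key must be Base64-encoded, a quick pre-flight check can catch a missing or mis-pasted key before the agent starts. The sketch below is a hypothetical helper, not something the plugin provides. It assumes the key uses the standard Base64 alphabet, and it assumes your `.env` file has already been loaded into the process environment by whatever mechanism your project uses.

```python
import base64
import binascii
import os
from typing import Mapping, Optional


def load_inworld_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return INWORLD_API_KEY, or raise if it is missing or not valid Base64.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("INWORLD_API_KEY", "").strip()
    if not key:
        raise RuntimeError("INWORLD_API_KEY is not set")
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        base64.b64decode(key, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise RuntimeError("INWORLD_API_KEY does not look like Base64") from exc
    return key
```

Calling `load_inworld_api_key()` at startup fails fast with a readable message instead of surfacing later as an authentication error.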

Usage

Use Inworld TTS within an AgentSession or as a standalone speech generator. For example, you can use this TTS in the Voice AI quickstart.

Python:

from livekit.agents import AgentSession
from livekit.plugins import inworld

session = AgentSession(
    tts=inworld.TTS(model="inworld-tts-1.5-max", voice="Ashley"),
    # ... llm, stt, etc.
)

Node.js:

import { voice } from '@livekit/agents';
import * as inworld from '@livekit/agents-plugin-inworld';

const session = new voice.AgentSession({
  tts: new inworld.TTS({
    model: "inworld-tts-1.5-max",
    voice: "Ashley",
  }),
  // ... llm, stt, etc.
});

Parameters

This section describes some of the available parameters. See the plugin reference for a complete list of all available parameters.

model (string, optional, default: "inworld-tts-1.5-max")

ID of the model to use for generation. See supported models.

voice (string, optional, default: "Ashley")

ID of the voice to use for generation. Use the List voices API endpoint for possible values.

temperature (float, optional, default: 1.1)

Controls randomness in the output. The recommended range is 0.6 to 1.1. See docs.

speaking_rate (float, optional, default: 1.0)

Controls how fast the voice speaks. 1.0 is the voice's normal speed, 0.5 is half that speed, and 1.5 is 1.5x that speed. See docs.
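
As a rough guide to what a given rate means for timing, playback length can be estimated as the normal-speed length divided by the rate. This is back-of-the-envelope arithmetic under the assumption that duration scales inversely with `speaking_rate`; the function name is hypothetical and the actual audio length may differ.

```python
def estimated_duration(base_seconds: float, speaking_rate: float) -> float:
    """Estimate audio length at `speaking_rate`, given its length at rate 1.0.

    Assumes duration scales inversely with the rate, which is an approximation.
    """
    if speaking_rate <= 0:
        raise ValueError("speaking_rate must be positive")
    return base_seconds / speaking_rate
```

Under this assumption, a 12-second utterance takes about 8 seconds at a rate of 1.5 and about 24 seconds at a rate of 0.5.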

text_normalization (string, optional, default: "ON")

Controls text normalization. When "ON", numbers, dates, and abbreviations are expanded (e.g., "Dr." -> "Doctor"). When "OFF", text is read exactly as written. See docs.
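
The parameters above can be collected and sanity-checked before they are handed to the plugin. The sketch below is a hypothetical helper, not part of the plugin: it uses the defaults and the recommended temperature range listed on this page, and it assumes "ON" and "OFF" are the only `text_normalization` values you intend to use (the plugin may accept others).

```python
import warnings

RECOMMENDED_TEMPERATURE = (0.6, 1.1)  # range recommended on this page
TEXT_NORMALIZATION_VALUES = ("ON", "OFF")  # assumption: only the values named here


def build_tts_kwargs(
    model: str = "inworld-tts-1.5-max",
    voice: str = "Ashley",
    temperature: float = 1.1,
    speaking_rate: float = 1.0,
    text_normalization: str = "ON",
) -> dict:
    """Validate the documented parameters and return them as keyword arguments."""
    low, high = RECOMMENDED_TEMPERATURE
    if not low <= temperature <= high:
        # Outside the recommended range is allowed, so warn instead of failing.
        warnings.warn(f"temperature {temperature} is outside the recommended {low} to {high}")
    if speaking_rate <= 0:
        raise ValueError("speaking_rate must be positive")
    if text_normalization not in TEXT_NORMALIZATION_VALUES:
        raise ValueError(f"text_normalization must be one of {TEXT_NORMALIZATION_VALUES}")
    return {
        "model": model,
        "voice": voice,
        "temperature": temperature,
        "speaking_rate": speaking_rate,
        "text_normalization": text_normalization,
    }
```

The returned dictionary is meant to be unpacked into the plugin constructor, for example `inworld.TTS(**build_tts_kwargs(temperature=0.8))`, assuming the constructor accepts these names as keyword arguments as described above.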

Additional resources

The following resources provide more information about using Inworld with LiveKit Agents.