xAI STT

How to use xAI STT with LiveKit Agents.

Overview

xAI speech-to-text is available in LiveKit Agents through LiveKit Inference and the xAI plugin. With LiveKit Inference, your agent runs on LiveKit's infrastructure to minimize latency. No separate provider API key is required, and usage and rate limits are managed through LiveKit Cloud. Use the plugin instead if you want to manage your own billing and rate limits. Pricing for LiveKit Inference is available on the pricing page.

LiveKit Inference

Use LiveKit Inference to access xAI STT without a separate xAI API key.

Model name: Speech to Text
Model ID: xai/stt-1 (or xai/stt or xai/grok-stt)
Languages: en, ar, cs, da, nl, fr, de, hi, id, it, ja, ko, ms, fa, pl, pt, ro, ru, es, sv, th, tr, vi, fil, mk

Usage

To use xAI, use the STT class from the inference module:

Python:

from livekit.agents import AgentSession, inference

session = AgentSession(
    stt=inference.STT(
        model="xai/stt-1",
        language="en",
    ),
    # ... llm, tts, vad, turn_handling, etc.
)

Node.js:

import { AgentSession, inference } from '@livekit/agents';

const session = new AgentSession({
  stt: new inference.STT({
    model: "xai/stt-1",
    language: "en",
  }),
  // ... llm, tts, vad, turnHandling, etc.
});

Parameters

model (string, required)

The model to use for the STT. See model IDs for available models.

language (LanguageCode)

Language code for the transcription. If not set, the provider default applies.

extra_kwargs (dict)

Additional parameters to pass to the xAI STT API. See model parameters for supported fields.

In Node.js this parameter is called modelOptions.

Model parameters

Pass the following parameters inside extra_kwargs (Python) or modelOptions (Node.js):

Parameter (type, default): notes

diarize (bool, default: False): Set to True to enable speaker diarization.

endpointing (int): Duration of silence in milliseconds before an utterance is considered final. Valid range: 0 to 5000.

format (bool): Whether to apply inverse text normalization (for example, "one hundred dollars" becomes "$100"). Requires language to be set.

interim_results (bool, default: True): Whether to return in-progress transcription results before the final transcript. Disable to reduce the number of messages at the cost of higher latency.
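
The constraints listed above (the endpointing range and the language requirement for format) can be checked before the options reach the STT constructor. The following is a minimal sketch, not part of LiveKit: build_xai_stt_options and make_session_stt are hypothetical helper names, and the values 500 and True are arbitrary examples. Only the dictionary keys and the inference.STT arguments come from this page.

```python
from typing import Any, Dict, Optional


def build_xai_stt_options(
    language: Optional[str] = None,
    *,
    diarize: bool = False,
    endpointing_ms: Optional[int] = None,
    format_text: Optional[bool] = None,
    interim_results: bool = True,
) -> Dict[str, Any]:
    """Hypothetical helper: build an extra_kwargs dict from the documented fields."""
    options: Dict[str, Any] = {
        "diarize": diarize,
        "interim_results": interim_results,
    }
    if endpointing_ms is not None:
        # Documented valid range is 0 to 5000 milliseconds.
        if not 0 <= endpointing_ms <= 5000:
            raise ValueError("endpointing must be between 0 and 5000 ms")
        options["endpointing"] = endpointing_ms
    if format_text is not None:
        # The format option is documented as requiring language to be set.
        if format_text and language is None:
            raise ValueError("format requires language to be set")
        options["format"] = format_text
    return options


def make_session_stt(language: str = "en"):
    """Hypothetical wrapper that passes the validated options to inference.STT."""
    # Imported lazily so the helper above works without livekit-agents installed.
    from livekit.agents import inference

    return inference.STT(
        model="xai/stt-1",
        language=language,
        extra_kwargs=build_xai_stt_options(
            language, endpointing_ms=500, format_text=True
        ),
    )
```

Validating locally is a design choice rather than a requirement; it surfaces an out-of-range value as a Python exception at startup instead of leaving it to the provider to reject.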

String descriptors

As a shortcut, you can also pass a model ID string directly to the stt argument in your AgentSession:

Python:

from livekit.agents import AgentSession

session = AgentSession(
    stt="xai/stt-1:en",
    # ... llm, tts, vad, turn_handling, etc.
)

Node.js:

import { AgentSession } from '@livekit/agents';

const session = new AgentSession({
  stt: "xai/stt-1:en",
  // ... llm, tts, vad, turnHandling, etc.
});
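
The descriptor in the example appears to combine a model ID and a language code separated by a colon. The sketch below only illustrates that shape; split_descriptor is a hypothetical helper, and the way LiveKit actually parses the string is not described on this page.

```python
from typing import Optional, Tuple


def split_descriptor(descriptor: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: split "model:language" into its two parts.

    Assumes the text after the first colon is the language code, as in
    "xai/stt-1:en". Returns None for the language when no suffix is present.
    """
    model, separator, language = descriptor.partition(":")
    return model, (language if separator and language else None)


model, language = split_descriptor("xai/stt-1:en")
print(model, language)
```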

Plugin

LiveKit's plugin support for xAI lets you connect directly to xAI's STT API with your own API key.

Available in: Python | Node.js

Installation

Install the plugin from PyPI or npm:

Python:

uv add "livekit-agents[xai]~=1.5"

Node.js:

pnpm add @livekit/agents-plugin-xai

Authentication

The xAI plugin requires an xAI API key.

Set XAI_API_KEY in your .env file.
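
A missing key is easier to diagnose at startup than on the first transcription request. The following is a small sketch using only the standard library; require_api_key is a hypothetical helper, and it assumes the contents of your .env file have already been loaded into the process environment by whatever tooling you use.

```python
import os
from typing import Mapping


def require_api_key(
    env: Mapping[str, str] = os.environ, name: str = "XAI_API_KEY"
) -> str:
    """Hypothetical helper: return the API key or fail with a clear message."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or environment"
        )
    return key
```

Calling require_api_key() once before constructing the agent session turns a misconfigured environment into an immediate, readable error.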

Usage

Use xAI STT in an AgentSession or as a standalone transcription service. For example, you can use this STT in the Voice AI quickstart.

Python:

from livekit.agents import AgentSession
from livekit.plugins import xai

session = AgentSession(
    stt=xai.STT(
        language="en",
    ),
    # ... llm, tts, etc.
)

Node.js:

import { voice } from '@livekit/agents';
import * as xai from '@livekit/agents-plugin-xai';

const session = new voice.AgentSession({
  stt: new xai.STT({
    language: "en",
  }),
  // ... llm, tts, etc.
});

Parameters

This section describes some of the available parameters. See the plugin reference links in the Additional resources section for a complete list of all available parameters.

api_key (string, required, env: XAI_API_KEY)

xAI API key.

language (string, default: en)

BCP-47 language code for transcription (for example, "en", "fr", "de").

enable_interim_results (bool, default: True)

Whether to return in-progress transcription results before the final transcript. Disable to reduce the number of messages at the cost of higher latency.

In Node.js this parameter is called interimResults.

enable_diarization (bool, default: False)

Set to True to enable speaker diarization.

In Node.js this parameter is called enableDiarization.

Speaker diarization

Enable speaker diarization so the STT assigns a speaker identifier to each word or segment. When enabled, transcript events include a speaker_id, and the STT reports capabilities.diarization = True.

With diarization enabled, you can wrap the xAI STT with MultiSpeakerAdapter for primary speaker detection and transcript formatting.

Enable speaker diarization in the STT constructor:

LiveKit Inference (Python):

stt = inference.STT(
    model="xai/stt-1",
    extra_kwargs={
        "diarize": True,
    },
)

Plugin (Python):

stt = xai.STT(
    enable_diarization=True,
)
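
Once transcript events carry a speaker_id, one common use is to group consecutive segments from the same speaker into labeled lines. The sketch below shows the idea with a stand-in data type: Segment and format_by_speaker are hypothetical and are not LiveKit classes or the behavior of MultiSpeakerAdapter. Only the speaker_id field name comes from this page.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Segment:
    """Hypothetical stand-in for a transcript segment with a speaker_id."""

    speaker_id: Optional[str]
    text: str


def format_by_speaker(segments: Iterable[Segment]) -> List[str]:
    """Merge consecutive segments from the same speaker into one labeled line."""
    lines: List[str] = []
    last_speaker: Optional[str] = None
    for segment in segments:
        if lines and segment.speaker_id == last_speaker:
            # Same speaker as the previous segment: extend the current line.
            lines[-1] += " " + segment.text
        else:
            label = segment.speaker_id if segment.speaker_id is not None else "unknown"
            lines.append(f"[{label}] {segment.text}")
        last_speaker = segment.speaker_id
    return lines


transcript = format_by_speaker(
    [Segment("S1", "hello"), Segment("S1", "there"), Segment("S2", "hi")]
)
print(transcript)  # ['[S1] hello there', '[S2] hi']
```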

Additional resources

The following resources provide more information about using xAI with LiveKit Agents.