xAI Grok Voice Agent plugin

How to use xAI's Grok Voice Agent API with LiveKit Agents.

Available in: Python

Overview

The Grok Voice Agent API enables low-latency, two-way voice interactions using Grok models. LiveKit's xAI plugin includes a RealtimeModel class that allows you to create agents with natural, human-like voice conversations.

Quick reference

This section includes a basic usage example and some reference material. For links to more detailed documentation, see Additional resources.

Installation

Install the xAI plugin:

uv add "livekit-agents[xai]"

Authentication

The xAI plugin requires an xAI API key.

Set XAI_API_KEY in your .env file.
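
A missing key is easier to diagnose at startup than when the agent first connects. The following is a minimal sketch of such a check, using only the standard library. The helper name get_xai_api_key is hypothetical and not part of the plugin. The sketch also assumes the variable is already in the process environment: Python does not read a .env file by itself, so a loader such as python-dotenv's load_dotenv() would need to run first.

```python
import os


def get_xai_api_key(env=None):
    """Return the value of XAI_API_KEY, or raise if it is missing or blank.

    Hypothetical helper for illustration; the plugin does not provide it.
    `env` defaults to os.environ and can be any mapping, which eases testing.
    """
    env = os.environ if env is None else env
    key = env.get("XAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "XAI_API_KEY is not set; add it to your .env file or environment"
        )
    return key
```

Calling get_xai_api_key() before building the AgentSession turns a misconfigured environment into an immediate, readable error.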

Usage

Use the Grok Voice Agent API within an AgentSession. For example, you can use it in the Voice AI quickstart.

from livekit.agents import AgentSession
from livekit.plugins import xai

session = AgentSession(
    llm=xai.realtime.RealtimeModel(),
)

Parameters

This section describes some of the available parameters. For a complete reference of all available parameters, see the plugin reference links in the Additional resources section.

voice (str, optional, default: 'ara')

Voice to use for speech generation. For a list of available voices, see Available voices.

api_key (str, required, env: XAI_API_KEY)

xAI API key. If you don't pass it explicitly, it is read from the XAI_API_KEY environment variable.

turn_detection (TurnDetection | None, optional)

Configuration for turn detection. Server VAD is enabled by default with the following settings: threshold=0.5, prefix_padding_ms=300, silence_duration_ms=200.

Turn detection

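The Grok Voice Agent API includes built-in VAD-based turn detection, which is enabled by default. The following example configures it explicitly, using the default values for threshold, prefix_padding_ms, and silence_duration_ms: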

from livekit.agents import AgentSession
from livekit.plugins import xai
from openai.types.beta.realtime.session import TurnDetection

session = AgentSession(
    llm=xai.realtime.RealtimeModel(
        turn_detection=TurnDetection(
            type="server_vad",
            threshold=0.5,
            prefix_padding_ms=300,
            silence_duration_ms=200,
            create_response=True,
            interrupt_response=True,
        )
    ),
)

  • threshold: activation threshold for the VAD. Higher values require louder audio to activate, which is better for noisy environments.
  • prefix_padding_ms: amount of audio, in milliseconds, to include before the detected start of speech.
  • silence_duration_ms: duration of silence, in milliseconds, required to detect the end of speech. Shorter values give faster turn detection.
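
To build intuition for how these three settings interact, here is a toy simulation in plain Python. It is a simplified model written for illustration and is not the algorithm the xAI server runs. The names VadSettings and detect_turns, the fixed frame size, and the per-frame "level" input are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class VadSettings:
    # Defaults mirror the values documented above.
    threshold: float = 0.5
    prefix_padding_ms: int = 300
    silence_duration_ms: int = 200


def detect_turns(levels, frame_ms, settings):
    """Toy VAD: return (audio_start_ms, speech_end_ms, detected_at_ms) per turn.

    `levels` holds one speech level in [0, 1] per audio frame of `frame_ms`.
    A turn still open when the input ends is not reported.
    """
    turns = []
    start_ms = None  # start of the captured audio, including prefix padding
    silence_ms = 0   # trailing silence accumulated inside the current turn
    for i, level in enumerate(levels):
        t = i * frame_ms
        if level >= settings.threshold:
            if start_ms is None:
                # Keep some audio from before the first frame over threshold.
                start_ms = max(0, t - settings.prefix_padding_ms)
            silence_ms = 0
        elif start_ms is not None:
            silence_ms += frame_ms
            if silence_ms >= settings.silence_duration_ms:
                detected_at = t + frame_ms
                turns.append((start_ms, detected_at - silence_ms, detected_at))
                start_ms = None
                silence_ms = 0
    return turns


if __name__ == "__main__":
    # 500 ms of quiet, 500 ms of speech, 500 ms of quiet, in 100 ms frames.
    levels = [0.1] * 5 + [0.9] * 5 + [0.1] * 5
    print(detect_turns(levels, 100, VadSettings()))
    # → [(200, 1000, 1200)]
    print(detect_turns(levels, 100, VadSettings(silence_duration_ms=400)))
    # → [(200, 1000, 1400)]
```

In this model, raising silence_duration_ms from 200 to 400 delays the end-of-turn decision by 200 ms without changing where speech is judged to have ended. This is the latency trade-off the last bullet describes.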

Additional resources

The following resources provide more information about using xAI with LiveKit Agents.