
Wakeword detection

Detect a spoken trigger phrase on the client to activate a voice AI agent hands-free.

Overview

A wakeword is a short spoken phrase, like "Hey Siri," that activates a voice-enabled device or agent. Running detection on the client lets your agent stay idle until the user speaks the trigger phrase. This is a common pattern for hands-free interfaces, edge devices like the Raspberry Pi, and branded activation phrases.

The livekit-wakeword library is an open-source toolkit that includes:

  • A pre-trained hey livekit classifier you can use out of the box.
  • A training pipeline for creating custom wakeword classifiers.
  • Three client SDKs (Python, Rust, Swift) for running detection on a device.

Models export as standard ONNX files and are compatible with openWakeWord. For benchmarks and architecture details, see the LiveKit blog post.

How it works

Detection runs on the client device, not on the agent server. A typical setup pairs an on-device client with a standard LiveKit Agents server. At runtime:

  1. The client listens to the microphone locally and scores each audio frame against the trained classifier.
  2. When a score crosses the threshold, the client connects to a LiveKit room.
  3. The agent joins the room.
  4. When the user finishes, the agent leaves the room.
  5. The client disconnects and resumes listening for the wakeword.
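
The runtime loop above can be pictured as a two-state machine. The following is a minimal sketch rather than code from the library: the state names and event names are hypothetical labels, and the actual room connection is left out.

```python
from enum import Enum, auto


class ClientState(Enum):
    LISTENING = auto()   # scoring microphone audio locally
    IN_SESSION = auto()  # connected to a room with the agent


def next_state(state, event):
    """Return the next client state for a (state, event) pair.

    The event names are hypothetical labels, not library identifiers.
    """
    if state is ClientState.LISTENING and event == "wakeword_detected":
        return ClientState.IN_SESSION  # steps 2-3: connect, agent joins
    if state is ClientState.IN_SESSION and event == "agent_left":
        return ClientState.LISTENING   # steps 4-5: disconnect, resume listening
    return state  # ignore events that don't apply in the current state


state = ClientState.LISTENING
history = [state]
for event in ["noise", "wakeword_detected", "wakeword_detected", "agent_left"]:
    state = next_state(state, event)
    history.append(state)

print([s.name for s in history])
```

A wakeword heard while a session is already active leaves the state unchanged, which matches the idea that the client only listens for the trigger phrase while idle.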

The SDK you select on the client is independent of the agent server. Any client SDK can connect to a Python or Node.js LiveKit Agents server.

Try the example

The hello-wakeword example pairs a Python client (using the pre-trained hey livekit model) with a LiveKit Agents server. It's the fastest way to see end-to-end detection.

The agent uses LiveKit Inference for STT, LLM, and TTS, so no separate provider keys are required.

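This example assumes you have uv and the LiveKit CLI (lk) installed and access to a LiveKit Cloud project, since the steps below rely on them.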

  1. Clone the repo and install both packages:

    git clone https://github.com/livekit-examples/hello-wakeword
    cd hello-wakeword
    uv sync --all-packages
  2. Authenticate with LiveKit Cloud and generate a .env.local file:

    lk cloud auth
    lk app env -w
  3. Start the agent server in one terminal:

    uv run wakeword-agent dev
  4. Start the client in another terminal:

    uv run wakeword-client
  5. Say hey livekit to trigger the agent. Stop speaking to end the session, and the client resumes listening automatically.

Set up wakeword detection

To run wakeword detection on a device, you need an ONNX classifier file and a client SDK to score audio against it. Either classifier (pre-trained or custom-trained) works with any of the three SDKs.

Use the pre-trained model

Download the pre-trained hey livekit classifier:

curl -LO https://raw.githubusercontent.com/livekit-examples/hello-wakeword/main/client/models/hey_livekit.onnx

To detect a different phrase or language, train a custom wakeword instead.

Train a custom wakeword

To detect a different trigger phrase or a non-English language, train a custom classifier. The training pipeline is automated and uses TTS to generate synthetic training samples, so no audio recording or labeled data is required. The result is a single ONNX file that loads in any client SDK.

Train locally, in the cloud, or on GPU instances via SkyPilot.

Pipeline stages

The training pipeline has six stages. The run command chains four of them. You can run any stage on its own with livekit-wakeword <stage> <config>.

Stage      What it does
setup      Download base data (Piper or VoxCPM weights, ACAV features, room impulse responses, MUSAN background noise).
generate   Synthesize positive samples and adversarial negatives via TTS.
augment    Add noise, reverb, and pitch shifts. Extract features through the frozen mel and embedding models.
train      Train the classifier head on the extracted features.
export     Export the trained classifier to ONNX.
eval       Score the exported model against the validation set. Produces a DET curve plot and a metrics JSON file.

Train via the CLI

  1. Install the system dependencies: espeak-ng, ffmpeg, sox, libsndfile, and portaudio.

    # macOS
    brew install espeak-ng ffmpeg sox portaudio
    # Ubuntu/Debian
    sudo apt install espeak-ng ffmpeg sox libsndfile1 portaudio19-dev
    # Windows
    winget install eSpeak-NG.eSpeak-NG
    winget install Gyan.FFmpeg
    winget install ChrisBagwell.SoX

    libsndfile and portaudio are bundled with the soundfile and pyaudio Python wheels on Windows, so you don't need to install them separately.

  2. Install the CLI:

    pip install "livekit-wakeword[train,eval,export]"
  3. Write a config file. A minimum config looks like this:

    # hey_robot.yaml
    model_name: hey_robot
    target_phrases:
      - "hey robot"
    n_samples: 10000
    model:
      model_type: conv_attention # conv_attention (default), dnn, or rnn
      model_size: small # tiny, small, medium, large
    steps: 50000
    target_fp_per_hour: 0.2
  4. Download the base data:

    livekit-wakeword setup --config hey_robot.yaml
  5. Run the training pipeline:

    livekit-wakeword run hey_robot.yaml
  6. (Optional) Evaluate the model against the validation set:

    livekit-wakeword eval hey_robot.yaml

    You can evaluate any compatible ONNX model using livekit-wakeword eval by passing -m /path/to/other_model.onnx.
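
The target_fp_per_hour setting expresses a false-positive budget. To make the metric concrete, here is a minimal sketch of the arithmetic. It is not the library's evaluation code. It assumes each score above the threshold counts as one false positive, and the scores, duration, and candidate thresholds are made-up illustration data.

```python
def false_positives_per_hour(scores, hours, threshold):
    """Count negative-audio scores above the threshold, normalized per hour."""
    return sum(1 for s in scores if s > threshold) / hours


def lowest_threshold_within_budget(scores, hours, target_fp_per_hour, candidates):
    """Return the smallest candidate threshold that stays within the budget, or None."""
    for threshold in sorted(candidates):
        if false_positives_per_hour(scores, hours, threshold) <= target_fp_per_hour:
            return threshold
    return None


# Hypothetical peak scores from 10 hours of audio that contains no wakeword.
negative_scores = [0.05, 0.12, 0.31, 0.44, 0.58, 0.63, 0.91]
hours = 10.0
candidates = [0.3, 0.5, 0.7, 0.9]

chosen = lowest_threshold_within_budget(
    negative_scores, hours, target_fp_per_hour=0.2, candidates=candidates
)
print(chosen)
```

Picking the lowest threshold that meets the budget keeps the detector as sensitive as possible while still respecting the false-positive limit.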

Train via the Python API

Drive the same pipeline from code when you need to integrate training into a larger system or automate model iteration:

from livekit.wakeword import (
    WakeWordConfig,
    load_config,
    run_generate,
    run_augment,
    run_extraction,
    run_train,
    run_export,
    run_eval,
)

# Load from YAML
config = load_config("hey_robot.yaml")

# Or build a config programmatically
config = WakeWordConfig(
    model_name="hey_robot",
    target_phrases=["hey robot"],
    n_samples=5000,
    steps=30000,
)

run_generate(config)
run_augment(config)
run_extraction(config)
run_train(config)
onnx_path = run_export(config)
results = run_eval(config, onnx_path)
print(results)
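
To automate model iteration, one option is to generate a grid of config arguments and train one model per entry. The sketch below only builds the keyword arguments, using the WakeWordConfig fields shown above. The config_grid helper and the naming scheme for model_name are hypothetical.

```python
from itertools import product


def config_grid(model_name, target_phrases, n_samples_options, steps_options):
    """Build one dict of config keyword arguments per (n_samples, steps) pair."""
    grid = []
    for n_samples, steps in product(n_samples_options, steps_options):
        grid.append({
            # Hypothetical naming scheme so each run gets a distinct name.
            "model_name": f"{model_name}_n{n_samples}_s{steps}",
            "target_phrases": list(target_phrases),
            "n_samples": n_samples,
            "steps": steps,
        })
    return grid


grid = config_grid("hey_robot", ["hey robot"], [5000, 10000], [30000, 50000])
for kwargs in grid:
    print(kwargs["model_name"])
    # With the library installed, each entry could be passed on, for example:
    # config = WakeWordConfig(**kwargs)
```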

Multilingual support

By default, training generates English samples with Piper TTS. To train in a different language, switch the TTS backend to VoxCPM, which supports 30 languages.

  1. Install the voxcpm extra alongside the training extras:

    pip install "livekit-wakeword[train,eval,export,voxcpm]"
  2. Set the backend in your config:

    # ni_hao_livekit.yaml
    model_name: ni_hao_livekit
    target_phrases:
      - "你好 livekit"
    tts_backend: voxcpm

Caution

Multilingual accuracy is currently lower than English. To improve results, increase voice_design_prompts (50 to 100) and n_samples in your config.

Select a client SDK

The library provides three client SDKs. Select the one that fits the platform you're targeting:

  • Python: for Linux, macOS, or Windows clients. Includes a built-in microphone listener.
  • Rust: for native or embedded clients. Inference only.
  • Swift: for iOS 16+ and macOS 14+ apps. Includes a built-in microphone listener with CoreML acceleration.

The sections below show the install, load, and use steps for each SDK. Any SDK works with either classifier.

Python

  1. Install the library from PyPI. Add the listener extra to use the built-in microphone listener, which depends on PortAudio:

    # macOS
    brew install portaudio
    # Ubuntu/Debian
    sudo apt install portaudio19-dev
    pip install "livekit-wakeword[listener]"

    Python 3.11 or later is required. Runtime dependencies are numpy and onnxruntime.

  2. Load a model and score audio frames:

    from livekit.wakeword import WakeWordModel

    model = WakeWordModel(models=["hey_livekit.onnx"])

    # Feed audio frames (16 kHz, int16 or float32)
    scores = model.predict(audio_frame)
    if scores["hey_livekit"] > 0.5:
        print("Wakeword detected!")
  3. (Alternative) For hands-free use, wrap the model with WakeWordListener. wait_for_detection blocks until a score crosses the threshold:

    import asyncio
    from livekit.wakeword import WakeWordModel, WakeWordListener

    model = WakeWordModel(models=["hey_livekit.onnx"])

    async def main():
        async with WakeWordListener(model, threshold=0.5, debounce=2.0) as listener:
            while True:
                detection = await listener.wait_for_detection()
                print(f"Detected {detection.name} ({detection.confidence:.2f})")

    asyncio.run(main())

    threshold is the minimum score (0 to 1) to count as a detection. Lower values are more sensitive but produce more false positives. debounce is the minimum interval, in seconds, between consecutive detections.
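
If you score frames yourself with predict instead of using the listener, you need to apply the threshold and debounce rules on your own. The following is a minimal sketch of that gating under stated assumptions, not the listener's actual implementation: frame_seconds is a hypothetical parameter for the duration of each scored frame, and debounce is measured from the previous accepted detection.

```python
def gate_detections(scores, frame_seconds, threshold=0.5, debounce=2.0):
    """Return (time_in_seconds, score) pairs that count as detections.

    A frame counts when its score exceeds the threshold and at least
    `debounce` seconds have passed since the previous accepted detection.
    """
    detections = []
    last_time = None
    for index, score in enumerate(scores):
        t = index * frame_seconds
        if score <= threshold:
            continue  # below threshold: not a detection
        if last_time is not None and t - last_time < debounce:
            continue  # too soon after the previous detection
        last_time = t
        detections.append((t, score))
    return detections


# Made-up per-frame scores, one frame every 0.5 seconds.
frame_scores = [0.1, 0.9, 0.8, 0.2, 0.1, 0.7, 0.95]
result = gate_detections(frame_scores, frame_seconds=0.5)
print(result)
```

In this run the frame at 1.0 s is suppressed because it follows the detection at 0.5 s too closely, while the frame at 2.5 s is accepted because the full 2.0 s interval has elapsed.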

For a complete Python example wired up to a LiveKit Agents server, see hello-wakeword.

Rust

The Rust crate is inference only, meaning it only handles wakeword detection. You need to manage audio capture yourself. Use your preferred audio library (such as cpal) to capture microphone audio and pass i16 PCM frames to predict().

  1. Add the livekit-wakeword crate to your project:

    cargo add livekit-wakeword
  2. Load a model and score i16 PCM audio chunks at the configured sample rate:

    use livekit_wakeword::WakeWordModel;

    let mut model = WakeWordModel::new(&["hey_livekit.onnx"], 16000)?;
    let scores = model.predict(&audio_chunk)?;
    if scores["hey_livekit"] > 0.5 {
        println!("Wakeword detected!");
    }

    Input audio at sample rates between 16 kHz and 384 kHz is automatically resampled to 16 kHz. The mel spectrogram and embedding models are compiled into the binary, so only the classifier ONNX file is loaded at runtime. The crate uses a pure-Rust ONNX backend by default and falls back to the native ONNX Runtime on aarch64 Windows.

Swift

  1. Add the LiveKitWakeWord package to your Package.swift:

    .package(url: "https://github.com/livekit/livekit-wakeword", branch: "main"),
  2. Load a model and score Int16 PCM chunks at the configured sample rate:

    import LiveKitWakeWord

    let classifier = Bundle.main.url(forResource: "hey_livekit", withExtension: "onnx")!
    let model = try WakeWordModel(models: [classifier], sampleRate: 16_000)
    let scores = try model.predict(audioChunk)
    if (scores["hey_livekit"] ?? 0) > 0.5 {
        print("Wakeword detected!")
    }
  3. (Alternative) For hands-free use, wrap the model with WakeWordListener and consume detections as an async sequence:

    import LiveKitWakeWord

    let classifier = Bundle.main.url(forResource: "hey_livekit", withExtension: "onnx")!
    let model = try WakeWordModel(models: [classifier], sampleRate: 16_000)
    let listener = WakeWordListener(model: model, threshold: 0.5, debounce: 2.0)
    try listener.start()
    for await detection in listener.detections() {
        print("Detected \(detection.name) (\(String(format: "%.2f", detection.confidence)))")
    }

    threshold is the minimum score (0 to 1) to count as a detection. debounce is the minimum interval, in seconds, between consecutive detections. Add NSMicrophoneUsageDescription to your Info.plist (and com.apple.security.device.audio-input on sandboxed macOS apps) before using the listener.

Audio at any sample rate is resampled to 16 kHz internally via AVAudioConverter. ONNX Runtime with the CoreML Execution Provider dispatches to ANE, GPU, or CPU by default.

A SwiftUI demo lives in examples/ios_wakeword/.
