# Wakeword detection

> Detect a spoken trigger phrase on the client to activate a voice AI agent hands-free.

## Overview

A wakeword is a short spoken phrase, like "Hey Siri," that activates a voice-enabled device or agent. Running detection on the client lets your agent stay idle until the user speaks the trigger phrase. This is a common pattern for hands-free interfaces, edge devices like the Raspberry Pi, and branded activation phrases.

The [livekit-wakeword](https://github.com/livekit/livekit-wakeword) library is an open-source toolkit that includes:

- A pre-trained `hey livekit` classifier you can use out of the box.
- A training pipeline for creating custom wakeword classifiers.
- Three client SDKs (Python, Rust, Swift) for running detection on a device.

Models export as standard ONNX files and are compatible with [openWakeWord](https://github.com/dscripka/openWakeWord). For benchmarks and architecture details, see the [LiveKit blog post](https://livekit.com/blog/livekit-wakeword).

### How it works

Detection runs on the client device, not on the agent server. A typical setup pairs an on-device client with a standard LiveKit Agents server. At runtime:

1. The client listens to the microphone locally and scores each audio frame against the trained classifier.
2. When a score crosses the threshold, the client connects to a LiveKit room.
3. The agent joins the room.
4. When the user finishes, the agent leaves the room.
5. The client disconnects and resumes listening for the wakeword.

The SDK you select on the client is independent of the agent server. Any client SDK can connect to a Python or Node.js LiveKit Agents server.
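
The runtime steps above amount to a loop that alternates between idle listening and an active session. The following is a minimal sketch of that control flow, not code from the library: `wakeword_loop`, `wait_for_wakeword`, and `run_session` are hypothetical names, and the two callables stand in for your own scoring and room-connection logic.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def wakeword_loop(
    wait_for_wakeword: Callable[[], Awaitable[float]],
    run_session: Callable[[], Awaitable[None]],
    threshold: float = 0.5,
    max_sessions: Optional[int] = None,
) -> int:
    """Alternate between idle listening and an active room session."""
    sessions = 0
    while max_sessions is None or sessions < max_sessions:
        # Step 1: listen locally and get the next score from the classifier.
        score = await wait_for_wakeword()
        if score <= threshold:
            continue  # Below the threshold: stay idle and keep listening.
        # Steps 2-4: connect to the room, let the agent join, and wait
        # until the session ends.
        await run_session()
        # Step 5: the session is over, so go back to listening.
        sessions += 1
    return sessions


async def demo() -> int:
    """Drive the loop with canned scores instead of a microphone."""
    scores = iter([0.1, 0.9, 0.3, 0.8])

    async def fake_wait() -> float:
        return next(scores)

    async def fake_session() -> None:
        await asyncio.sleep(0)  # Placeholder for the real conversation.

    return await wakeword_loop(fake_wait, fake_session, max_sessions=2)


if __name__ == "__main__":
    print(asyncio.run(demo()))  # Two of the four scores exceed 0.5.
```

Keeping the room connection inside `run_session` means no LiveKit connection is held open while the client is idle.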

### Try the example

The [hello-wakeword](https://github.com/livekit-examples/hello-wakeword) example pairs a Python client (using the pre-trained `hey livekit` model) with a LiveKit Agents server. It's the fastest way to see end-to-end detection.

The agent uses [LiveKit Inference](https://docs.livekit.io/agents/models/inference.md) for STT, LLM, and TTS, so no separate provider keys are required.

This example assumes you have:

- A [LiveKit Cloud](https://cloud.livekit.io/) account
- [uv](https://docs.astral.sh/uv/) installed

1. Clone the repo and install both packages:

```shell
git clone https://github.com/livekit-examples/hello-wakeword
cd hello-wakeword
uv sync --all-packages

```
2. Authenticate with LiveKit Cloud and generate a `.env.local` file:

```shell
lk cloud auth
lk app env -w

```
3. Start the agent server in one terminal:

```shell
uv run wakeword-agent dev

```
4. Start the client in another terminal:

```shell
uv run wakeword-client

```
5. Say `hey livekit` to trigger the agent. Stop speaking to end the session, and the client resumes listening automatically.

## Set up wakeword detection

To run wakeword detection on a device, you need an ONNX classifier file and a client SDK to score audio against it. Either classifier (pre-trained or custom-trained) works with any of the three SDKs.

### Use the pre-trained model

Download the pre-trained `hey livekit` classifier:

```shell
curl -LO https://raw.githubusercontent.com/livekit-examples/hello-wakeword/main/client/models/hey_livekit.onnx

```

To detect a different phrase or language, [train a custom wakeword](#train-a-custom-wakeword) instead.

### Train a custom wakeword

To detect a different trigger phrase or a non-English language, train a custom classifier. The training pipeline is automated and uses TTS to generate synthetic training samples, so no audio recording or labeled data is required. The result is a single ONNX file that loads in any client SDK.

Train locally, in the cloud, or on GPU instances via [SkyPilot](https://github.com/skypilot-org/skypilot).

#### Pipeline stages

The training pipeline has six stages. The `run` command chains four of them. You can run any stage on its own with `livekit-wakeword <stage> <config>`.

| Stage | What it does |
| --- | --- |
| `setup` | Download base data (Piper or VoxCPM weights, ACAV features, room impulse responses, MUSAN background noise). |
| `generate` | Synthesize positive samples and adversarial negatives via TTS. |
| `augment` | Add noise, reverb, and pitch shifts. Extract features through the frozen mel and embedding models. |
| `train` | Train the classifier head on the extracted features. |
| `export` | Export the trained classifier to ONNX. |
| `eval` | Score the exported model against the validation set. Produces a DET curve plot and a metrics JSON file. |
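
Running stages one at a time is useful when you want to resume after a failure instead of repeating earlier work. The sketch below builds the `livekit-wakeword <stage> <config>` commands for a chosen subset of stages. The helper names `stage_command` and `run_stages` are hypothetical, and `setup` is left out because the CLI steps on this page invoke it with a `--config` flag rather than a positional argument.

```python
import subprocess
from typing import Iterable, List

# Stage names from the table above. `setup` is omitted on purpose (see lead-in).
STAGES = ("generate", "augment", "train", "export", "eval")


def stage_command(stage: str, config_path: str) -> List[str]:
    """Build the argv for `livekit-wakeword <stage> <config>`."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    return ["livekit-wakeword", stage, config_path]


def run_stages(
    stages: Iterable[str], config_path: str, dry_run: bool = True
) -> List[List[str]]:
    """Validate every stage first, then run them in order.

    With dry_run=True the commands are only returned, not executed.
    """
    commands = [stage_command(stage, config_path) for stage in stages]
    if not dry_run:
        for command in commands:
            # check=True stops the chain at the first failing stage.
            subprocess.run(command, check=True)
    return commands


# Example: resume from training after generation and augmentation succeeded.
for command in run_stages(["train", "export", "eval"], "hey_robot.yaml"):
    print(" ".join(command))
```

With `dry_run=False`, the same call executes the commands, which requires the CLI to be installed.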

#### Train via the CLI

1. Install the system dependencies: `espeak-ng`, `ffmpeg`, `sox`, `libsndfile`, and `portaudio`.

**macOS**:

```shell
brew install espeak-ng ffmpeg sox portaudio

```

---

**Ubuntu/Debian**:

```shell
sudo apt install espeak-ng ffmpeg sox libsndfile1 portaudio19-dev

```

---

**Windows**:

```powershell
winget install eSpeak-NG.eSpeak-NG
winget install Gyan.FFmpeg
winget install ChrisBagwell.SoX

```

`libsndfile` and `portaudio` are bundled with the `soundfile` and `pyaudio` Python wheels on Windows, so you don't need to install them separately.
2. Install the CLI:

```shell
pip install "livekit-wakeword[train,eval,export]"

```
3. Write a config file. A minimal config looks like this:

```yaml
# hey_robot.yaml
model_name: hey_robot
target_phrases:
  - "hey robot"

n_samples: 10000
model:
  model_type: conv_attention  # conv_attention (default), dnn, or rnn
  model_size: small           # tiny, small, medium, large
steps: 50000
target_fp_per_hour: 0.2

```
4. Download the base data:

```shell
livekit-wakeword setup --config hey_robot.yaml

```
5. Run the training pipeline:

```shell
livekit-wakeword run hey_robot.yaml

```
6. (Optional) Evaluate the model against the validation set:

```shell
livekit-wakeword eval hey_robot.yaml

```

You can evaluate any compatible ONNX model using `livekit-wakeword eval` by passing `-m /path/to/other_model.onnx`.
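
Because a training run can take a while, it can help to sanity-check the config before starting. The following is a hypothetical pre-flight check, not part of the `livekit-wakeword` CLI. `check_config` is an illustrative name, and treating `model_name` and `target_phrases` as required is an assumption. The allowed `model_type` and `model_size` values come from the comments in the sample config. The input is a plain dictionary, such as the result of parsing the YAML file.

```python
from typing import List

# Allowed values taken from the comments in the sample config.
MODEL_TYPES = {"conv_attention", "dnn", "rnn"}
MODEL_SIZES = {"tiny", "small", "medium", "large"}


def check_config(config: dict) -> List[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    # Assumption: both fields are required for training.
    if not config.get("model_name"):
        problems.append("model_name is missing")
    if not config.get("target_phrases"):
        problems.append("target_phrases is empty")

    model = config.get("model") or {}
    model_type = model.get("model_type", "conv_attention")
    if model_type not in MODEL_TYPES:
        problems.append(f"unknown model_type: {model_type}")
    model_size = model.get("model_size")
    if model_size is not None and model_size not in MODEL_SIZES:
        problems.append(f"unknown model_size: {model_size}")

    for key in ("n_samples", "steps"):
        value = config.get(key)
        if value is not None and (not isinstance(value, int) or value <= 0):
            problems.append(f"{key} must be a positive integer")
    return problems


# The sample config from the steps above, expressed as a dictionary.
hey_robot = {
    "model_name": "hey_robot",
    "target_phrases": ["hey robot"],
    "n_samples": 10000,
    "model": {"model_type": "conv_attention", "model_size": "small"},
    "steps": 50000,
    "target_fp_per_hour": 0.2,
}
print(check_config(hey_robot))  # []
```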

#### Train via the Python API

Drive the same pipeline from code when you need to integrate training into a larger system or automate model iteration:

```python
from livekit.wakeword import (
    WakeWordConfig,
    load_config,
    run_generate,
    run_augment,
    run_extraction,
    run_train,
    run_export,
    run_eval,
)

# Load from YAML
config = load_config("hey_robot.yaml")

# Or build a config programmatically
config = WakeWordConfig(
    model_name="hey_robot",
    target_phrases=["hey robot"],
    n_samples=5000,
    steps=30000,
)

run_generate(config)
run_augment(config)
run_extraction(config)
run_train(config)
onnx_path = run_export(config)

results = run_eval(config, onnx_path)
print(results)

```

#### Multilingual support

By default, training generates English samples with [Piper TTS](https://github.com/rhasspy/piper). To train in a different language, switch the TTS backend to [VoxCPM](https://github.com/OpenBMB/VoxCPM), which supports 30 languages.

1. Install the `voxcpm` extra alongside the training extras:

```shell
pip install "livekit-wakeword[train,eval,export,voxcpm]"

```
2. Set the backend in your config:

```yaml
# ni_hao_livekit.yaml
model_name: ni_hao_livekit
target_phrases:
  - "你好 livekit"
tts_backend: voxcpm

```

> 🔥 **Caution**
> 
> Multilingual accuracy is currently lower than English accuracy. To improve results, raise `voice_design_prompts` to between 50 and 100 and increase `n_samples` in your config.

### Select a client SDK

The library provides three client SDKs. Select the one that fits the platform you're targeting:

- **Python**: for Linux, macOS, or Windows clients. Includes a built-in microphone listener.
- **Rust**: for native or embedded clients. Inference only.
- **Swift**: for iOS 16+ and macOS 14+ apps. Includes a built-in microphone listener with CoreML acceleration.

Each section below shows how to install the SDK, load a model, and score audio. Any SDK works with either classifier.

**Python**:

1. Install the library from PyPI. Add the `listener` extra to use the built-in microphone listener, which depends on PortAudio:

```shell
# macOS
brew install portaudio

# Ubuntu/Debian
sudo apt install portaudio19-dev

```

```shell
pip install "livekit-wakeword[listener]"

```

Python 3.11 or later is required. Runtime dependencies are `numpy` and `onnxruntime`.
2. Load a model and score audio frames:

```python
from livekit.wakeword import WakeWordModel

model = WakeWordModel(models=["hey_livekit.onnx"])

# Feed audio frames (16 kHz, int16 or float32)
scores = model.predict(audio_frame)
if scores["hey_livekit"] > 0.5:
    print("Wakeword detected!")

```
3. (Alternative) For hands-free use, wrap the model with `WakeWordListener`. `wait_for_detection` blocks until a score crosses the threshold:

```python
import asyncio
from livekit.wakeword import WakeWordModel, WakeWordListener

model = WakeWordModel(models=["hey_livekit.onnx"])

async def main():
    async with WakeWordListener(model, threshold=0.5, debounce=2.0) as listener:
        while True:
            detection = await listener.wait_for_detection()
            print(f"Detected {detection.name} ({detection.confidence:.2f})")

asyncio.run(main())

```

`threshold` is the minimum score (0 to 1) to count as a detection. Lower values are more sensitive but produce more false positives. `debounce` is the minimum interval, in seconds, between consecutive detections.
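
To see how the two parameters interact, the sketch below applies a threshold and a debounce interval to a stream of timestamped scores. It is an illustration under assumptions, not the listener's actual implementation: `filter_detections` is a hypothetical name, and measuring the debounce interval from the last accepted detection is an assumption.

```python
from typing import Iterable, Iterator, Optional, Tuple


def filter_detections(
    scored_frames: Iterable[Tuple[float, float]],
    threshold: float = 0.5,
    debounce: float = 2.0,
) -> Iterator[Tuple[float, float]]:
    """Yield (timestamp, score) pairs that count as detections.

    scored_frames holds (timestamp_in_seconds, score) pairs in time order.
    """
    last_detection: Optional[float] = None
    for timestamp, score in scored_frames:
        if score <= threshold:
            continue  # Score did not cross the threshold.
        if last_detection is not None and timestamp - last_detection < debounce:
            continue  # Too soon after the previous detection.
        last_detection = timestamp
        yield timestamp, score


frames = [(0.0, 0.2), (0.5, 0.9), (1.0, 0.95), (2.4, 0.7), (2.6, 0.8), (5.0, 0.4)]

# Default settings: the hits at 1.0 s and 2.4 s fall inside the debounce window.
print(list(filter_detections(frames)))  # [(0.5, 0.9), (2.6, 0.8)]

# A lower threshold is more sensitive: weak scores now count as detections.
print(len(list(filter_detections(frames, threshold=0.1))))  # 3
```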

For a complete Python example wired up to a LiveKit Agents server, see [hello-wakeword](https://github.com/livekit-examples/hello-wakeword).

---

**Rust**:

The Rust crate is inference only, meaning it only handles wakeword detection. You need to manage audio capture yourself. Use your preferred audio library (such as [`cpal`](https://github.com/RustAudio/cpal)) to capture microphone audio and pass `i16` PCM frames to `predict()`.

1. Add the [livekit-wakeword](https://crates.io/crates/livekit-wakeword) crate to your project:

```shell
cargo add livekit-wakeword

```
2. Load a model and score `i16` PCM audio chunks at the configured sample rate:

```rust
use livekit_wakeword::WakeWordModel;

let mut model = WakeWordModel::new(&["hey_livekit.onnx"], 16000)?;

let scores = model.predict(&audio_chunk)?;
if scores["hey_livekit"] > 0.5 {
    println!("Wakeword detected!");
}

```

Input audio at sample rates between 16 kHz and 384 kHz is automatically resampled to 16 kHz. The mel spectrogram and embedding models are compiled into the binary, so only the classifier ONNX file is loaded at runtime. The crate uses a pure-Rust ONNX backend by default and falls back to the native ONNX Runtime on aarch64 Windows.

---

**Swift**:

1. Add the [`LiveKitWakeWord`](https://github.com/livekit/livekit-wakeword/tree/main/swift) package to your `Package.swift`:

```swift
.package(url: "https://github.com/livekit/livekit-wakeword", branch: "main"),

```
2. Load a model and score `Int16` PCM chunks at the configured sample rate:

```swift
import LiveKitWakeWord

let classifier = Bundle.main.url(forResource: "hey_livekit", withExtension: "onnx")!
let model = try WakeWordModel(models: [classifier], sampleRate: 16_000)

let scores = try model.predict(audioChunk)
if (scores["hey_livekit"] ?? 0) > 0.5 {
    print("Wakeword detected!")
}

```
3. (Alternative) For hands-free use, wrap the model with `WakeWordListener` and consume detections as an async sequence:

```swift
import LiveKitWakeWord

let classifier = Bundle.main.url(forResource: "hey_livekit", withExtension: "onnx")!
let model = try WakeWordModel(models: [classifier], sampleRate: 16_000)
let listener = WakeWordListener(model: model, threshold: 0.5, debounce: 2.0)

try listener.start()
for await detection in listener.detections() {
    print("Detected \(detection.name) (\(String(format: "%.2f", detection.confidence)))")
}

```

`threshold` is the minimum score (0 to 1) to count as a detection. `debounce` is the minimum interval, in seconds, between consecutive detections. Before using the listener, add `NSMicrophoneUsageDescription` to your `Info.plist`. Sandboxed macOS apps also need the `com.apple.security.device.audio-input` entitlement.

Audio at any sample rate is resampled to 16 kHz internally via `AVAudioConverter`. By default, ONNX Runtime with the CoreML Execution Provider dispatches to the Apple Neural Engine (ANE), GPU, or CPU.

A SwiftUI demo lives in [`examples/ios_wakeword/`](https://github.com/livekit/livekit-wakeword/tree/main/examples/ios_wakeword).

## Additional resources

The following resources provide more information about LiveKit wakeword detection.

- **[livekit-wakeword](https://github.com/livekit/livekit-wakeword)**: Source for the training toolkit, SDKs, and example apps.

- **[hello-wakeword](https://github.com/livekit-examples/hello-wakeword)**: End-to-end example of a wakeword-triggered voice agent.

- **[Python package](https://pypi.org/project/livekit-wakeword/)**: The `livekit-wakeword` package on PyPI.

- **[Rust crate](https://crates.io/crates/livekit-wakeword)**: The `livekit-wakeword` crate for native clients.

- **[Introducing livekit-wakeword](https://livekit.com/blog/livekit-wakeword)**: Blog post covering model architecture, training pipeline, and benchmarks.

---

This document was rendered at 2026-06-07T11:35:49.094Z.
For the latest version of this document, see [https://docs.livekit.io/agents/multimodality/audio/wakeword.md](https://docs.livekit.io/agents/multimodality/audio/wakeword.md).