Overview
A wakeword is a short spoken phrase, like "Hey Siri," that activates a voice-enabled device or agent. Running detection on the client lets your agent stay idle until the user speaks the trigger phrase. This is a common pattern for hands-free interfaces, edge devices like the Raspberry Pi, and branded activation phrases.
The livekit-wakeword library is an open-source toolkit that includes:
- A pre-trained `hey livekit` classifier you can use out of the box.
- A training pipeline for creating custom wakeword classifiers.
- Three client SDKs (Python, Rust, Swift) for running detection on a device.
Models export as standard ONNX files and are compatible with openWakeWord. For benchmarks and architecture details, see the LiveKit blog post.
How it works
Detection runs on the client device, not on the agent server. A typical setup pairs an on-device client with a standard LiveKit Agents server. At runtime:
- The client listens to the microphone locally and scores each audio frame against the trained classifier.
- When a score crosses the threshold, the client connects to a LiveKit room.
- The agent joins the room.
- When the user finishes, the agent leaves the room.
- The client disconnects and resumes listening for the wakeword.
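The runtime steps above can be sketched as a small listen, connect, and resume loop. This is a minimal, hypothetical sketch in plain Python. It is not code from the SDKs or from hello-wakeword. In it, `scores` stands in for the per-frame classifier scores. `connect_to_room` and `wait_for_session_end` are placeholder callbacks for your own LiveKit room logic. The sketch assumes that a score at or above the threshold counts as a detection.

```python
from typing import Callable, Iterable, List


def run_client(
    scores: Iterable[float],
    threshold: float,
    connect_to_room: Callable[[], None],
    wait_for_session_end: Callable[[], None],
) -> List[int]:
    """Hypothetical client loop: listen locally, connect on detection, then resume.

    `scores` stands in for per-frame classifier scores. Returns the frame
    indices at which a session was started.
    """
    sessions: List[int] = []
    for index, score in enumerate(scores):
        # Listening state: only local scoring happens, no room connection.
        if score < threshold:
            continue
        # The score reached the threshold: join a room so the agent can join too.
        sessions.append(index)
        connect_to_room()
        # Wait until the agent leaves and the client disconnects, then
        # continue the loop, which resumes listening on the next frame.
        wait_for_session_end()
    return sessions


# Usage with fake scores and callbacks that just record what happened.
events: List[str] = []
started = run_client(
    [0.1, 0.2, 0.9, 0.3, 0.7],
    threshold=0.5,
    connect_to_room=lambda: events.append("connect"),
    wait_for_session_end=lambda: events.append("disconnect"),
)
print(started, events)
```

The room connection sits behind a callback, which mirrors the split described above. Detection stays local, and a room connection exists only while a session is active.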
The SDK you select on the client is independent of the agent server. Any client SDK can connect to a Python or Node.js LiveKit Agents server.
Try the example
The hello-wakeword example pairs a Python client (using the pre-trained hey livekit model) with a LiveKit Agents server. It's the fastest way to see end-to-end detection.
The agent uses LiveKit Inference for STT, LLM, and TTS, so no separate provider keys are required.
This example assumes you have:
- A LiveKit Cloud account
- uv installed
Clone the repo and install both packages:
```shell
git clone https://github.com/livekit-examples/hello-wakeword
cd hello-wakeword
uv sync --all-packages
```
Authenticate with LiveKit Cloud and generate a `.env.local` file:
```shell
lk cloud auth
lk app env -w
```
Start the agent server in one terminal:
```shell
uv run wakeword-agent dev
```
Start the client in another terminal:
```shell
uv run wakeword-client
```
Say "hey livekit" to trigger the agent. Stop speaking to end the session, and the client resumes listening automatically.
Set up wakeword detection
To run wakeword detection on a device, you need an ONNX classifier file and a client SDK to score audio against it. Either classifier (pre-trained or custom-trained) works with any of the three SDKs.
Use the pre-trained model
Download the pre-trained hey livekit classifier:
```shell
curl -LO https://raw.githubusercontent.com/livekit-examples/hello-wakeword/main/client/models/hey_livekit.onnx
```
To detect a different phrase or language, train a custom wakeword instead.
Train a custom wakeword
To detect a different trigger phrase or a non-English language, train a custom classifier. The training pipeline is automated and uses TTS to generate synthetic training samples, so no audio recording or labeled data is required. The result is a single ONNX file that loads in any client SDK.
Train locally, in the cloud, or on GPU instances via SkyPilot.
Pipeline stages
The training pipeline has six stages. The `run` command chains four of them. You can run any stage on its own with `livekit-wakeword <stage> <config>`.
| Stage | What it does |
|---|---|
| `setup` | Download base data (Piper or VoxCPM weights, ACAV features, room impulse responses, MUSAN background noise). |
| `generate` | Synthesize positive samples and adversarial negatives via TTS. |
| `augment` | Add noise, reverb, and pitch shifts. Extract features through the frozen mel and embedding models. |
| `train` | Train the classifier head on the extracted features. |
| `export` | Export the trained classifier to ONNX. |
| `eval` | Score the exported model against the validation set. Produces a DET curve plot and a metrics JSON file. |
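A DET (detection error tradeoff) curve shows how two error rates trade off as the detection threshold changes. The first is the fraction of real wakeword clips the model misses (the false rejection rate). The second is the number of false activations per hour of audio that contains no wakeword, which is what the `target_fp_per_hour` config setting refers to.

The sketch below computes one point on such a curve. It is simplified and hypothetical, not the `eval` stage's implementation. It assumes an input layout of one best score per positive clip and a flat list of per-frame scores over negative audio with a fixed frame duration. It also assumes that consecutive high-scoring frames count as a single false activation.

```python
from typing import List, Tuple


def det_point(
    positive_clip_scores: List[float],
    negative_frame_scores: List[float],
    frame_seconds: float,
    threshold: float,
) -> Tuple[float, float]:
    """Return (false_rejection_rate, false_positives_per_hour) at one threshold.

    positive_clip_scores: the highest score produced on each clip that contains
        the wakeword (one value per clip).
    negative_frame_scores: consecutive per-frame scores over audio that never
        contains the wakeword.
    frame_seconds: how much audio each negative score covers (assumed fixed).
    """
    # A positive clip is missed if its best score never reaches the threshold.
    misses = sum(1 for score in positive_clip_scores if score < threshold)
    false_rejection_rate = misses / len(positive_clip_scores)

    # Count rising edges so a run of consecutive high frames is one false activation.
    false_positives = 0
    previously_above = False
    for score in negative_frame_scores:
        above = score >= threshold
        if above and not previously_above:
            false_positives += 1
        previously_above = above

    hours = len(negative_frame_scores) * frame_seconds / 3600
    return false_rejection_rate, false_positives / hours


# Sweeping the threshold traces out the curve (toy scores, not real results).
positives = [0.95, 0.8, 0.4, 0.99]
negatives = [0.1, 0.7, 0.8, 0.2, 0.05, 0.6, 0.1] * 100
for threshold in (0.5, 0.75, 0.9):
    frr, fp_per_hour = det_point(positives, negatives, 0.08, threshold)
    print(f"threshold={threshold:.2f} FRR={frr:.2f} FP/hour={fp_per_hour:.1f}")
```

Raising the threshold lowers false activations per hour but raises the false rejection rate. The curve makes that tradeoff visible so you can pick an operating point.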
Train via the CLI
Install the system dependencies: `espeak-ng`, `ffmpeg`, `sox`, `libsndfile`, and `portaudio`.
```shell
# macOS
brew install espeak-ng ffmpeg sox portaudio
# Ubuntu/Debian
sudo apt install espeak-ng ffmpeg sox libsndfile1 portaudio19-dev
# Windows
winget install eSpeak-NG.eSpeak-NG
winget install Gyan.FFmpeg
winget install ChrisBagwell.SoX
```
On Windows, `libsndfile` and `portaudio` are bundled with the `soundfile` and `pyaudio` Python wheels, so you don't need to install them separately.
Install the CLI:
```shell
pip install "livekit-wakeword[train,eval,export]"
```
Write a config file. A minimum config looks like this:
```yaml
# hey_robot.yaml
model_name: hey_robot
target_phrases:
  - "hey robot"
n_samples: 10000
model:
  model_type: conv_attention  # conv_attention (default), dnn, or rnn
  model_size: small  # tiny, small, medium, large
steps: 50000
target_fp_per_hour: 0.2
```
Download the base data:
```shell
livekit-wakeword setup --config hey_robot.yaml
```
Run the training pipeline:
```shell
livekit-wakeword run hey_robot.yaml
```
(Optional) Evaluate the model against the validation set:
```shell
livekit-wakeword eval hey_robot.yaml
```
You can evaluate any compatible ONNX model with `livekit-wakeword eval` by passing `-m /path/to/other_model.onnx`.
Train via the Python API
Drive the same pipeline from code when you need to integrate training into a larger system or automate model iteration:
```python
from livekit.wakeword import (
    WakeWordConfig,
    load_config,
    run_generate,
    run_augment,
    run_extraction,
    run_train,
    run_export,
    run_eval,
)

# Load from YAML
config = load_config("hey_robot.yaml")

# Or build a config programmatically
config = WakeWordConfig(
    model_name="hey_robot",
    target_phrases=["hey robot"],
    n_samples=5000,
    steps=30000,
)

# Run the stages in order (base data from `setup` is assumed to be downloaded)
run_generate(config)
run_augment(config)
run_extraction(config)
run_train(config)
onnx_path = run_export(config)
results = run_eval(config, onnx_path)
print(results)
```
Multilingual support
By default, training generates English samples with Piper TTS. To train in a different language, switch the TTS backend to VoxCPM, which supports 30 languages.
Install the `voxcpm` extra alongside the training extras:
```shell
pip install "livekit-wakeword[train,eval,export,voxcpm]"
```
Set the backend in your config:
```yaml
# ni_hao_livekit.yaml
model_name: ni_hao_livekit
target_phrases:
  - "你好 livekit"
tts_backend: voxcpm
```
Multilingual accuracy is currently lower than English accuracy. To improve results, increase `voice_design_prompts` (to between 50 and 100) and `n_samples` in your config.
Select a client SDK
The library provides three client SDKs. Select the one that fits the platform you're targeting:
- Python: for Linux, macOS, or Windows clients. Includes a built-in microphone listener.
- Rust: for native or embedded clients. Inference only.
- Swift: for iOS 16+ and macOS 14+ apps. Includes a built-in microphone listener with CoreML acceleration.
The following sections show the install, load, and usage steps for each SDK. Any SDK works with either classifier.
Python
Install the library from PyPI. Add the `listener` extra to use the built-in microphone listener, which depends on PortAudio:
```shell
# macOS
brew install portaudio
# Ubuntu/Debian
sudo apt install portaudio19-dev

pip install "livekit-wakeword[listener]"
```
Python 3.11 or later is required. Runtime dependencies are `numpy` and `onnxruntime`.
Load a model and score audio frames:
```python
from livekit.wakeword import WakeWordModel

model = WakeWordModel(models=["hey_livekit.onnx"])

# Feed audio frames (16 kHz, int16 or float32).
# `audio_frame` is a chunk of mono audio from your own capture code.
scores = model.predict(audio_frame)
if scores["hey_livekit"] > 0.5:
    print("Wakeword detected!")
```
(Alternative) For hands-free use, wrap the model with `WakeWordListener`. `wait_for_detection()` waits until a score crosses the threshold:
```python
import asyncio

from livekit.wakeword import WakeWordModel, WakeWordListener

model = WakeWordModel(models=["hey_livekit.onnx"])


async def main():
    async with WakeWordListener(model, threshold=0.5, debounce=2.0) as listener:
        while True:
            detection = await listener.wait_for_detection()
            print(f"Detected {detection.name} ({detection.confidence:.2f})")


asyncio.run(main())
```
`threshold` is the minimum score (0 to 1) to count as a detection. Lower values are more sensitive but produce more false positives. `debounce` is the minimum interval, in seconds, between consecutive detections.
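Here is one way `threshold` and `debounce` could interact on a stream of timestamped scores. This is a minimal, hypothetical sketch in plain Python for illustration, not `WakeWordListener`'s actual implementation. It assumes three rules:

- A score equal to the threshold counts, since `threshold` is described as a minimum.
- The debounce window is measured from the last accepted detection.
- Input arrives as hypothetical `(timestamp_seconds, score)` pairs.

```python
from typing import Iterable, List, Optional, Tuple


def debounced_detections(
    scored_frames: Iterable[Tuple[float, float]],
    threshold: float = 0.5,
    debounce: float = 2.0,
) -> List[Tuple[float, float]]:
    """Return the (timestamp, score) pairs that count as detections.

    `scored_frames` yields (timestamp_seconds, score) pairs in time order.
    A frame counts if its score is at least `threshold` and at least
    `debounce` seconds have passed since the previous detection.
    """
    detections: List[Tuple[float, float]] = []
    last_detection: Optional[float] = None
    for timestamp, score in scored_frames:
        if score < threshold:
            continue
        if last_detection is not None and timestamp - last_detection < debounce:
            # Still inside the debounce window: suppress the repeated trigger.
            continue
        detections.append((timestamp, score))
        last_detection = timestamp
    return detections


frames = [(0.00, 0.2), (0.08, 0.7), (0.16, 0.9), (1.00, 0.8), (2.50, 0.6), (3.00, 0.4)]
print(debounced_detections(frames))
```

Because one spoken wakeword usually produces several high-scoring frames in a row, the debounce window keeps one utterance from firing multiple detections.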
For a complete Python example wired up to a LiveKit Agents server, see hello-wakeword .
Rust
The Rust crate is inference only, meaning it only handles wakeword detection. You need to manage audio capture yourself. Use your preferred audio library (such as cpal) to capture microphone audio and pass `i16` PCM frames to `predict()`.
Add the livekit-wakeword crate to your project:
```shell
cargo add livekit-wakeword
```
Load a model and score `i16` PCM audio chunks at the configured sample rate:
```rust
use livekit_wakeword::WakeWordModel;

let mut model = WakeWordModel::new(&["hey_livekit.onnx"], 16000)?;
// `audio_chunk` holds i16 PCM samples from your own capture code (for example, cpal).
let scores = model.predict(&audio_chunk)?;
if scores["hey_livekit"] > 0.5 {
    println!("Wakeword detected!");
}
```
Input audio at sample rates between 16 kHz and 384 kHz is automatically resampled to 16 kHz. The mel spectrogram and embedding models are compiled into the binary, so only the classifier ONNX file is loaded at runtime. The crate uses a pure-Rust ONNX backend by default and falls back to the native ONNX Runtime on aarch64 Windows.
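The Rust crate leaves capture to you, so your capture callback has to turn whatever the audio API delivers into `i16` PCM chunks. Audio libraries such as cpal may deliver `f32` samples in the range -1.0 to 1.0, and their callback buffers rarely match the chunk size you want to score.

The sketch below shows the conversion and buffering logic in Python for illustration only, not as the crate's API. It makes two assumptions:

- The 1280-sample chunk (80 ms at 16 kHz) is borrowed from the frame size commonly used with openWakeWord-style models, not a documented requirement of this crate.
- `ChunkBuffer` is a hypothetical helper.

```python
from typing import List, Sequence

CHUNK_SAMPLES = 1280  # assumption: 80 ms of audio at 16 kHz


def float_to_int16(samples: Sequence[float]) -> List[int]:
    """Convert float samples in [-1.0, 1.0] to int16 PCM values, clipping out-of-range input."""
    out: List[int] = []
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))
        out.append(int(round(clipped * 32767)))
    return out


class ChunkBuffer:
    """Hypothetical helper: accumulate samples from capture callbacks, emit fixed-size chunks."""

    def __init__(self, size: int = CHUNK_SAMPLES) -> None:
        self.size = size
        self._pending: List[int] = []

    def push(self, samples: Sequence[int]) -> List[List[int]]:
        """Add samples and return every complete chunk; leftovers wait for the next call."""
        self._pending.extend(samples)
        ready: List[List[int]] = []
        while len(self._pending) >= self.size:
            ready.append(self._pending[: self.size])
            del self._pending[: self.size]
        return ready
```

Each chunk that `push` returns is what you would hand to `predict()`. Samples that don't fill a whole chunk are carried over rather than dropped, so no audio is lost between callbacks.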
Swift
Add the `LiveKitWakeWord` package to your `Package.swift`:
```swift
.package(url: "https://github.com/livekit/livekit-wakeword", branch: "main"),
```
Load a model and score `Int16` PCM chunks at the configured sample rate:
```swift
import Foundation
import LiveKitWakeWord

let classifier = Bundle.main.url(forResource: "hey_livekit", withExtension: "onnx")!
let model = try WakeWordModel(models: [classifier], sampleRate: 16_000)

// `audioChunk` holds Int16 PCM samples from your own capture code.
let scores = try model.predict(audioChunk)
if (scores["hey_livekit"] ?? 0) > 0.5 {
    print("Wakeword detected!")
}
```
(Alternative) For hands-free use, wrap the model with `WakeWordListener` and consume detections as an async sequence:
```swift
import Foundation
import LiveKitWakeWord

let classifier = Bundle.main.url(forResource: "hey_livekit", withExtension: "onnx")!
let model = try WakeWordModel(models: [classifier], sampleRate: 16_000)
let listener = WakeWordListener(model: model, threshold: 0.5, debounce: 2.0)

try listener.start()
for await detection in listener.detections() {
    print("Detected \(detection.name) (\(String(format: "%.2f", detection.confidence)))")
}
```
`threshold` is the minimum score (0 to 1) to count as a detection. `debounce` is the minimum interval, in seconds, between consecutive detections. Before using the listener, add `NSMicrophoneUsageDescription` to your `Info.plist`. For sandboxed macOS apps, also enable the `com.apple.security.device.audio-input` entitlement.
Audio at any sample rate is resampled to 16 kHz internally via `AVAudioConverter`. By default, ONNX Runtime uses the CoreML Execution Provider, which dispatches work to the Apple Neural Engine (ANE), GPU, or CPU.
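Both the Rust crate and the Swift package resample input to 16 kHz before scoring. The Swift package uses `AVAudioConverter`, and this page doesn't say how the Rust crate resamples. The Python sketch below only illustrates the idea with plain linear interpolation and is not how either SDK implements it. It omits the anti-aliasing low-pass filter that a production resampler applies when downsampling, so treat it as an explanation rather than a drop-in replacement.

```python
from typing import List, Sequence

TARGET_RATE = 16_000


def resample_linear(
    samples: Sequence[float], source_rate: int, target_rate: int = TARGET_RATE
) -> List[float]:
    """Resample mono audio by linear interpolation (no anti-aliasing filter)."""
    if source_rate == target_rate or not samples:
        return list(samples)
    out_len = int(len(samples) * target_rate / source_rate)
    step = source_rate / target_rate  # input samples advanced per output sample
    out: List[float] = []
    for i in range(out_len):
        position = i * step
        left = int(position)
        right = min(left + 1, len(samples) - 1)
        fraction = position - left
        # Blend the two neighboring input samples by distance.
        out.append(samples[left] * (1.0 - fraction) + samples[right] * fraction)
    return out


one_second_at_44k = [0.0] * 44_100
print(len(resample_linear(one_second_at_44k, 44_100)))
```

The output length scales by `target_rate / source_rate`, so one second of audio at any input rate becomes 16,000 samples.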
A SwiftUI demo lives in `examples/ios_wakeword/`.
Additional resources
The following resources provide more information about LiveKit wakeword detection.
livekit-wakeword
Source for the training toolkit, SDKs, and example apps.
hello-wakeword
End-to-end example of a wakeword-triggered voice agent.
Python package
The livekit-wakeword package on PyPI.
Rust crate
The livekit-wakeword crate for native clients.
Introducing livekit-wakeword
Blog post covering model architecture, training pipeline, and benchmarks.