Silero VAD plugin | LiveKit Documentation

Overview

The Silero VAD plugin provides voice activity detection (VAD) that contributes to accurate turn detection in voice AI applications.

VAD is a crucial component of voice AI applications because it determines when a user is speaking and when they are silent. This enables natural turn-taking in conversations and reduces resource usage by running speech-to-text only while the user speaks.

LiveKit recommends using the Silero VAD plugin in combination with the custom turn detector model for the best performance.

Quick reference

The following sections provide a quick overview of the Silero VAD plugin. For more information, see Additional resources.

Requirements

The model runs locally on the CPU and requires minimal system resources.

Installation

Install the Silero VAD plugin from PyPI (Python):

uv add "livekit-agents[silero]~=1.5"

Or install it from npm (Node.js):

pnpm install @livekit/agents-plugin-silero

Usage

Initialize your AgentSession with the Silero VAD plugin:

from livekit.agents import AgentSession
from livekit.plugins import silero

session = AgentSession(
    vad=silero.VAD.load(),
    # ... stt, tts, llm, etc.
)

import { voice } from '@livekit/agents';
import * as silero from '@livekit/agents-plugin-silero';

const session = new voice.AgentSession({
  vad: await silero.VAD.load(),
  // ... stt, tts, llm, etc.
});

Prewarm

You can prewarm the plugin to improve load times for new jobs:

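from livekit import agents
from livekit.agents import AgentServer, AgentSession
from livekit.plugins import silero


server = AgentServer()


def prewarm(proc: agents.JobProcess):
    # Load the VAD model once per process so new jobs can reuse it.
    proc.userdata["vad"] = silero.VAD.load()


server.setup_fnc = prewarm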

@server.rtc_session(agent_name="my-agent")
async def my_agent(ctx: agents.JobContext):
    session = AgentSession(
        vad=ctx.proc.userdata["vad"],
        # ... stt, tts, llm, etc.
    )

    # ... session.start etc ...


if __name__ == "__main__":
    agents.cli.run_app(server)

import { voice, defineAgent, cli, ServerOptions, type JobContext, type JobProcess } from '@livekit/agents';
import * as silero from '@livekit/agents-plugin-silero';
import { fileURLToPath } from 'node:url';

export default defineAgent({
  prewarm: async (proc: JobProcess) => {
    proc.userData.vad = await silero.VAD.load();
  },
  entry: async (ctx: JobContext) => {
    const vad = ctx.proc.userData.vad! as silero.VAD;

    const session = new voice.AgentSession({
      vad,
      // ... stt, tts, llm, etc.
    });

    // ... session.start etc ...
  },
});

cli.runApp(new ServerOptions({ agent: fileURLToPath(import.meta.url) }));

Configuration

The following parameters are available on the load method:

min_speech_duration (float, default: 0.05)

Minimum duration of speech, in seconds, required to start a new speech chunk.

min_silence_duration (float, default: 0.55)

Duration of silence, in seconds, to wait after speech ends before deciding that the user has finished speaking.

prefix_padding_duration (float, default: 0.5)

Duration of padding, in seconds, to add to the beginning of each speech chunk.

max_buffered_speech (float, default: 60.0)

Maximum duration of speech to keep in the buffer (in seconds).

activation_threshold (float, default: 0.5)

Threshold to consider a frame as speech. A higher threshold results in more conservative detection but might miss soft speech. A lower threshold results in more sensitive detection, but might identify noise as speech.

sample_rate (Literal[8000, 16000], default: 16000)

Sample rate, in Hz, used for inference (only 8 kHz and 16 kHz are supported).

force_cpu (bool, default: True)

Force the use of CPU for inference.
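To build intuition for how activation_threshold, min_speech_duration, and min_silence_duration interact, the following is a minimal, self-contained sketch of a segmenter that runs over per-frame speech probabilities. It is an illustration only and is not the plugin's implementation: the segment_speech function, the frame_duration parameter, and its 0.032 second default are hypothetical, and prefix padding and buffering are left out.

```python
import math


def segment_speech(
    probs: list[float],
    frame_duration: float = 0.032,  # hypothetical frame length in seconds
    activation_threshold: float = 0.5,
    min_speech_duration: float = 0.05,
    min_silence_duration: float = 0.55,
) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds for each detected speech segment."""
    # Convert the duration settings into whole frame counts.
    min_speech_frames = max(1, math.ceil(min_speech_duration / frame_duration))
    min_silence_frames = max(1, math.ceil(min_silence_duration / frame_duration))

    segments: list[tuple[float, float]] = []
    speaking = False
    speech_run = 0   # consecutive frames at or above the threshold
    silence_run = 0  # consecutive frames below the threshold
    start = 0        # index of the first frame of the current segment

    for i, p in enumerate(probs):
        if p >= activation_threshold:
            speech_run += 1
            silence_run = 0
            # Start a segment only once speech has lasted long enough.
            if not speaking and speech_run >= min_speech_frames:
                speaking = True
                start = i - speech_run + 1
        else:
            silence_run += 1
            speech_run = 0
            # End the segment only once silence has lasted long enough.
            if speaking and silence_run >= min_silence_frames:
                speaking = False
                end = i - silence_run + 1  # index of the first silent frame
                segments.append((start * frame_duration, end * frame_duration))

    # Close a segment that is still open when the input ends.
    if speaking:
        end = len(probs) - silence_run
        segments.append((start * frame_duration, end * frame_duration))

    return segments


if __name__ == "__main__":
    probs = [0.9, 0.1, 0.9, 0.9, 0.9, 0.1, 0.1, 0.9, 0.1, 0.1, 0.1, 0.1]
    print(segment_speech(probs, frame_duration=0.1,
                         min_speech_duration=0.15, min_silence_duration=0.25))
```

In the example input, the single high-probability frame at the start is too short to count as speech, and the two-frame pause in the middle is too short to end the segment, so the sketch reports one segment spanning frames 2 through 7. Raising min_silence_duration makes end-of-speech detection more patient, while raising activation_threshold makes each frame harder to classify as speech.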

Additional resources

The following resources provide more information about using the LiveKit Silero VAD plugin.

Python package

The livekit-plugins-silero package on PyPI.

Plugin reference

Reference for the LiveKit Silero VAD plugin.

GitHub repo

View the source or contribute to the LiveKit Silero VAD plugin.

Silero VAD project

The open source VAD model that powers the LiveKit Silero VAD plugin.

Transcriber

An example using standalone VAD and STT outside of an AgentSession.