Turn-taking tuning | LiveKit Documentation

Overview

Turn-taking in voice AI involves several stages of the agent pipeline:

User activity detection decides when the user has finished a turn so the agent can reply. Options include turn detection mode, endpointing delays, and endpointing mode.
Interruption handling decides when the user can cut the agent off mid-response. Options include enable/disable, detection mode, interruption thresholds, and false-interruption recovery.
Preemptive generation lets the LLM (and optionally TTS) start work before the user's turn is fully confirmed. Options include enable/disable, preemptive TTS, max speech duration, and max retries.
Audio pre-processing (noise cancellation, automatic gain control) cleans the input before any of these stages run. Options include voice isolation and background noise suppression.
Agent speech scheduling controls the cadence of the agent's own utterances. Options include the minimum gap between agent utterances (Python only).

The defaults are reasonable for most apps, but tuning matters when you're chasing low latency, working in noisy environments, or seeing specific symptoms like the agent cutting users off. This page gives a recommended starting config, a full reference of the options that affect each stage, and a troubleshooting table mapping common symptoms to the options that fix them.

For a deeper reference on each parameter, see TurnHandlingOptions.

Configuration

The next two sections cover a recommended starting config and a full options reference.

Recommended starting config

The following config is a starting point for a voice agent that needs to respond quickly in environments with background noise or other speakers. The Python version is shown first, followed by the Node.js equivalent. See All options for what each parameter does.

from livekit.agents import AgentSession, TurnHandlingOptions, inference, room_io
from livekit.plugins import ai_coustics

session = AgentSession(
    turn_handling=TurnHandlingOptions(
        turn_detection=inference.TurnDetector(),
        endpointing={
            "mode": "fixed",
            "min_delay": 0.5,
            "max_delay": 3.0,
        },
        interruption={
            "mode": "adaptive",
            "min_duration": 0.5,
            "min_words": 0,
        },
        # preemptive_generation is enabled by default. Set preemptive_tts to True
        # to also start TTS early: lower latency, but wasted compute on cancellations.
        preemptive_generation={
            "preemptive_tts": False,
        },
    ),
    # ... stt, tts, llm, etc.
)

await session.start(
    # ...,
    room_options=room_io.RoomOptions(
        audio_input=room_io.AudioInputOptions(
            noise_cancellation=ai_coustics.audio_enhancement(
                model=ai_coustics.EnhancerModel.QUAIL_VF_L,
            ),
        ),
    ),
)

import { inference, voice } from '@livekit/agents';
import * as aiCoustics from '@livekit/plugins-ai-coustics';

const session = new voice.AgentSession({
  turnHandling: {
    turnDetection: new inference.TurnDetector(),
    endpointing: {
      minDelay: 500,
      maxDelay: 3000,
    },
    interruption: {
      mode: 'adaptive',
      minDuration: 500,
      minWords: 0,
    },
    // preemptiveGeneration is enabled by default. Set preemptiveTts to true
    // to also start TTS early: lower latency, but wasted compute on cancellations.
    preemptiveGeneration: {
      preemptiveTts: false,
    },
  },
  // ... stt, tts, llm, etc.
});

await session.start({
  // ...,
  inputOptions: {
    noiseCancellation: aiCoustics.audioEnhancement({ model: 'quailVfL' }),
  },
});

For quieter environments, drop the noise cancellation argument from session.start(). The rest of the config still applies.

For SIP participants, swap voice isolation for the telephony-tuned Krisp model: noise_cancellation.BVCTelephony() (Python) or TelephonyBackgroundVoiceCancellation() (Node.js). For multi-speaker rooms, use background noise suppression instead of voice isolation.
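
The Python and Node.js configs above express the same settings with different conventions: Python uses snake_case keys with durations in seconds, while Node.js uses camelCase keys with durations in milliseconds. As a minimal sketch of that mapping, the helper below converts a Python-style options dict into the Node.js shape. The names `to_node_config`, `to_camel`, and `DURATION_KEYS` are hypothetical and not part of either SDK; `endpointing.mode` is left out of the sample because it is only listed for Python.

```python
# Hypothetical helper, not part of the LiveKit SDKs: maps Python-style
# turn-handling options (snake_case, seconds) to the Node.js shape
# (camelCase, milliseconds) used in the examples above.

# Option keys whose values are durations, taken from the options on this page.
DURATION_KEYS = {
    "min_delay",
    "max_delay",
    "min_duration",
    "false_interruption_timeout",
    "max_speech_duration",
}


def to_camel(name: str) -> str:
    """Convert snake_case to camelCase, e.g. min_delay -> minDelay."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def to_node_config(options: dict) -> dict:
    """Recursively rename keys and convert durations from seconds to ms."""
    converted = {}
    for key, value in options.items():
        if isinstance(value, dict):
            converted[to_camel(key)] = to_node_config(value)
        elif key in DURATION_KEYS:
            converted[to_camel(key)] = round(value * 1000)
        else:
            converted[to_camel(key)] = value
    return converted


python_options = {
    "endpointing": {"min_delay": 0.5, "max_delay": 3.0},
    "interruption": {"mode": "adaptive", "min_duration": 0.5, "min_words": 0},
    "preemptive_generation": {"preemptive_tts": False},
}

node_options = to_node_config(python_options)
```

A helper like this is mostly useful as a checklist when porting a tuned config between SDKs: forgetting the unit change (for example, passing `0.5` where Node.js expects `500`) is an easy mistake to make.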

All options

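The following tables list the options that affect turn-taking, grouped by pipeline stage. The first table uses the Python option names and units (seconds); the second uses the Node.js names and units (milliseconds).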

Option	Stage	What it controls	Default
`turn_detection` mode	User activity detection	How the session decides the user is done speaking. Options: turn detector model, VAD, STT endpointing, realtime LLM, manual.	Auto-selected
`endpointing.min_delay`	User activity detection	Minimum time after detected silence before the turn closes. In VAD mode this is `max(VAD silence, min_delay)`. In STT mode it adds to the provider's endpoint signal.	`0.5 seconds`
`endpointing.max_delay`	User activity detection	Maximum time the agent waits before forcing the turn closed.	`3.0 seconds`
`endpointing.mode`	User activity detection	`"fixed"` always uses the configured delays. `"dynamic"` adapts within the range based on session pause statistics.	`"fixed"`
`interruption.enabled`	Interruption handling	Master on/off toggle for interruptions. Set to `False` to make the agent uninterruptible.	`True`
`interruption.mode`	Interruption handling	`"adaptive"` (recommended) uses an audio model to distinguish real interruptions from backchannel acknowledgments. `"vad"` triggers on any detected speech.	`"adaptive"` if available, otherwise `"vad"`
`interruption.min_duration`	Interruption handling	Minimum speech duration to register as an interruption.	`0.5 seconds`
`interruption.min_words`	Interruption handling	Minimum word count to register as an interruption. Requires STT.	`0`
`interruption.false_interruption_timeout`	Interruption handling	Silence window after a detected interruption before it's classified as false. After this elapses with no transcript, the agent can resume (see `resume_false_interruption`).	`2.0 seconds`
`interruption.resume_false_interruption`	Interruption handling	Whether to resume the interrupted speech after the false-interruption timeout passes.	`True`
`preemptive_generation.enabled`	Preemptive generation	Whether to start LLM generation as soon as a final transcript arrives, before the turn is confirmed.	`True`
`preemptive_generation.preemptive_tts`	Preemptive generation	Also start TTS preemptively. Cuts more latency at the cost of wasted compute on cancellations.	`False`
`preemptive_generation.max_speech_duration`	Preemptive generation	Skip preemptive generation for utterances longer than this. Transcripts of long turns are more likely to change before the turn is confirmed.	`10.0 seconds`
`preemptive_generation.max_retries`	Preemptive generation	Cap on preemptive attempts per turn. Resets when the turn completes.	`3`
Voice isolation	Audio pre-processing	Suppresses competing voices in the input so STT, VAD, and the turn detector see clean audio. Models include ai-coustics QUAIL_VF_L, Krisp BVC, and Krisp BVCTelephony.	Off
Background noise suppression	Audio pre-processing	Suppresses non-speech noise. Use when the main challenge is environmental noise rather than competing speakers.	Off
`min_consecutive_speech_delay`	Agent speech scheduling	Minimum gap between consecutive agent utterances. Does not affect user-side turn detection.	`0.0 seconds`
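
As a rough illustration of the `endpointing.min_delay` row above, the sketch below models how the delay combines with the detector's own signal in VAD mode and in STT mode. It is a simplified reading of the table, not the SDK's implementation: the function name `effective_endpointing_delay` and the `provider_delay` parameter are hypothetical, and `max_delay` and the `"dynamic"` endpointing mode are not modeled.

```python
def effective_endpointing_delay(
    detection: str, min_delay: float, provider_delay: float
) -> float:
    """Seconds of silence before the turn closes, in a simplified model.

    detection:      "vad" or "stt" (the turn detection mode in use).
    min_delay:      the configured endpointing.min_delay, in seconds.
    provider_delay: hypothetical input; the VAD silence duration in VAD
                    mode, or the time until the STT provider's endpoint
                    signal in STT mode.
    """
    if detection == "vad":
        # VAD mode: the longer of the VAD silence and min_delay wins.
        return max(provider_delay, min_delay)
    if detection == "stt":
        # STT mode: min_delay is added on top of the provider's signal.
        return provider_delay + min_delay
    raise ValueError(f"unsupported detection mode: {detection!r}")


# With min_delay = 0.5 s and a 0.25 s signal from the detector:
vad_delay = effective_endpointing_delay("vad", 0.5, 0.25)  # 0.5
stt_delay = effective_endpointing_delay("stt", 0.5, 0.25)  # 0.75
```

Under this model, lowering `min_delay` below the VAD silence duration has no effect in VAD mode, while in STT mode every reduction shortens the wait directly.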

Option	Stage	What it controls	Default
`turnDetection` mode	User activity detection	How the session decides the user is done speaking. Options: turn detector model, VAD, STT endpointing, realtime LLM, manual.	Auto-selected
`endpointing.minDelay`	User activity detection	Minimum time after detected silence before the turn closes. In VAD mode this is `max(VAD silence, minDelay)`. In STT mode it adds to the provider's endpoint signal.	`500 ms`
`endpointing.maxDelay`	User activity detection	Maximum time the agent waits before forcing the turn closed.	`3000 ms`
`interruption.enabled`	Interruption handling	Master on/off toggle for interruptions. Set to `false` to make the agent uninterruptible.	`true`
`interruption.mode`	Interruption handling	`"adaptive"` (recommended) uses an audio model to distinguish real interruptions from backchannel acknowledgments. `"vad"` triggers on any detected speech.	`"adaptive"` if available, otherwise `"vad"`
`interruption.minDuration`	Interruption handling	Minimum speech duration to register as an interruption.	`500 ms`
`interruption.minWords`	Interruption handling	Minimum word count to register as an interruption. Requires STT.	`0`
`interruption.falseInterruptionTimeout`	Interruption handling	Silence window after a detected interruption before it's classified as false. After this elapses with no transcript, the agent can resume (see `resumeFalseInterruption`).	`2000 ms`
`interruption.resumeFalseInterruption`	Interruption handling	Whether to resume the interrupted speech after the false-interruption timeout passes.	`true`
`preemptiveGeneration.enabled`	Preemptive generation	Whether to start LLM generation as soon as a final transcript arrives, before the turn is confirmed.	`true`
`preemptiveGeneration.preemptiveTts`	Preemptive generation	Also start TTS preemptively. Cuts more latency at the cost of wasted compute on cancellations.	`false`
`preemptiveGeneration.maxSpeechDuration`	Preemptive generation	Skip preemptive generation for utterances longer than this. Transcripts of long turns are more likely to change before the turn is confirmed.	`10000 ms`
`preemptiveGeneration.maxRetries`	Preemptive generation	Cap on preemptive attempts per turn. Resets when the turn completes.	`3`
Voice isolation	Audio pre-processing	Suppresses competing voices in the input so STT, VAD, and the turn detector see clean audio. Models include ai-coustics QUAIL_VF_L, Krisp BVC, and Krisp BVCTelephony.	Off
Background noise suppression	Audio pre-processing	Suppresses non-speech noise. Use when the main challenge is environmental noise rather than competing speakers.	Off
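
The interruption rows in the tables above describe several thresholds that work together. The sketch below shows one plausible way they combine, using the Python option names, units, and defaults. It assumes the duration and word-count thresholds must both be met, which the tables do not state explicitly, and it does not represent the audio model behind `"adaptive"` mode. The class `InterruptionOptions` and both functions are hypothetical, not SDK APIs.

```python
from dataclasses import dataclass


@dataclass
class InterruptionOptions:
    """Hypothetical container mirroring the Python defaults in the table."""

    enabled: bool = True
    min_duration: float = 0.5  # seconds of speech
    min_words: int = 0  # only meaningful when STT is available
    false_interruption_timeout: float = 2.0  # seconds of silence
    resume_false_interruption: bool = True


def is_interruption(
    opts: InterruptionOptions, speech_duration: float, transcript: str
) -> bool:
    """Assumed rule: every configured threshold must be met."""
    if not opts.enabled:
        return False
    if speech_duration < opts.min_duration:
        return False
    if opts.min_words > 0 and len(transcript.split()) < opts.min_words:
        return False
    return True


def should_resume(
    opts: InterruptionOptions, silence_after: float, transcript: str
) -> bool:
    """Resume the agent's speech if the interruption turned out to be false."""
    return (
        opts.resume_false_interruption
        and silence_after >= opts.false_interruption_timeout
        and not transcript.strip()
    )


# Requiring two words filters out a one-word acknowledgment.
strict = InterruptionOptions(min_words=2)
backchannel = is_interruption(strict, 0.8, "okay")  # False
real = is_interruption(strict, 0.8, "wait stop")  # True
```

In this model, raising `min_words` trades responsiveness for robustness: one-word acknowledgments no longer interrupt the agent, but neither does a one-word "stop".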

Troubleshooting

The following table maps common turn-taking complaints to the options that affect them.

Symptom	Likely options
Agent cuts users off mid-thought.	Switch `turn_detection` to the turn detector model. Raise `endpointing.min_delay`. Switch `interruption.mode` to `"adaptive"` if it isn't already. Add voice isolation if cross-talk or noise is causing false speech detection.
Agent is interrupted by short acknowledgments ("uh-huh," "okay").	Switch `interruption.mode` to `"adaptive"`. Raise `interruption.min_words` (requires STT) or `interruption.min_duration`. Confirm `interruption.false_interruption_timeout` and `interruption.resume_false_interruption` are at their defaults so the agent resumes after silent false positives.
Agent feels too slow to respond.	Confirm `preemptive_generation` is enabled (it is by default). Consider enabling `preemptive_generation.preemptive_tts` to start TTS early. Lower `endpointing.min_delay`. In Python, switch `endpointing.mode` to `"dynamic"` to adapt to actual pause patterns.
Agent reads a partial transcript and replies based on incomplete input.	The preemptive response should be canceled when the final transcript changes. Check that you aren't returning early from `on_user_turn_completed`. Lower `preemptive_generation.max_speech_duration` so long utterances skip preemptive generation entirely. Lower `preemptive_generation.max_retries` to avoid repeated retries on jittery transcripts.
Audio quality is fine but turn detection still misfires in noisy rooms.	Add voice isolation for single-speaker scenarios or background noise suppression for multi-speaker. Both run before VAD and STT, so they improve every downstream turn-taking signal.
Agent runs back-to-back utterances together with no breath (for example, a `say()` followed by a tool-driven `generate_reply()`).	Set `min_consecutive_speech_delay` to a small value like `0.2`–`0.4` seconds (Python only).
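
The partial-transcript row above relies on two limits, `preemptive_generation.max_speech_duration` and `preemptive_generation.max_retries`. The sketch below is a simplified model of how those limits could gate a preemptive attempt, using the Python defaults from the options table. The class `PreemptiveOptions` and the function `may_start_preemptive` are hypothetical, and whether the first attempt counts toward the retry cap is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PreemptiveOptions:
    """Hypothetical container mirroring the Python defaults in the table."""

    enabled: bool = True
    max_speech_duration: float = 10.0  # seconds
    max_retries: int = 3  # attempts per turn


def may_start_preemptive(
    opts: PreemptiveOptions, speech_duration: float, attempts_this_turn: int
) -> bool:
    """Decide whether another preemptive attempt is allowed in this turn."""
    if not opts.enabled:
        return False
    # Long utterances are skipped entirely.
    if speech_duration > opts.max_speech_duration:
        return False
    # Assumption: every attempt in the turn counts toward the cap.
    return attempts_this_turn < opts.max_retries


defaults = PreemptiveOptions()
tightened = PreemptiveOptions(max_speech_duration=5.0, max_retries=1)

# A 7-second utterance is eligible under the defaults but skipped once
# max_speech_duration is lowered to 5 seconds.
eligible_by_default = may_start_preemptive(defaults, 7.0, 0)  # True
eligible_tightened = may_start_preemptive(tightened, 7.0, 0)  # False
```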

If you're tuning by feel, use agent observability to confirm changes actually move the metrics you care about. Preemptive generation in particular doesn't always reduce latency, and the metrics tell you whether your changes are pulling their weight.
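
One way to check whether a change pays off is to compare response-latency samples collected before and after it. The sketch below assumes you have already exported per-turn latencies (in seconds) from your observability tooling; the sample values, `summarize`, and `compare` are all hypothetical and only illustrate the comparison.

```python
from statistics import median, quantiles


def summarize(latencies: list[float]) -> dict[str, float]:
    """Median and 95th-percentile latency for a list of samples."""
    # quantiles(n=20) returns 19 cut points; the last is the 95th percentile.
    return {"p50": median(latencies), "p95": quantiles(latencies, n=20)[-1]}


def compare(before: list[float], after: list[float]) -> dict[str, float]:
    """Per-metric change (after minus before); negative means faster."""
    b, a = summarize(before), summarize(after)
    return {name: a[name] - b[name] for name in b}


# Hypothetical per-turn latencies in seconds, not real measurements.
before_change = [0.9, 1.1, 1.0, 1.4, 1.2]
after_change = [0.7, 0.8, 0.9, 1.3, 0.8]

delta = compare(before_change, after_change)
for name, change in delta.items():
    print(f"{name}: {change:+.2f} s")
```

Looking at both the median and the tail matters here: a setting can improve the typical turn while leaving the slowest turns unchanged.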

Additional resources

Preemptive generation

Start LLM generation before the user's end of turn is confirmed.

Noise & echo cancellation

Background voice cancellation and noise suppression for cleaner input audio.

Turn handling options

Full reference for every turn-handling parameter.