Turn handling options

Reference documentation for turn handling options in LiveKit Agents.

TurnHandlingOptions

The turn_handling parameter accepts a TurnHandlingOptions object (or plain object) that controls turn detection, endpointing, and interruption behavior in an agent session. Pass it to the AgentSession constructor.

Usage

The following example creates an AgentSession with turn detection set to VAD and custom endpointing and interruption handling settings:

Python:

from livekit.agents import AgentSession, TurnHandlingOptions

session = AgentSession(
    turn_handling=TurnHandlingOptions(
        turn_detection="vad",
        endpointing={
            "mode": "fixed",
            "min_delay": 0.5,  # seconds
            "max_delay": 3.0,  # seconds
        },
        interruption={
            "mode": "adaptive",
            "min_duration": 0.5,  # seconds
            "resume_false_interruption": True,
        },
    ),
    # ... other parameters
)

Node.js:

import { voice } from '@livekit/agents';

const session = new voice.AgentSession({
  turnHandling: {
    turnDetection: 'vad',
    endpointing: {
      minDelay: 500, // milliseconds
      maxDelay: 3000, // milliseconds
    },
    interruption: {
      mode: 'adaptive',
      minDuration: 500, // milliseconds
      resumeFalseInterruption: true,
    },
  },
  // ... other parameters
});

Parameters

The following parameters are available in the TurnHandlingOptions object (the turn_handling argument):

turn_detection (TurnDetectionMode | None, optional)

Strategy for deciding when the user has finished speaking.

Options:

  • "stt" - Rely on speech-to-text end-of-utterance cues.
  • "vad" - Rely on Voice Activity Detection (VAD) start and stop cues.
  • "realtime_llm" - Use server-side detection from a realtime LLM.
  • "manual" - Control turn boundaries explicitly.
  • TurnDetector instance - Plug-in custom detector (for example, MultilingualModel()).

If this parameter is omitted, the session chooses the best available mode in priority order (realtime_llm → vad → stt → manual) and automatically falls back to the next available mode if the necessary model is missing. See Turns overview for mode descriptions and fallback behavior.
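The priority order above can be sketched as a simple first-match lookup. This is an illustration only, not the library's implementation: the function name pick_default_mode and the idea of passing a set of available capabilities are assumptions made for the example.

```python
# Priority order taken from the text above. "manual" needs no model,
# so this sketch treats it as always available (an assumption).
PRIORITY = ("realtime_llm", "vad", "stt", "manual")


def pick_default_mode(available: set) -> str:
    """Return the first mode in priority order that the session could use.

    `available` is a hypothetical set of mode names whose required model
    or plugin is configured, for example {"vad", "stt"}.
    """
    for mode in PRIORITY:
        if mode == "manual" or mode in available:
            return mode
    return "manual"
```

With this sketch, a session that has only VAD and STT configured resolves to "vad", and a session with no models at all resolves to "manual".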

endpointing (EndpointingOptions, optional)

Options for endpointing behavior. See EndpointingOptions for details.

interruption (InterruptionOptions, optional)

Options for interruption handling. See InterruptionOptions for details.

EndpointingOptions

Options for endpointing behavior, which determines timing thresholds for turn completion. With fixed endpointing (the default), the agent always uses the configured min_delay and max_delay.

For context and configuration in a session, see Endpointing configuration in the turns overview.

Dynamic endpointing

Available in Python only.

When you use dynamic endpointing, the agent adapts the delay within the min_delay and max_delay range based on session pause statistics. This can result in a more responsive turn-taking experience over time.

Usage

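The following examples create turn handling options with custom endpointing settings. Pass them to the turn_handling parameter of AgentSession. The Python example enables dynamic endpointing; the Node.js example uses fixed endpointing, because dynamic endpointing is available in Python only: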

Python:

turn_handling = {
    "endpointing": {
        "mode": "dynamic",
        "min_delay": 0.5,
        "max_delay": 3.0,
    },
}

Node.js:

const turnHandling = {
  endpointing: {
    mode: 'fixed',
    minDelay: 500,
    maxDelay: 3000,
  },
};

Parameters

The following parameters are available in the endpointing options object EndpointingOptions:

mode (Literal['dynamic', 'fixed'], optional, default: fixed)

Endpointing timing behavior. The endpointing delay is the time the agent waits before terminating the user's turn.

  • "fixed" - Use the configured min_delay and max_delay values to determine the endpointing delay.

  • "dynamic" (Python only) - Adapt the delay within the min_delay and max_delay range based on session pause statistics (exponential moving average of between-utterance and between-turn pauses). Suits most conversations.
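To make the "dynamic" mode more concrete, here is a minimal sketch of an exponential moving average clamped to the configured range. It is a simplification under stated assumptions, not LiveKit's algorithm: the class name DynamicDelay, the smoothing factor alpha, the use of a single average for all pauses, and the choice to start at min_delay are all hypothetical.

```python
class DynamicDelay:
    """Illustrative only: track an EMA of observed pauses and clamp it."""

    def __init__(self, min_delay: float = 0.5, max_delay: float = 3.0,
                 alpha: float = 0.3) -> None:
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.alpha = alpha  # assumed smoothing factor, not a LiveKit setting
        self.ema = None     # no pause observed yet

    def observe_pause(self, pause: float) -> None:
        """Fold a newly observed pause (in seconds) into the moving average."""
        if self.ema is None:
            self.ema = pause
        else:
            self.ema = self.alpha * pause + (1 - self.alpha) * self.ema

    def current_delay(self) -> float:
        """Return the delay to use, always within [min_delay, max_delay]."""
        if self.ema is None:
            return self.min_delay  # assumption: start at the lower bound
        return min(self.max_delay, max(self.min_delay, self.ema))
```

The clamp is the important part: however the statistics move, the effective delay never drops below min_delay or rises above max_delay.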

min_delay (float, optional, default: 0.5 seconds)

Minimum time (in seconds) to wait after the last detected speech before declaring the user's turn complete.

With dynamic endpointing (Python only), this is the lower bound. The agent might use a longer effective delay when session pause statistics suggest slower turn-taking.

  • In VAD mode, this effectively behaves like max(VAD silence, min_delay).
  • In STT mode, this is applied after the STT end-of-speech signal, and therefore in addition to the STT provider's endpointing delay.
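The two bullets above can be expressed as a small calculation. This is a sketch of the described behavior only; the function name effective_wait and its parameters vad_silence and stt_endpointing_delay are hypothetical names for the VAD silence threshold and the STT provider's own endpointing delay.

```python
def effective_wait(mode: str, min_delay: float,
                   vad_silence: float = 0.0,
                   stt_endpointing_delay: float = 0.0) -> float:
    """Approximate time (seconds) from end of speech to end of turn."""
    if mode == "vad":
        # min_delay and the VAD silence window overlap, so the larger wins.
        return max(vad_silence, min_delay)
    if mode == "stt":
        # min_delay starts after the STT end-of-speech signal, so it adds up.
        return stt_endpointing_delay + min_delay
    raise ValueError(f"unsupported mode for this sketch: {mode!r}")
```

In practice this means the same min_delay value tends to feel slower in STT mode than in VAD mode, because the provider's endpointing delay is added on top of it.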

max_delay (float, optional, default: 3.0 seconds)

Maximum time (in seconds) the agent waits before terminating the turn. This prevents the agent from waiting indefinitely for the user to continue speaking.

With dynamic endpointing (Python only), this is the upper bound. The agent might use a shorter effective delay when session pause statistics suggest faster turn-taking.

Time units

In Node.js, the corresponding minDelay and maxDelay options are in milliseconds (for example, 500 and 3000). In Python, min_delay and max_delay are in seconds (for example, 0.5 and 3.0).
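If you maintain both a Python and a Node.js agent, a small helper can keep the two sets of values in sync. The function below is a hypothetical convenience, not part of LiveKit; it only converts the two delay fields shown in this section and passes mode through unchanged.

```python
def to_node_endpointing(py_options: dict) -> dict:
    """Convert Python-style endpointing options (seconds) to the
    Node.js-style shape (camelCase keys, milliseconds)."""
    node_options = {}
    if "mode" in py_options:
        node_options["mode"] = py_options["mode"]
    if "min_delay" in py_options:
        node_options["minDelay"] = round(py_options["min_delay"] * 1000)
    if "max_delay" in py_options:
        node_options["maxDelay"] = round(py_options["max_delay"] * 1000)
    return node_options
```

Note that this sketch does not check whether a mode is supported on the target platform, so a "dynamic" value would still need to be changed by hand for Node.js.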

InterruptionOptions

Options for interruption handling, including adaptive detection and false interruption recovery.

Usage

The following example creates turn handling options with adaptive interruption handling. Pass it to the turn_handling parameter of AgentSession:

Python:

turn_handling = {
    "interruption": {
        "mode": "adaptive",
        "min_duration": 0.5,
        "min_words": 0,
        "discard_audio_if_uninterruptible": True,
        "false_interruption_timeout": 2.0,
        "resume_false_interruption": True,
    },
}

Node.js:

const turnHandling = {
  interruption: {
    mode: 'adaptive',
    minDuration: 500,
    minWords: 0,
    discardAudioIfUninterruptible: true,
    falseInterruptionTimeout: 2000,
    resumeFalseInterruption: true,
  },
};

To disable interruptions entirely, set enabled to false in the interruption options:

Python:

turn_handling = {
    "interruption": {"enabled": False},
}

Node.js:

const turnHandling = {
  interruption: {
    enabled: false,
  },
};

Parameters

The following parameters are available in the interruption options object InterruptionOptions:

enabled (bool, optional, default: True)

When True, the agent can be interrupted by user speech. When False, interruptions are disabled entirely. Use {"enabled": False} to disable; the previous bool shorthand is no longer supported.

mode (Literal['adaptive', 'vad'], optional)

Interruption detection strategy. Only applies when enabled is True.

Options:

  • "adaptive" - Use adaptive interruption detection.
  • "vad" - Use VAD-based interruption detection.

If this parameter is omitted, the session uses "adaptive" when a turn detector model is configured with an STT that supports aligned transcripts. Otherwise, it falls back to "vad".

discard_audio_if_uninterruptible (bool, optional, default: True)

When True, drop buffered audio while the agent is speaking and cannot be interrupted. This prevents audio buildup during uninterruptible speech. See Interruptions for context.

min_duration (float, optional, default: 0.5 seconds)

Minimum duration of speech to be considered as an interruption. Helps filter out brief sounds or noise that shouldn't trigger interruptions. Python uses seconds (for example, 0.5); Node.js uses milliseconds (for example, 500).

min_words (int, optional, default: 0)

Minimum number of words to be considered as an interruption. Only used if STT is enabled. Set to a value greater than 0 to require actual speech content before triggering interruptions.
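The min_duration and min_words thresholds can be pictured as a pair of gates. The sketch below is an assumption about how they combine (both must pass), not the library's actual logic; the function name is_interruption and the stt_enabled flag are hypothetical.

```python
def is_interruption(speech_duration: float, transcript: str,
                    min_duration: float = 0.5, min_words: int = 0,
                    stt_enabled: bool = True) -> bool:
    """Decide whether user speech should count as an interruption."""
    # Gate 1: ignore brief sounds or noise shorter than min_duration.
    if speech_duration < min_duration:
        return False
    # Gate 2: the word count only applies when STT is enabled and
    # min_words is greater than 0.
    if stt_enabled and min_words > 0:
        return len(transcript.split()) >= min_words
    return True
```

With the defaults, any speech of at least half a second interrupts the agent. Raising min_words to 2 would let a single "uh" pass without stopping the agent's speech.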

false_interruption_timeout (float | None, optional, default: 2.0 seconds)

Amount of time (in seconds) to wait after an interruption before emitting an agent_false_interruption event if the user is silent and no user transcript is detected.

Set to None to disable false interruption detection. When disabled, all interruptions are treated as intentional. To learn more, see False interruptions. In Node.js, use milliseconds.

resume_false_interruption (bool, optional, default: True)

Whether to resume the agent's speech after a false interruption is detected. When True, the agent continues speaking from where it left off after the false_interruption_timeout period has passed with no user transcription. To learn more, see False interruptions.
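The interplay between false_interruption_timeout and resume_false_interruption can be summarized as a small decision function. This is a sketch of the behavior described above, not LiveKit code: the function name, the return labels, and the treatment of a transcript arriving exactly at the timeout are assumptions.

```python
from typing import Optional


def resolve_interruption(seconds_until_transcript: Optional[float],
                         false_interruption_timeout: Optional[float] = 2.0,
                         resume_false_interruption: bool = True) -> str:
    """Classify an interruption once the timeout window has played out.

    seconds_until_transcript is None when the user stayed silent and no
    transcript arrived after the interruption.
    """
    # Detection disabled: every interruption is treated as intentional.
    if false_interruption_timeout is None:
        return "intentional"
    # A transcript arrived within the window: the user really spoke.
    if (seconds_until_transcript is not None
            and seconds_until_transcript <= false_interruption_timeout):
        return "intentional"
    # Nothing arrived in time: this is where agent_false_interruption
    # would be emitted, optionally followed by resuming speech.
    if resume_false_interruption:
        return "false_interruption_resume"
    return "false_interruption"
```

For example, a cough that triggers an interruption but produces no transcript within 2.0 seconds resolves to "false_interruption_resume" under the defaults.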