interface VADEvent {
    frames: AudioFrame[];
    inferenceDuration: number;
    probability: number;
    samplesIndex: number;
    silenceDuration: number;
    speaking: boolean;
    speechDuration: number;
    timestamp: number;
    type: VADEventType;
}

Properties

frames: AudioFrame[]

List of audio frames associated with the speech.

Remarks

  • For start_of_speech events, this contains the audio chunks that triggered the detection.
  • For inference_done events, this contains the audio chunks that were processed.
  • For end_of_speech events, this contains the complete user speech.
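Since each event carries the raw audio as a list of frames, the total audio duration can be derived from the frames themselves. A minimal sketch, assuming `AudioFrame` exposes `samplesPerChannel` and `sampleRate` fields (verify these names against your SDK's `AudioFrame` API):

```typescript
// Assumed shape of AudioFrame for this sketch; the real class may differ.
interface AudioFrame {
  samplesPerChannel: number;
  sampleRate: number;
}

// Sum the duration contributed by each frame, in seconds.
function framesDurationSeconds(frames: AudioFrame[]): number {
  return frames.reduce(
    (total, frame) => total + frame.samplesPerChannel / frame.sampleRate,
    0,
  );
}
```

For an end_of_speech event, this total should roughly match the event's `speechDuration` (plus any padding silence the detector includes).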
inferenceDuration: number

Time taken to perform the inference, in seconds (only for INFERENCE_DONE events).

probability: number

Probability that speech is present (only for INFERENCE_DONE events).

samplesIndex: number

Index of the audio sample where the event occurred, relative to the inference sample rate.

silenceDuration: number

Duration of the silence segment preceding or following the speech, in seconds.

speaking: boolean

Indicates whether speech was detected in the frames.

speechDuration: number

Duration of the detected speech segment in seconds.

timestamp: number

Timestamp when the event was fired.

type: VADEventType

Type of the VAD event (e.g., start of speech, end of speech, inference done).
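Since several properties are only meaningful for particular event types, consumers typically branch on `type` first. A minimal sketch; the enum member names and string values below mirror the event names used in the remarks above, but are assumptions to verify against the SDK's actual `VADEventType` enum:

```typescript
// Assumed enum values, mirroring the event names in this page's remarks.
enum VADEventType {
  START_OF_SPEECH = 'start_of_speech',
  INFERENCE_DONE = 'inference_done',
  END_OF_SPEECH = 'end_of_speech',
}

// Only the fields used below; the full VADEvent has more.
interface VADEventLike {
  type: VADEventType;
  probability: number;
  speechDuration: number;
}

// Produce a log line appropriate to the event type, reading only the
// fields that are valid for that type.
function describeEvent(ev: VADEventLike): string {
  switch (ev.type) {
    case VADEventType.START_OF_SPEECH:
      return 'speech started';
    case VADEventType.INFERENCE_DONE:
      return `inference done (p=${ev.probability.toFixed(2)})`;
    case VADEventType.END_OF_SPEECH:
      return `speech ended after ${ev.speechDuration.toFixed(2)}s`;
  }
}
```

Branching on the discriminant first avoids reading fields such as `probability` or `inferenceDuration` on events where they are not populated.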