For start_of_speech events, the frame list contains the audio chunks that triggered the detection.
For inference_done events, the frame list contains the audio chunks that were processed.
For end_of_speech events, the frame list contains the complete user speech.
Time taken to perform the inference, in seconds (only for INFERENCE_DONE events).
Probability that speech is present (only for INFERENCE_DONE events).
Index of the audio sample where the event occurred, relative to the inference sample rate.
Duration of the silence segment preceding or following the speech, in seconds.
Indicates whether speech was detected in the frames.
Duration of the detected speech segment, in seconds.
Timestamp when the event was fired.
Type of the VAD event (e.g., start of speech, end of speech, inference done).
List of audio frames associated with the speech.
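
The field descriptions above can be sketched as a small data model. This is only an illustrative sketch: the class names, field names, defaults, the use of `bytes` for audio frames, and the 16000 Hz inference sample rate are all assumptions made for the example, not identifiers confirmed by this reference. It also shows one way to turn the sample index into a time offset by dividing it by the inference sample rate.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class VADEventType(str, Enum):
    # Hypothetical enum mirroring the three event kinds described above.
    START_OF_SPEECH = "start_of_speech"
    INFERENCE_DONE = "inference_done"
    END_OF_SPEECH = "end_of_speech"


@dataclass
class VADEvent:
    # All field names below are assumed for illustration.
    type: VADEventType              # type of the VAD event
    samples_index: int              # sample index, relative to the inference sample rate
    timestamp: float                # when the event was fired
    speech_duration: float          # detected speech duration, in seconds
    silence_duration: float         # silence before/after the speech, in seconds
    frames: List[bytes] = field(default_factory=list)  # audio frames (bytes as a stand-in)
    probability: float = 0.0        # only meaningful for inference-done events
    inference_duration: float = 0.0  # only meaningful for inference-done events
    speaking: bool = False          # whether speech was detected in the frames


def event_offset_seconds(event: VADEvent, inference_sample_rate: int) -> float:
    """Convert the sample index into seconds at the inference sample rate."""
    return event.samples_index / inference_sample_rate


def describe(event: VADEvent, inference_sample_rate: int = 16000) -> str:
    """Build a one-line summary using only the fields relevant to the event type."""
    offset = event_offset_seconds(event, inference_sample_rate)
    if event.type is VADEventType.START_OF_SPEECH:
        return f"speech started at {offset:.2f}s ({len(event.frames)} triggering frames)"
    if event.type is VADEventType.END_OF_SPEECH:
        return f"speech ended at {offset:.2f}s after {event.speech_duration:.2f}s of speech"
    # Inference-done events carry the probability and inference time.
    return (
        f"inference at {offset:.2f}s: p={event.probability:.2f}, "
        f"took {event.inference_duration * 1000:.1f} ms"
    )
```

A consumer would typically branch on the event type as `describe` does, reading the probability and inference time only on inference-done events, since the descriptions above mark those two values as specific to that event type.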