Turn detection

Build natural conversations with accurate turn detection

Turn detection is crucial in AI voice applications, helping the assistant know when the user has finished speaking and when to respond. Accurate detection is key to maintaining natural conversation flow and avoiding interruptions or awkward pauses.

Modifying the VAD parameters

OpenAI's Realtime API performs turn detection on the server side. You can fine-tune its Voice Activity Detection (VAD) to suit your application's needs with the following parameters:

  • threshold: Adjusts the sensitivity of the VAD. A lower threshold makes the VAD more sensitive to speech (detects quieter sounds), while a higher threshold makes it less sensitive. The default value is 0.5.
  • prefix_padding_ms: Amount of audio (in milliseconds) to include before detected speech. This padding preserves the very beginning of an utterance that would otherwise be clipped. The default value is 300.
  • silence_duration_ms: Minimum duration of silence (in milliseconds) required after speech before the turn is considered finished. Longer values ensure brief pauses do not prematurely end a speech segment; shorter values make the assistant respond more quickly. The default value is 500.
For example, using LiveKit Agents with the OpenAI plugin (here `ctx` is the agent's job context, which provides the room):

```python
from livekit.agents import multimodal
from livekit.plugins import openai

assistant = multimodal.MultimodalAgent(
    model=openai.realtime.RealtimeModel(
        voice="alloy",
        temperature=0.8,
        instructions="You are a helpful assistant",
        # Tune server-side VAD: slightly less sensitive, faster end-of-turn
        turn_detection=openai.realtime.ServerVadOptions(
            threshold=0.6, prefix_padding_ms=200, silence_duration_ms=500
        ),
    )
)
assistant.start(ctx.room)
```
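Under the hood, these options map onto the `turn_detection` object sent in the Realtime API's `session.update` event. A minimal sketch of the equivalent raw payload (field names follow OpenAI's published event schema; the values mirror the example above):

```python
import json

# Turn-detection settings as they would appear in a raw
# Realtime API "session.update" event sent over the WebSocket.
session_update = {
    "type": "session.update",
    "session": {
        "turn_detection": {
            "type": "server_vad",        # server-side VAD
            "threshold": 0.6,            # sensitivity (default 0.5)
            "prefix_padding_ms": 200,    # audio kept before detected speech
            "silence_duration_ms": 500,  # silence required to end the turn
        }
    },
}

print(json.dumps(session_update, indent=2))
```

Frameworks such as LiveKit Agents construct and send this event for you; the sketch is only meant to show how the three parameters travel to the server.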