Adjusting Model Parameters

Fine-tune the assistant’s responses by tweaking model parameters such as temperature and maximum output tokens.

The RealtimeModel class creates a realtime conversational AI session. Below are the key parameters that can be passed when initializing the model, focusing on the modalities, instructions, voice, turn_detection, temperature, and max_output_tokens options.

Parameters

modalities

Type: list[api_proto.Modality] Default: ["text", "audio"] Description: Specifies the input/output modalities supported by the model. This can be either or both of:

  • "text": The model processes text-based input and generates text responses.
  • "audio": The model processes audio input and can generate audio responses.

Example:

modalities=["text", "audio"]

instructions

Type: str | None Default: None Description: Custom instructions act as the 'system prompt' that the model follows during the conversation. They can be used to guide the model's behavior or set specific goals.

Example:

instructions="Please provide responses that are brief and informative."

voice

Type: api_proto.Voice Default: "alloy" Description: Determines the voice used for audio responses. Some examples of voices include:

  • "alloy"
  • "echo"
  • "shimmer"

Example:

voice="alloy"

turn_detection

Type: api_proto.TurnDetectionType Default: {"type": "server_vad"} Description: Controls how the model detects when a speaker has finished talking, which is critical in realtime interactions.

  • "server_vad": OpenAI uses server-side Voice Activity Detection (VAD) to detect when the user has stopped speaking. This can be fine-tuned using the following parameters:
    • threshold (optional): Float between 0.0 and 1.0 controlling the sensitivity of speech detection; higher values require louder audio to trigger detection.
    • prefix_padding_ms (optional): The amount of audio (in milliseconds) to include before the detected speech.
    • silence_duration_ms (optional): The duration of silence (in milliseconds) required to consider the speech finished.

Example:

turn_detection={
    "type": "server_vad",
    "threshold": 0.6,
    "prefix_padding_ms": 300,
    "silence_duration_ms": 500,
}
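To make the interplay of these options concrete, here is a toy sketch of how a server_vad-style endpointer could combine threshold and silence_duration_ms. This is an illustration of the concept only, not the library's actual implementation; the frame size and energy values are assumptions.

```python
def detect_turn_end(frames, threshold=0.6, silence_duration_ms=500, frame_ms=100):
    """Return the index of the frame where the turn ends, or None.

    `frames` is a list of per-frame speech energy/probability values in [0, 1].
    The turn ends once energy stays below `threshold` for `silence_duration_ms`.
    """
    needed = silence_duration_ms // frame_ms  # consecutive quiet frames required
    quiet = 0
    for i, energy in enumerate(frames):
        if energy < threshold:
            quiet += 1
            if quiet >= needed:
                return i  # enough silence: the speaker has finished
        else:
            quiet = 0  # speech resumed; reset the silence counter
    return None  # still talking (or the stream ended mid-speech)

frames = [0.9, 0.8, 0.7, 0.2, 0.1, 0.1, 0.1, 0.1]
print(detect_turn_end(frames))  # → 7 (five quiet 100 ms frames reach 500 ms)
```

Raising threshold or silence_duration_ms makes the endpointer more conservative: the model waits longer before deciding the user is done speaking.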

temperature

Type: float Default: 0.8 Description: Controls the randomness of the model's output. Higher values (e.g., 1.0 and above) make the model's output more diverse and creative, while lower values (e.g., 0.6) make it more focused and deterministic.

Example:

temperature=0.7
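The mechanism behind this parameter can be sketched with a temperature-scaled softmax: logits are divided by the temperature before normalization, so low temperatures sharpen the distribution toward the top token and high temperatures flatten it. This toy example illustrates the general technique, not the API's internals.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 0.6))  # peaked: top token dominates
print(softmax_with_temperature(logits, 1.2))  # flatter: more diverse sampling
```

Sampling from the flatter distribution yields more varied wording across responses; sampling from the peaked one behaves almost deterministically.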

max_output_tokens

Type: int Default: 2048 Description: Limits the maximum number of tokens in the generated output. This helps control the length of the model's responses; one token corresponds to roughly three-quarters of an English word (about four characters).

Example:

max_output_tokens=1500
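For a rough sense of what a given limit allows, you can estimate token counts with the common four-characters-per-token heuristic for English text. This is a hypothetical back-of-envelope helper, not the model's real tokenizer, so treat its output as an approximation only.

```python
def rough_token_estimate(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

reply = "The capital of France is Paris."
print(rough_token_estimate(reply))  # → 7 (31 characters // 4)
```

By this estimate, max_output_tokens=1500 allows responses on the order of a thousand words, comfortably above what brief, concise answers need.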

Example Initialization

Here is a full example of how to initialize the RealtimeModel with these parameters:

realtime_model = RealtimeModel(
    modalities=["text", "audio"],
    instructions="Give brief, concise answers.",
    voice="alloy",
    turn_detection=openai.realtime.ServerVadOptions(
        threshold=0.6, prefix_padding_ms=200, silence_duration_ms=500,
    ),
    temperature=0.7,
    max_output_tokens=1500,
)