Adjusting Model Parameters

The RealtimeModel class creates a realtime conversational AI session. The key parameters you can pass when initializing the model are described below: modalities, instructions, voice, turn_detection, temperature, and max_output_tokens.

Parameters

`modalities`

Type: list[api_proto.Modality]
Default: ["text", "audio"]
Description: Specifies the input/output modalities supported by the model. This can be either or both of:

"text": The model processes text-based input and generates text responses.
"audio": The model processes audio input and can generate audio responses.

Example:

modalities=["text", "audio"]

`instructions`

Type: str | None
Default: None
Description: Custom instructions that act as the system prompt the model follows during the conversation. Use them to guide the model's behavior or to set specific goals.

Example:

instructions="Please provide responses that are brief and informative."

`voice`

Type: api_proto.Voice
Default: "alloy"
Description: Determines the voice used for audio responses. Some examples of voices include:

"alloy"
"echo"
"shimmer"

Example:

voice="alloy"

`turn_detection`

Type: api_proto.TurnDetectionType
Default: {"type": "server_vad"}
Description: Controls how the model detects when a speaker has finished talking, which is critical in realtime interactions.

"server_vad": OpenAI uses server-side Voice Activity Detection (VAD) to detect when the user has stopped speaking. This can be fine-tuned using the following parameters:
- threshold (optional): Float that sets the sensitivity of speech detection. A higher value requires louder audio before it is treated as speech.
- prefix_padding_ms (optional): The amount of audio (in milliseconds) to include before the detected start of speech.
- silence_duration_ms (optional): The amount of silence (in milliseconds) required before the speech is considered finished.

Example:

turn_detection={
    "type": "server_vad",
    "threshold": 0.6,
    "prefix_padding_ms": 300,
    "silence_duration_ms": 500
}
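To build intuition for how the three settings interact, here is a toy turn detector in plain Python. It is not OpenAI's server-side VAD, whose internals are not described here. The function `detect_turn`, the per-frame speech probabilities, and the 100 ms frame length are all hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence, Tuple


def detect_turn(
    speech_probs: Sequence[float],
    frame_ms: int = 100,
    threshold: float = 0.6,
    prefix_padding_ms: int = 300,
    silence_duration_ms: int = 500,
) -> Optional[Tuple[int, int]]:
    """Return (start_ms, end_ms) of the first completed turn, or None.

    Toy model: a frame counts as speech when its probability is >= threshold.
    The turn ends once silence has lasted at least silence_duration_ms.
    """
    start_ms: Optional[int] = None
    silence_ms = 0
    for i, prob in enumerate(speech_probs):
        t = i * frame_ms  # start time of this frame
        if prob >= threshold:
            if start_ms is None:
                # Keep some audio from before the first detected speech frame.
                start_ms = max(0, t - prefix_padding_ms)
            silence_ms = 0  # any speech frame resets the silence counter
        elif start_ms is not None:
            silence_ms += frame_ms
            if silence_ms >= silence_duration_ms:
                # The turn ended at the moment the trailing silence began.
                return start_ms, t + frame_ms - silence_ms
    return None  # no speech found, or the speaker has not finished yet


# Half a second of quiet, 400 ms of speech, then 600 ms of quiet.
probs = [0.1] * 5 + [0.9] * 4 + [0.2] * 6
print(detect_turn(probs))
print(detect_turn(probs, silence_duration_ms=700))
```

In this sketch, raising `threshold` makes the detector ignore lower-confidence frames, a larger `prefix_padding_ms` moves the reported start earlier, and a larger `silence_duration_ms` makes it wait longer before declaring the turn finished.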

`temperature`

Type: float
Default: 0.8
Description: Controls the randomness of the model's output. Higher values (e.g., 1.0 and above) make the output more diverse and creative, while lower values (e.g., 0.6) make it more focused and deterministic.

Example:

temperature=0.7
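A common way sampling temperature is applied in language models is to divide the model's raw scores (logits) by the temperature before converting them to probabilities. The sketch below illustrates that general technique. Whether the realtime model applies it in exactly this form is an assumption, and the logits are made up for illustration.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Convert raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
for t in (0.6, 0.8, 1.2):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the highest-scoring candidate, and higher temperatures spread it more evenly across the alternatives.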

`max_output_tokens`

Type: int
Default: 2048
Description: Limits the maximum number of tokens in the generated output. This helps control the length of the responses from the model. A token is a sub-word unit: in English text, one token corresponds to roughly three-quarters of a word.

Example:

max_output_tokens=1500
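When choosing a limit, it can help to estimate how many tokens a text response of a given length might use. The sketch below uses a common rule of thumb for English text of about four characters per token. It is not a real tokenizer, it does not account for audio output, and the helper names `estimate_tokens` and `fits_budget` are hypothetical.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate for English text; not a real tokenizer."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


def fits_budget(text: str, max_output_tokens: int = 2048) -> bool:
    """True if the estimated token count is within the limit."""
    return estimate_tokens(text) <= max_output_tokens


reply = "Give brief, concise answers."
print(estimate_tokens(reply), fits_budget(reply, max_output_tokens=1500))
```

For exact counts, use the tokenizer that matches the model. This estimate is only meant for picking a sensible order of magnitude.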

Example Initialization

Here is a full example of how to initialize the RealtimeModel with these parameters:

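# Assumes the LiveKit OpenAI plugin, which provides the `openai.realtime` namespace used below.
from livekit.plugins import openai

realtime_model = openai.realtime.RealtimeModel(
    modalities=["text", "audio"],
    instructions="Give brief, concise answers.",
    voice="alloy",
    # Server-side VAD settings, as described under turn_detection
    turn_detection=openai.realtime.ServerVadOptions(
        threshold=0.6, prefix_padding_ms=200, silence_duration_ms=500,
    ),
    temperature=0.7,
    max_output_tokens=1500,
)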
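If these settings come from user input or a config file, it can be useful to validate them before constructing the model. The following is a minimal sketch of one way to do that in plain Python. `RealtimeSettings` is a hypothetical helper, not part of the library. Its defaults mirror the ones documented above, and the range checks are assumptions rather than documented limits.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional

ALLOWED_MODALITIES = {"text", "audio"}


@dataclass
class RealtimeSettings:
    """Hypothetical container mirroring the documented parameters and defaults."""

    modalities: List[str] = field(default_factory=lambda: ["text", "audio"])
    instructions: Optional[str] = None
    voice: str = "alloy"
    turn_detection: Dict[str, Any] = field(
        default_factory=lambda: {"type": "server_vad"}
    )
    temperature: float = 0.8
    max_output_tokens: int = 2048

    def __post_init__(self) -> None:
        # Reject an empty list or any modality other than "text" / "audio".
        unknown = set(self.modalities) - ALLOWED_MODALITIES
        if not self.modalities or unknown:
            raise ValueError(f"invalid modalities: {self.modalities}")
        # Assumed sanity checks; the real accepted ranges may be narrower.
        if self.temperature <= 0:
            raise ValueError("temperature must be positive")
        if self.max_output_tokens <= 0:
            raise ValueError("max_output_tokens must be positive")

    def to_kwargs(self) -> Dict[str, Any]:
        """Return the settings as keyword arguments for the model constructor."""
        return asdict(self)


settings = RealtimeSettings(temperature=0.7, max_output_tokens=1500)
print(settings.to_kwargs())
```

Assuming the constructor accepts the dictionary form of `turn_detection` shown earlier, the result could then be passed along as `RealtimeModel(**settings.to_kwargs())`.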