Describes a scheduled speech turn. startTime, endTime, and sttDelay are in milliseconds. startTime and endTime are offsets from the moment the first audio frame is pushed into the stream, not wall-clock times. sttDelay is the provider's transcription lag: the stream emits an interim result halfway through and a final result at endTime + sttDelay.

Python uses seconds for these fields, so multiply Python fixture values by 1000 when porting.
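
A minimal sketch of that conversion, for illustration only. The `PythonSpeechFixture` shape, its snake_case field names, and the `fromPythonFixture` helper are assumptions and not part of the documented API. Only the multiply-by-1000 rule comes from the note above.

```typescript
// Local copy of the documented interface so the snippet is self-contained.
interface FakeUserSpeech {
    endTime: number;
    startTime: number;
    sttDelay: number;
    transcript: string;
}

// Hypothetical shape of a fixture ported from Python, with times in seconds.
// The snake_case field names are an assumption made for this example.
interface PythonSpeechFixture {
    start_time: number;
    end_time: number;
    stt_delay: number;
    transcript: string;
}

// Hypothetical helper: converts seconds to milliseconds.
// Rounding to whole milliseconds is a choice made here to avoid
// floating-point noise; it is not required by the interface.
function fromPythonFixture(fixture: PythonSpeechFixture): FakeUserSpeech {
    const toMs = (seconds: number): number => Math.round(seconds * 1000);
    return {
        startTime: toMs(fixture.start_time),
        endTime: toMs(fixture.end_time),
        sttDelay: toMs(fixture.stt_delay),
        transcript: fixture.transcript,
    };
}

const speech = fromPythonFixture({
    start_time: 0.5,
    end_time: 2.5,
    stt_delay: 0.2,
    transcript: "hello",
});
```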

interface FakeUserSpeech {
    endTime: number;
    startTime: number;
    sttDelay: number;
    transcript: string;
}
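
As a rough illustration of the timing rule, the sketch below computes when the final result would be emitted for one turn. The `finalResultOffsetMs` helper is hypothetical and not part of the API. The interim result is left out because the description does not give an exact formula for it.

```typescript
// Local copy of the documented interface so the snippet is self-contained.
interface FakeUserSpeech {
    endTime: number;
    startTime: number;
    sttDelay: number;
    transcript: string;
}

// Hypothetical helper: offset in milliseconds, measured from the first
// pushed audio frame, at which the final result is emitted.
function finalResultOffsetMs(turn: FakeUserSpeech): number {
    return turn.endTime + turn.sttDelay;
}

// A turn that starts 1 s and ends 3 s after the first audio frame,
// with a 250 ms transcription lag.
const turn: FakeUserSpeech = {
    startTime: 1000,
    endTime: 3000,
    sttDelay: 250,
    transcript: "book a table for two",
};

const finalAt = finalResultOffsetMs(turn); // 3000 + 250 = 3250
```

All values here are offsets from the first pushed audio frame, so a test that advances a fake clock should measure from that frame rather than from wall-clock time.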

Properties

endTime: number
startTime: number
sttDelay: number
transcript: string