SpeechData contains metadata about a SpeechEvent: the transcribed text, its timing and language, and optional translation details.

interface SpeechData {
    confidence: number;
    endTime: number;
    language: LanguageCode;
    metadata?: Record<string, unknown>;
    sourceLanguages?: LanguageCode[];
    sourceTexts?: string[];
    speakerId?: null | string;
    startTime: number;
    targetLanguages?: LanguageCode[];
    targetTexts?: string[];
    text: string;
    words?: TimedString[];
}

Properties

confidence: number

Confidence score of the transcription, in the range 0 to 1.
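A common use of the score is to drop low-confidence segments before further processing. The sketch below is illustrative only: ConfidenceFields is a local stand-in for the relevant SpeechData fields, filterByConfidence is a hypothetical helper, and the 0.6 default is an arbitrary value, not a library default.

```typescript
// Local stand-in for the SpeechData fields used in this example.
interface ConfidenceFields {
  confidence: number;
  text: string;
}

// Keep only segments whose confidence is at or above minConfidence.
// The 0.6 default is an arbitrary illustration, not a library default.
function filterByConfidence<T extends ConfidenceFields>(
  segments: T[],
  minConfidence = 0.6,
): T[] {
  return segments.filter((segment) => segment.confidence >= minConfidence);
}

const kept = filterByConfidence([
  { confidence: 0.92, text: "hello" },
  { confidence: 0.31, text: "uh" },
]);
const keptTexts = kept.map((segment) => segment.text); // ["hello"]
```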

endTime: number

End time of the speech segment in seconds.

language: LanguageCode

Language code of the speech.

metadata?: Record<string, unknown>

Optional plugin-specific metadata (e.g. voice profile, provider diagnostics).

Plugins may populate this with provider-specific data that doesn't map to standard fields.
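Because values are typed as unknown, they need to be narrowed before use. A minimal sketch, assuming a hypothetical helper getMetadataString and hypothetical keys voiceProfile and latencyMs (the real keys depend on the plugin):

```typescript
// Read a string value from plugin metadata. Returns undefined when the
// metadata object is missing, the key is absent, or the value is not a string.
function getMetadataString(
  metadata: Record<string, unknown> | undefined,
  key: string,
): string | undefined {
  const value = metadata?.[key];
  return typeof value === "string" ? value : undefined;
}

// "voiceProfile" and "latencyMs" are hypothetical keys used for illustration.
const exampleMetadata: Record<string, unknown> = {
  voiceProfile: "narrator",
  latencyMs: 120,
};

const profile = getMetadataString(exampleMetadata, "voiceProfile"); // "narrator"
const latency = getMetadataString(exampleMetadata, "latencyMs"); // undefined (value is a number)
```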

sourceLanguages?: LanguageCode[]

The source languages spoken by the user.

Populated by two kinds of STT service. Translation-capable services set language to the target language and sourceLanguages to the original spoken language(s). Multi-language detection services set language to the dominant language and sourceLanguages to all detected languages, sorted by prevalence.

May contain multiple entries when a single utterance spans multiple source languages.
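One way to answer "what did the user actually speak?" is to prefer sourceLanguages and fall back to language. This is a sketch under assumptions: LanguageCode is modeled here as a plain string, SourceLanguageFields is a local stand-in, spokenLanguages is a hypothetical helper, and the fallback assumes that language is the spoken language whenever sourceLanguages is absent or empty.

```typescript
// Assumption: LanguageCode is modeled as a plain string for this sketch.
type LanguageCode = string;

// Local stand-in for the SpeechData fields used in this example.
interface SourceLanguageFields {
  language: LanguageCode;
  sourceLanguages?: LanguageCode[];
}

// Return the languages spoken by the user. Falls back to `language` when
// `sourceLanguages` is absent or empty (an assumption, see the text above).
function spokenLanguages(data: SourceLanguageFields): LanguageCode[] {
  const sources = data.sourceLanguages;
  return sources !== undefined && sources.length > 0
    ? sources.slice()
    : [data.language];
}

// Translation active: `language` is the target, `sourceLanguages` the original.
const translated = spokenLanguages({ language: "en", sourceLanguages: ["es"] }); // ["es"]

// No translation or multi-language detection: only `language` is set.
const plain = spokenLanguages({ language: "fr" }); // ["fr"]
```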

sourceTexts?: string[]

The original transcription segments in the source language(s), when translation is active. Each entry corresponds to the same-indexed entry in sourceLanguages.

speakerId?: null | string

Speaker identifier when the provider supports diarization.
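Since speakerId may be a string, null, or absent, code that groups segments by speaker has to handle all three. A minimal sketch, assuming a hypothetical helper groupBySpeaker, a local stand-in type SpeakerFields, and an arbitrary "unknown" bucket for segments without a speaker:

```typescript
// Local stand-in for the SpeechData fields used in this example.
interface SpeakerFields {
  speakerId?: null | string;
  text: string;
}

// Group segment texts by speaker. Segments whose speakerId is null or
// undefined are collected under `unknownLabel` (an arbitrary choice).
function groupBySpeaker(
  segments: SpeakerFields[],
  unknownLabel = "unknown",
): Record<string, string[]> {
  // A prototype-less object avoids clashes with keys such as "constructor".
  const groups: Record<string, string[]> = Object.create(null);
  for (const segment of segments) {
    const key = segment.speakerId ?? unknownLabel;
    const texts = groups[key] ?? [];
    texts.push(segment.text);
    groups[key] = texts;
  }
  return groups;
}

// The speaker IDs below are made up for illustration.
const grouped = groupBySpeaker([
  { speakerId: "spk_1", text: "Hi there." },
  { speakerId: "spk_2", text: "Hello!" },
  { speakerId: "spk_1", text: "How are you?" },
  { speakerId: null, text: "(crosstalk)" },
  { text: "(no diarization)" },
]);
// grouped["spk_1"] is ["Hi there.", "How are you?"]
// grouped["unknown"] is ["(crosstalk)", "(no diarization)"]
```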

startTime: number

Start time of the speech segment in seconds.
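Together with endTime, this gives the segment duration. The sketch below uses a local stand-in type TimingFields and a hypothetical helper durationSeconds; clamping negative values to zero is a defensive assumption, not documented behavior.

```typescript
// Local stand-in for the SpeechData fields used in this example.
interface TimingFields {
  startTime: number; // seconds
  endTime: number; // seconds
}

// Duration of a segment in seconds. A negative difference is clamped to 0
// as a defensive measure (an assumption made for this sketch).
function durationSeconds(data: TimingFields): number {
  return Math.max(0, data.endTime - data.startTime);
}

const duration = durationSeconds({ startTime: 1.5, endTime: 4 }); // 2.5
```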

targetLanguages?: LanguageCode[]

The target language(s) produced by a translation-capable STT service, one entry per consecutive same-language run, parallel to targetTexts.

Populated when translation is active. language holds the dominant or first target language, while targetLanguages carries the fine-grained per-run breakdown.

targetTexts?: string[]

The translated transcription segments in the target language(s). Each entry corresponds to the same-indexed entry in targetLanguages.
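Because targetLanguages and targetTexts are parallel arrays, they are usually consumed by index. A minimal sketch, assuming LanguageCode can be modeled as a string; TargetFields, TranslatedRun, and targetRuns are hypothetical names introduced for illustration. If the arrays differ in length, the sketch pairs only the entries both arrays have.

```typescript
// Assumption: LanguageCode is modeled as a plain string for this sketch.
type LanguageCode = string;

// Local stand-in for the SpeechData fields used in this example.
interface TargetFields {
  targetLanguages?: LanguageCode[];
  targetTexts?: string[];
}

// One consecutive same-language run of translated text.
interface TranslatedRun {
  language: LanguageCode;
  text: string;
}

// Pair each target language with its same-indexed translated text.
// Returns an empty array when translation data is absent.
function targetRuns(data: TargetFields): TranslatedRun[] {
  const languages = data.targetLanguages ?? [];
  const texts = data.targetTexts ?? [];
  const count = Math.min(languages.length, texts.length);
  return languages
    .slice(0, count)
    .map((language, i) => ({ language, text: texts[i] ?? "" }));
}

// Example values are made up for illustration.
const runs = targetRuns({
  targetLanguages: ["en", "fr", "en"],
  targetTexts: ["Hello,", "bonjour", "friends."],
});
// runs[1] is { language: "fr", text: "bonjour" }
```

The same pairing applies to sourceLanguages and sourceTexts, which follow the same index-aligned convention.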

text: string

Transcribed text.

words?: TimedString[]

Word-level timing information.