Confidence score of the transcription (0-1).
End time of the speech segment in seconds.
Language code of the speech.
Optional metadata: plugin-specific metadata (e.g. voice profile, provider diagnostics).
Plugins may populate this with provider-specific data that doesn't map to the standard fields.
Optional sourceLanguages: the source languages spoken by the user.
Populated by STT services that support translation, where the language field holds the
target language and sourceLanguages holds the original spoken language(s),
or by multi-language detection services, where the language field holds the dominant
language and sourceLanguages holds all detected languages sorted by prevalence.
May contain multiple entries when a single utterance spans multiple source languages.
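Because the meaning of the language field depends on which kind of service populated sourceLanguages, a consumer that wants "the language the user actually spoke" has to branch. The sketch below is a hypothetical helper, not part of any library API. It assumes that when sourceLanguages is absent or empty, neither translation nor multi-language detection is active, so the language field already names the spoken language.

```typescript
// Hypothetical subset of SpeechData holding only the two language fields described above.
interface LanguageFields {
  language: string;
  sourceLanguages?: string[];
}

// Best guess at the language the user actually spoke (hypothetical helper).
// Assumption: if sourceLanguages is absent or empty, `language` is the spoken language.
function spokenLanguage(data: LanguageFields): string {
  // Multi-language detection: entries are sorted by prevalence, so index 0 is dominant.
  // Translation: the ordering of several source languages is not specified here,
  // so this simply takes the first entry.
  const first = data.sourceLanguages?.[0];
  return first ?? data.language;
}

console.log(spokenLanguage({ language: "en", sourceLanguages: ["es"] })); // es (translated into en)
console.log(spokenLanguage({ language: "fr", sourceLanguages: ["fr", "en"] })); // fr
console.log(spokenLanguage({ language: "de" })); // de
```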
Optional: the original transcription segments in the source language(s), when translation is active.
Each entry corresponds to the same-indexed entry in sourceLanguages.
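The index correspondence means the two arrays are meant to be read together. A minimal sketch of zipping them into (language, text) pairs follows. The field name sourceTexts is a hypothetical stand-in, since the reference text does not give the full name of the segments field; sourceLanguages is named in the text. A length mismatch is treated as an error rather than silently mis-pairing entries.

```typescript
// Only the fields this sketch needs. `sourceTexts` is a HYPOTHETICAL name.
interface SourceFields {
  sourceLanguages?: string[];
  sourceTexts?: string[]; // hypothetical name for the source-segment array
}

interface LanguageSegment {
  language: string;
  text: string;
}

// Pairs each source segment with its language code by index.
function sourceSegments(data: SourceFields): LanguageSegment[] {
  // Detection-only services may fill sourceLanguages without any segments.
  if (data.sourceTexts === undefined) return [];
  const langs = data.sourceLanguages ?? [];
  const texts = data.sourceTexts;
  if (langs.length !== texts.length) {
    throw new Error(`parallel arrays differ in length: ${langs.length} vs ${texts.length}`);
  }
  return langs.map((language, i) => ({ language, text: texts[i] }));
}

console.log(sourceSegments({ sourceLanguages: ["es", "fr"], sourceTexts: ["hola", "bonjour"] }));
```

The same pairing applies to targetLanguages and targetTexts, which are described as parallel in the same way.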
Optional: speaker identifier, when the provider supports diarization.
Start time of the speech segment in seconds.
Optional targetLanguages: the target language(s) produced by a translation-capable STT service, one entry per
consecutive same-language run, parallel to targetTexts.
The language field holds the dominant or first target language, and targetLanguages carries the
fine-grained per-run breakdown. Populated when translation is active.
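"One entry per consecutive same-language run" means a language can appear more than once if the utterance switches away and back. The sketch below shows how such runs could be built from an ordered list of translated pieces; it is an illustration of the described shape, not how any provider actually assembles it. The input type and the choice to join pieces in a run with a single space are assumptions.

```typescript
// Hypothetical input: ordered translated pieces, each tagged with its language.
interface TranslatedPiece {
  language: string;
  text: string;
}

// Collapses consecutive same-language pieces into runs, returning two parallel arrays
// shaped like targetLanguages / targetTexts. Joining with " " is an assumption.
function toTargetRuns(pieces: TranslatedPiece[]): { targetLanguages: string[]; targetTexts: string[] } {
  const runs: TranslatedPiece[] = [];
  for (const piece of pieces) {
    const prev = runs[runs.length - 1];
    if (prev !== undefined && prev.language === piece.language) {
      // Same language as the current run: extend it.
      prev.text += " " + piece.text;
    } else {
      // Language changed (or first piece): start a new run, copying to avoid mutating input.
      runs.push({ language: piece.language, text: piece.text });
    }
  }
  return {
    targetLanguages: runs.map((r) => r.language),
    targetTexts: runs.map((r) => r.text),
  };
}

const runs = toTargetRuns([
  { language: "en", text: "Hello" },
  { language: "en", text: "there." },
  { language: "de", text: "Guten Tag." },
  { language: "en", text: "Bye." },
]);
console.log(runs.targetLanguages); // [ 'en', 'de', 'en' ]
console.log(runs.targetTexts); // [ 'Hello there.', 'Guten Tag.', 'Bye.' ]
```

Note that "en" appears twice because the two English runs are not consecutive.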
Optional targetTexts: the translated transcription segments in the target language(s).
Each entry corresponds to the same-indexed entry in targetLanguages.
Transcribed text.
Optional words: word-level timing information.
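A common use of word-level timing is finding which word is being spoken at a given moment, for example to highlight captions. The element shape of words is not described here, so the TimedWord type below (word, start, end in seconds) is a hypothetical assumption, as is the half-open [start, end) interval convention.

```typescript
// HYPOTHETICAL shape for one entry of `words`; field names are assumptions.
interface TimedWord {
  word: string;
  start: number; // seconds, assumed to share the segment's time base
  end: number; // seconds
}

// Returns the word whose [start, end) interval contains time t, or undefined in a gap.
function wordAt(words: TimedWord[], t: number): TimedWord | undefined {
  return words.find((w) => w.start <= t && t < w.end);
}

const words: TimedWord[] = [
  { word: "hello", start: 0.0, end: 0.42 },
  { word: "world", start: 0.5, end: 0.93 },
];
console.log(wordAt(words, 0.6)?.word); // world
console.log(wordAt(words, 0.45)); // undefined
```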
SpeechData contains metadata about this SpeechEvent.
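Taken together, the descriptions above imply a few invariants a consumer may want to check. The sketch below reconstructs SpeechData as a hypothetical TypeScript interface and collects violations. Only language, metadata, sourceLanguages, targetLanguages, targetTexts and words are named in this text; every other field name, and all of the types, are assumptions marked in the comments.

```typescript
// HYPOTHETICAL reconstruction of SpeechData from the field descriptions above.
interface SpeechDataSketch {
  language: string;
  text: string; // assumed name
  startTime: number; // assumed name, seconds
  endTime: number; // assumed name, seconds
  confidence: number; // assumed name, 0-1
  speakerId?: string; // assumed name
  metadata?: Record<string, unknown>; // assumed type
  sourceLanguages?: string[];
  sourceTexts?: string[]; // assumed name
  targetLanguages?: string[];
  targetTexts?: string[];
  words?: unknown[]; // element shape not described here
}

// Collects violations of invariants the descriptions state or imply.
function speechDataProblems(d: SpeechDataSketch): string[] {
  const problems: string[] = [];
  if (d.confidence < 0 || d.confidence > 1) problems.push("confidence outside 0-1");
  if (d.endTime < d.startTime) problems.push("endTime before startTime");
  if (d.sourceTexts !== undefined && d.sourceTexts.length !== (d.sourceLanguages?.length ?? 0)) {
    problems.push("sourceTexts not parallel to sourceLanguages");
  }
  if (d.targetTexts !== undefined && d.targetTexts.length !== (d.targetLanguages?.length ?? 0)) {
    problems.push("targetTexts not parallel to targetLanguages");
  }
  return problems;
}

const sample: SpeechDataSketch = {
  language: "en",
  text: "Hello there.",
  startTime: 1.2,
  endTime: 2.0,
  confidence: 0.93,
  sourceLanguages: ["es"],
  sourceTexts: ["Hola."],
  targetLanguages: ["en"],
  targetTexts: ["Hello there."],
};
console.log(speechDataProblems(sample)); // []
```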