Options specific to saaras:v3 (recommended).

interface STTV3Options {
    apiKey?: string;
    firstTurnMinSpeechFrames?: number;
    flushSignal?: boolean;
    highVadSensitivity?: boolean;
    interruptMinSpeechFrames?: number;
    languageCode?: string;
    minSpeechFrames?: number;
    mode?: string;
    model?: "saaras:v3";
    negativeFramesCount?: number;
    negativeFramesWindow?: number;
    negativeSpeechThreshold?: number;
    numInitialIgnoredFrames?: number;
    positiveSpeechThreshold?: number;
    preSpeechPadFrames?: number;
    prompt?: string;
    startSpeechVolumeThreshold?: number;
    streaming?: boolean;
    withTimestamps?: boolean;
}

Hierarchy

  • STTBaseOptions
    • STTV3Options

Properties

apiKey?: string

Sarvam API key. Defaults to the SARVAM_API_KEY environment variable.
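
The fallback described above can be sketched as follows. This is a minimal illustration, not the plugin's actual code: `resolveApiKey` and the `Env` type are hypothetical names, the environment is passed in explicitly so the snippet stays self-contained, and throwing on a missing key is an assumption.

```typescript
// Hypothetical helper, not part of the plugin: prefer the explicit option,
// then fall back to the SARVAM_API_KEY environment variable.
type Env = Record<string, string | undefined>;

function resolveApiKey(opts: { apiKey?: string }, env: Env): string {
  const key = opts.apiKey ?? env["SARVAM_API_KEY"];
  if (!key) {
    // Assumption: a missing key is treated as a configuration error.
    throw new Error("No Sarvam API key: set apiKey or SARVAM_API_KEY");
  }
  return key;
}

// The explicit option wins over the environment value.
const fromOption = resolveApiKey({ apiKey: "explicit-key" }, { SARVAM_API_KEY: "env-key" });
// With no option set, the environment value is used.
const fromEnv = resolveApiKey({}, { SARVAM_API_KEY: "env-key" });
```

In a Node.js process the second argument would typically be `process.env`.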

firstTurnMinSpeechFrames?: number

Fine-grained VAD first-turn minimum speech frames (WS only).

flushSignal?: boolean

Enable flush signal events from server (WS only). Maps to flush_signal query param.

highVadSensitivity?: boolean

Increase VAD sensitivity (WS only). Maps to high_vad_sensitivity query param.
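
To make the option-to-query-parameter mapping concrete, here is a hedged sketch. The parameter names `flush_signal` and `high_vad_sensitivity` come from the descriptions above; the helper `buildFlagQuery`, the `WsFlagOptions` type, and the `"true"`/`"false"` serialization are assumptions made for illustration.

```typescript
// Sketch only: shows how the two boolean options could map onto the
// query string of the WebSocket URL.
interface WsFlagOptions {
  flushSignal?: boolean;
  highVadSensitivity?: boolean;
}

function buildFlagQuery(opts: WsFlagOptions): string {
  const pairs: Array<[string, string]> = [];
  // Only emit parameters the caller set, so server defaults stay untouched.
  if (opts.flushSignal !== undefined) {
    pairs.push(["flush_signal", String(opts.flushSignal)]);
  }
  if (opts.highVadSensitivity !== undefined) {
    pairs.push(["high_vad_sensitivity", String(opts.highVadSensitivity)]);
  }
  return pairs
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join("&");
}

const flagQuery = buildFlagQuery({ flushSignal: true, highVadSensitivity: false });
const emptyQuery = buildFlagQuery({});
```

Omitting unset options, rather than sending `false`, keeps the request from overriding whatever default the server applies.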

interruptMinSpeechFrames?: number

Fine-grained VAD interrupt minimum speech frames (WS only).

languageCode?: string

Language code (BCP-47). Default: 'en-IN'. Set to 'unknown' for auto-detection.
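
As a small illustration of the default and the auto-detection value described above, consider the sketch below. `resolveLanguage` is a hypothetical helper, not a plugin function.

```typescript
// Hypothetical helper: apply the documented default ('en-IN') and report
// whether the caller asked for auto-detection ('unknown').
function resolveLanguage(languageCode?: string): { code: string; autoDetect: boolean } {
  const code = languageCode ?? "en-IN";
  return { code, autoDetect: code === "unknown" };
}

const defaultLang = resolveLanguage();          // falls back to 'en-IN'
const hindi = resolveLanguage("hi-IN");         // explicit BCP-47 code
const detected = resolveLanguage("unknown");    // request auto-detection
```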

minSpeechFrames?: number

Fine-grained VAD minimum speech frames (WS only).

mode?: string

Transcription mode (v3 only). Default: 'transcribe'.

model?: "saaras:v3"

Model identifier. Fixed to 'saaras:v3' for this options type.

negativeFramesCount?: number

Fine-grained VAD negative frames count (WS only).

negativeFramesWindow?: number

Fine-grained VAD negative frames window (WS only).

negativeSpeechThreshold?: number

Fine-grained VAD negative speech threshold (WS only).

numInitialIgnoredFrames?: number

Fine-grained VAD initial ignored frames (WS only).

positiveSpeechThreshold?: number

Fine-grained VAD positive speech threshold (WS only).

preSpeechPadFrames?: number

Fine-grained VAD pre-speech padding frames (WS only).

prompt?: string

Conversation context used to improve model accuracy.

startSpeechVolumeThreshold?: number

Fine-grained VAD start speech volume threshold (WS only).

streaming?: boolean

Whether to use native WebSocket streaming for stream(). Set to false to prefer non-streaming REST recognition (used by Agent via StreamAdapter + VAD). Default: true.
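
The interaction between `streaming` and the options marked "WS only" can be sketched as below. This is an illustration under stated assumptions: `pickTransport`, `unusedWsOnlyOptions`, and the reduced `StreamingSketchOptions` type are hypothetical, only three of the WS-only fields are included for brevity, and this page does not say whether the plugin ignores or rejects WS-only options on the REST path.

```typescript
// Reduced, illustrative copy of a few fields from STTV3Options.
interface StreamingSketchOptions {
  streaming?: boolean;
  minSpeechFrames?: number;
  flushSignal?: boolean;
  highVadSensitivity?: boolean;
}

type Transport = "websocket" | "rest";

// streaming defaults to true, so an unset value selects the WebSocket path.
function pickTransport(opts: StreamingSketchOptions): Transport {
  return (opts.streaming ?? true) ? "websocket" : "rest";
}

const WS_ONLY_KEYS = ["minSpeechFrames", "flushSignal", "highVadSensitivity"] as const;

// Lists WS-only options that were set even though the REST path was chosen.
function unusedWsOnlyOptions(opts: StreamingSketchOptions): string[] {
  if (pickTransport(opts) === "websocket") {
    return [];
  }
  return WS_ONLY_KEYS.filter((k) => opts[k] !== undefined);
}

const restUnused = unusedWsOnlyOptions({ streaming: false, flushSignal: true });
const wsUnused = unusedWsOnlyOptions({ flushSignal: true });
```

A check like this can be handy when switching an existing configuration to `streaming: false`, since VAD tuning values written for the WebSocket path may no longer apply.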

withTimestamps?: boolean

Return chunk-level timestamps in the REST response.