Fallback strategies | LiveKit Documentation

Overview

In realtime voice conversations, a model API failure can leave the agent unable to continue. Fallback strategies let you define backup providers that automatically take over when the primary provider fails.

Both LiveKit fallback adapters trigger on any error from the primary provider, including connection failures, timeouts, HTTP errors (4xx, 5xx), and mid-stream disconnects.

The fallback adapters do the following:

Automatically resubmit the failed request to backup providers when the primary provider fails.
Mark the failed provider as unhealthy and stop sending requests to it.
Continue to use the backup providers, periodically probing the failed provider in the background and restoring it once it responds successfully.
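
The three behaviors above can be sketched in plain Python. This is a toy model, not the LiveKit implementation: the `Provider` and `FallbackChain` classes and the explicit `probe()` call are hypothetical names invented for illustration, and the real adapters probe in the background rather than on demand.

```python
from typing import Callable, List, Optional


class Provider:
    """Hypothetical wrapper around one model provider."""

    def __init__(self, name: str, call: Callable[[str], str]) -> None:
        self.name = name
        self.call = call
        self.healthy = True


class FallbackChain:
    """Toy fallback chain: resubmit on failure, skip unhealthy providers."""

    def __init__(self, providers: List[Provider]) -> None:
        self.providers = providers

    def request(self, payload: str) -> str:
        last_error: Optional[Exception] = None
        for provider in self.providers:
            if not provider.healthy:
                continue  # stop sending requests to a failed provider
            try:
                return provider.call(payload)
            except Exception as exc:  # any error triggers fallback
                provider.healthy = False
                last_error = exc
        raise RuntimeError("all providers failed") from last_error

    def probe(self, payload: str = "ping") -> None:
        """Retry unhealthy providers and restore any that respond."""
        for provider in self.providers:
            if provider.healthy:
                continue
            try:
                provider.call(payload)
            except Exception:
                continue  # still failing, keep it marked unhealthy
            provider.healthy = True


# Simulated primary that is down at first and recovers later.
state = {"primary_up": False}


def flaky_primary(text: str) -> str:
    if not state["primary_up"]:
        raise ConnectionError("primary unavailable")
    return f"primary:{text}"


chain = FallbackChain(
    [
        Provider("primary", flaky_primary),
        Provider("backup", lambda text: f"backup:{text}"),
    ]
)
first = chain.request("hello")   # primary fails, backup answers
state["primary_up"] = True
second = chain.request("hello")  # primary is still marked unhealthy
chain.probe()                    # primary responds, so it is restored
third = chain.request("hello")
```

In this sketch the second request still goes to the backup even though the primary has recovered, because only a successful probe restores a provider.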

When a fallback is triggered, AgentSession emits an error event you can use to log failures or notify the user.
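
A minimal sketch of a handler for that error event follows. The event shape used here (a `source` attribute and an `error` object with a `recoverable` flag) is an assumption for illustration, as are the `describe_error` helper and the `FakeSTT` stand-in. Check the AgentSession events reference for the actual fields before relying on them.

```python
import logging
from types import SimpleNamespace
from typing import Any

logger = logging.getLogger("fallback-demo")


def describe_error(ev: Any) -> str:
    """Build a log line from an error event (field names are assumed)."""
    source = type(getattr(ev, "source", None)).__name__
    error = getattr(ev, "error", None)
    recoverable = getattr(error, "recoverable", None)
    return f"model error from {source}: {error!r} (recoverable={recoverable})"


def on_error(ev: Any) -> None:
    """Log the failure; a real agent might also notify the user here."""
    logger.warning(describe_error(ev))


# Stand-in event for local testing; a real event would come from AgentSession.
class FakeSTT:
    pass


fake_event = SimpleNamespace(
    source=FakeSTT(),
    error=SimpleNamespace(recoverable=True),
)
message = describe_error(fake_event)
on_error(fake_event)
```

In agent code, a handler like `on_error` would presumably be registered on the session's error event rather than called directly.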

LiveKit provides two fallback mechanisms:

Inference Fallback Adapter: fallback logic runs server-side in the LiveKit Inference service. Supports STT and TTS only. Available in Python and Node.js.
Agent Fallback Adapter: fallback logic runs in your agent code. Supports STT, TTS, and LLM. Available in Python and Node.js.

Feature	Inference Fallback Adapter	Agent Fallback Adapter
Supported model types	STT, TTS	STT, TTS, LLM
Where fallback runs	Server-side in LiveKit Inference service	In your agent process
Python support	STT, TTS	STT, TTS, LLM
Node.js support	STT, TTS	STT, TTS, LLM

Inference Fallback Adapter

If you use LiveKit Inference, you can configure fallback models directly with the fallback parameter on inference.STT and inference.TTS. Fallback logic runs server-side in the LiveKit Inference service, so your agent code doesn't need to manage retries or health checks.

from livekit.agents import AgentSession, inference

session = AgentSession(
    stt=inference.STT(
        model="deepgram/nova-3",
        language="en",
        fallback=[
            {"model": "assemblyai/universal-streaming"},
        ],
    ),
    tts=inference.TTS(
        model="cartesia/sonic-3",
        voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
        language="en",
        fallback=[
            {
                "model": "inworld/inworld-tts-1.5-max",
                "voice": "Ashley",
            },
        ],
    ),
    # ... llm, etc.
)

import { inference, voice } from '@livekit/agents';

const session = new voice.AgentSession({
  stt: new inference.STT({
    model: 'deepgram/nova-3',
    language: 'en',
    fallback: [{ model: 'assemblyai/universal-streaming' }],
  }),
  tts: new inference.TTS({
    model: 'cartesia/sonic-3',
    voice: '9626c31c-bec5-4cca-baa8-f8ba9e84c8bc',
    language: 'en',
    fallback: [
      {
        model: 'inworld/inworld-tts-1.5-max',
        voice: 'Ashley',
      },
    ],
  }),
  // ... llm, etc.
});

The model in the top-level parameter is the primary. Models in fallback are tried in order if the primary fails.

Behavior

The Inference Fallback Adapter treats any error as a reason to try the next provider in the chain, including errors during session creation, connection, and mid-stream. If the primary provider fails partway through streaming a response, the service switches to the next model and restarts the request from the beginning. The service only stops trying providers when all configured models have failed or the client disconnects.
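
The restart-from-the-beginning behavior can be illustrated with a small, self-contained sketch. It is not the service's implementation: `stream_with_fallback` and `fake_synthesize` are hypothetical names, and the sketch buffers output so the restart is easy to see, whereas the real service streams.

```python
from typing import Callable, Iterator, List


def stream_with_fallback(
    models: List[str],
    synthesize: Callable[[str, str], Iterator[str]],
    text: str,
) -> List[str]:
    """Try each model in order, restarting the full request after a failure."""
    for model in models:
        chunks: List[str] = []
        try:
            for chunk in synthesize(model, text):
                chunks.append(chunk)
            return chunks
        except Exception:
            continue  # drop this model's partial output and try the next one
    raise RuntimeError("all configured models failed")


def fake_synthesize(model: str, text: str) -> Iterator[str]:
    """Simulated provider: the primary disconnects after the first chunk."""
    for index, word in enumerate(text.split()):
        if model == "primary" and index == 1:
            raise ConnectionError("mid-stream disconnect")
        yield f"{model}:{word}"


result = stream_with_fallback(
    ["primary", "backup"], fake_synthesize, "hello there world"
)
```

Here the backup produces the whole response, including the first word that the primary had already emitted before failing.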

Tip

If you use custom voices, TTS fallback across providers is automatic. Each custom voice is cloned to more than one provider, so LiveKit Inference can fall back to another provider if the primary one is unavailable.

Agent Fallback Adapter

The Agent Fallback Adapter runs fallback logic directly in your agent process using plugins. Use it when you need LLM fallback support, or when you're connecting to providers that aren't available through LiveKit Inference.

from livekit.agents import AgentSession, llm, stt, tts
from livekit.plugins import assemblyai, cartesia, deepgram, inworld, openai

session = AgentSession(
    stt=stt.FallbackAdapter(
        [
            deepgram.STT(),
            assemblyai.STT(),
        ]
    ),
    llm=llm.FallbackAdapter(
        [
            openai.responses.LLM(model="gpt-4o"),
            openai.LLM.with_azure(model="gpt-4o", ...),
        ]
    ),
    tts=tts.FallbackAdapter(
        [
            cartesia.TTS(...),
            inworld.TTS(...),
        ]
    ),
)

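import { llm, stt, tts, voice } from '@livekit/agents';
// Plugin imports (deepgram, assemblyai, openai, cartesia, inworld) are omitted here.

const session = new voice.AgentSession({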
  stt: new stt.FallbackAdapter({
    sttInstances: [new deepgram.STT(), new assemblyai.STT()],
  }),
  llm: new llm.FallbackAdapter({
    llms: [
      new openai.LLM({ model: 'gpt-4o' }),
      openai.LLM.withAzure({ model: 'gpt-4o' }),
    ],
  }),
  tts: new tts.FallbackAdapter({
    ttsInstances: [new cartesia.TTS(), new inworld.TTS()],
  }),
});

The first instance in each list is the primary. The remaining instances are tried in order if the primary fails.

Behavior

The Agent Fallback Adapter triggers on any error, but applies partial output guards to avoid disrupting output that the user has already started receiving:

STT: no partial output guard. The adapter switches to the next provider on any error.
TTS: if audio has already been pushed to the speaker, the adapter does not switch to a backup provider mid-utterance. Fallback is skipped and the partial audio plays through.
LLM: if text or tool calls have already been streamed to the user, the adapter raises the error rather than restarting the response with a different model. Set retry_on_chunk_sent=True on llm.FallbackAdapter to override this and allow mid-stream fallback.
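
The LLM guard can be sketched as a generator that tracks whether any chunk has already reached the consumer. This is an illustrative model only: `guarded_stream` and the two simulated providers are hypothetical, and only the `retry_on_chunk_sent` flag name comes from the adapter described above.

```python
from typing import Callable, Iterator, List


def guarded_stream(
    providers: List[Callable[[], Iterator[str]]],
    retry_on_chunk_sent: bool = False,
) -> Iterator[str]:
    """Yield chunks from the first provider that works, honoring the guard."""
    sent_any = False
    for index, provider in enumerate(providers):
        try:
            for chunk in provider():
                sent_any = True
                yield chunk
            return
        except Exception:
            is_last = index == len(providers) - 1
            # Re-raise if nothing is left to try, or if output has already
            # been sent and mid-stream fallback is not allowed.
            if is_last or (sent_any and not retry_on_chunk_sent):
                raise


def failing_mid_stream() -> Iterator[str]:
    yield "Hel"
    raise ConnectionError("dropped")


def backup() -> Iterator[str]:
    yield "Hello"


def collect(retry: bool) -> List[str]:
    """Consume the stream, recording an error marker if it is raised."""
    out: List[str] = []
    try:
        for chunk in guarded_stream([failing_mid_stream, backup], retry):
            out.append(chunk)
    except ConnectionError:
        out.append("<error>")
    return out


guarded = collect(False)  # error surfaces after the partial chunk
retried = collect(True)   # backup restarts the response mid-stream
```

With the guard on, the consumer sees the partial text followed by the error. With it off, the consumer sees the partial text followed by the backup's full response, which is why mid-stream fallback is opt-in.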

When a provider is restored after a failure, the Agent Fallback Adapter emits an availability-changed event (stt_availability_changed, llm_availability_changed, or tts_availability_changed) so you can observe the recovery from your agent code.
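
One way to observe these events is to keep a small status table that is updated from the event handlers. The sketch below assumes that each event carries an `available` boolean and an attribute named after the model type (`stt`, `llm`, or `tts`) holding the affected instance. Both are assumptions, as are the `AvailabilityTracker` class and the `FakeLLM` stand-in.

```python
from types import SimpleNamespace
from typing import Any, Dict


class AvailabilityTracker:
    """Records the latest availability reported for each provider."""

    def __init__(self) -> None:
        self.status: Dict[str, bool] = {}

    def handle(self, kind: str, ev: Any) -> None:
        # `kind` is "stt", "llm", or "tts"; event field names are assumed.
        provider = getattr(ev, kind, None)
        name = type(provider).__name__
        self.status[f"{kind}:{name}"] = bool(getattr(ev, "available", False))


# Stand-in provider and events for local testing.
class FakeLLM:
    pass


tracker = AvailabilityTracker()
instance = FakeLLM()
tracker.handle("llm", SimpleNamespace(llm=instance, available=False))
after_failure = dict(tracker.status)
tracker.handle("llm", SimpleNamespace(llm=instance, available=True))
after_recovery = dict(tracker.status)
```

In agent code, `tracker.handle` would presumably be wired to the adapter's `llm_availability_changed` event (and the STT and TTS equivalents) instead of being called by hand.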