
Answering machine detection

Classify whether a real person, voicemail, or IVR system answered an outbound call.

Overview

An outbound call can reach a person, voicemail, an IVR menu, or a number that can't accept messages. Answering machine detection (AMD) listens to the start of the call, classifies it with an LLM, and returns a result so your agent can respond appropriately.

How AMD works

AMD runs once at the start of the call, on the first user utterance. It doesn't monitor continuously. While AMD is running, the agent's speech is paused so it doesn't talk over a voicemail greeting before classification completes.

AMD classifies the call into one of five categories. Your agent uses the result to decide the next step: continue the conversation, leave a voicemail, navigate an IVR, or hang up.

AMD runs two paths in parallel: a fast-path heuristic for short greetings followed by silence, and an LLM classifier for transcripts that need more reasoning. The first path to reach a conclusion produces the result.

AMD classification flow: short speech and transcript inputs feed a fast-path heuristic and an LLM classifier, which together emit one of five categories: human, machine-ivr, machine-vm, machine-unavailable, or uncertain.
  • human: A real person answered. Proceed with normal conversation.
  • machine-ivr: An IVR or DTMF menu was detected. In Python, the session automatically starts IVR navigation when ivr_detection is enabled (the default). The Node.js SDK doesn't support IVR navigation, so the agent should handle machine-ivr the same as human and let the main agent respond.
  • machine-vm: A voicemail greeting where leaving a message is possible.
  • machine-unavailable: The mailbox is full, not set up, or the callee is unreachable. Leaving a message isn't possible.
  • uncertain: The greeting can't be classified with confidence. Treat as a human and proceed with normal conversation.
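Condensed, the categories above map to a next step per call. The action names in this sketch are placeholders for your own handlers, not SDK APIs:

```python
# Illustrative next-step dispatch for each AMD category.
# Action names are placeholders, not SDK APIs.
NEXT_ACTION = {
    "human": "converse",
    "uncertain": "converse",          # treat as human
    "machine-ivr": "navigate_ivr",    # Python SDK only; Node.js treats this as human
    "machine-vm": "leave_voicemail",
    "machine-unavailable": "hang_up",
}


def next_action(category: str) -> str:
    # Fall back to a normal conversation for anything unexpected.
    return NEXT_ACTION.get(category, "converse")
```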

Usage

Initialize AMD before creating the SIP participant so detection is ready before audio starts arriving. The detector pauses agent speech until a result is available.

Open the async context manager, then create the SIP participant inside it. Pass participant_identity so AMD's timers wait for that specific participant's audio track:

import logging

from livekit import api
from livekit.agents import AMD

logger = logging.getLogger("outbound-caller")

async with AMD(session, participant_identity=participant_identity) as detector:
    await ctx.api.sip.create_sip_participant(
        api.CreateSIPParticipantRequest(
            room_name=ctx.room.name,
            sip_trunk_id=outbound_trunk_id,
            sip_call_to=phone_number,
            participant_identity=participant_identity,
            wait_until_answered=True,
        )
    )
    await ctx.wait_for_participant(identity=participant_identity)

    result = await detector.execute()
    if result.category in ("human", "uncertain"):
        logger.info(
            "human answered the call or amd is uncertain, proceeding with normal conversation",
            extra={"transcript": result.transcript},
        )
    elif result.category == "machine-ivr":
        logger.info("ivr menu detected, starting navigation")
    elif result.category == "machine-vm":
        logger.info("voicemail detected, leaving a message")
        speech_handle = session.generate_reply(
            instructions=(
                "You've reached voicemail. Leave a brief message asking "
                "the customer to call back."
            ),
        )
        await speech_handle.wait_for_playout()
        ctx.shutdown("voicemail detected")
    elif result.category == "machine-unavailable":
        logger.info("mailbox unavailable, ending call")
        ctx.shutdown("mailbox unavailable")

Instantiate the detector before creating the SIP participant. Pass participantIdentity so AMD's timers wait for that participant's audio track. Wrap the run in try/finally so detector.aclose() runs even on error:

import { voice } from '@livekit/agents';
import { SipClient } from 'livekit-server-sdk';

session._roomIO.setParticipant(participantIdentity);

const detector = new voice.AMD(session, { participantIdentity });
try {
  const sip = new SipClient(
    process.env.LIVEKIT_URL,
    process.env.LIVEKIT_API_KEY,
    process.env.LIVEKIT_API_SECRET,
  );
  await sip.createSipParticipant(outboundTrunkId, phoneNumber, ctx.room.name, {
    participantIdentity,
    waitUntilAnswered: true,
  });
  await ctx.waitForParticipant(participantIdentity);

  const result = await detector.execute();
  if (
    result.category === voice.AMDCategory.HUMAN ||
    result.category === voice.AMDCategory.UNCERTAIN ||
    result.category === voice.AMDCategory.MACHINE_IVR
  ) {
    logger.info(
      { amd: result },
      'human or ivr menu detected, proceeding with normal conversation',
    );
  } else if (result.category === voice.AMDCategory.MACHINE_VM) {
    logger.info({ amd: result }, 'voicemail detected, leaving a message');
    const speechHandle = session.generateReply({
      instructions:
        "You've reached voicemail. Leave a brief message asking the customer to call back.",
    });
    await speechHandle.waitForPlayout();
    session.shutdown({ reason: 'amd:machine-vm' });
  } else if (result.category === voice.AMDCategory.MACHINE_UNAVAILABLE) {
    logger.info({ amd: result }, 'mailbox unavailable, ending call');
    session.shutdown({ reason: 'amd:machine-unavailable' });
  }
} finally {
  await detector.aclose();
}

Recommended models

AMD has been evaluated against a small set of LLMs and STT models on LiveKit Inference.

Behavior on unevaluated models isn't guaranteed, so AMD logs a compatibility warning when you pass one. Once you've validated your own choice, set suppress_compatibility_warning=True (Python) or suppressCompatibilityWarning: true (Node.js) to silence the warning.
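The warning behavior amounts to a simple membership check. This sketch uses Python's warnings module for illustration (the SDK logs rather than warns, and the evaluated set shown is abbreviated):

```python
import warnings

# Abbreviated evaluated set; see the full lists below.
EVALUATED_LLMS = {
    "google/gemini-3.1-flash-lite",
    "openai/gpt-4.1-mini",
}


def check_model_compat(model: str, suppress_compatibility_warning: bool = False) -> None:
    # Non-fatal: classification still runs, but behavior isn't guaranteed.
    if model not in EVALUATED_LLMS and not suppress_compatibility_warning:
        warnings.warn(
            f"AMD has not been evaluated with {model}; behavior is not guaranteed"
        )
```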

Evaluated LLMs

  • google/gemini-3.1-flash-lite (default)
  • google/gemini-3-flash-preview
  • google/gemini-2.5-flash-lite
  • openai/gpt-4o
  • openai/gpt-4.1
  • openai/gpt-4.1-mini
  • openai/gpt-4.1-nano
  • openai/gpt-5.1
  • openai/gpt-5.1-chat-latest
  • openai/gpt-5.2
  • openai/gpt-5.2-chat-latest
  • openai/gpt-5.4

Evaluated STT models

  • cartesia/ink-whisper (default)
  • assemblyai/universal-streaming-multilingual
  • deepgram/nova-3

Parameters

Defaults are calibrated for typical outbound calls. Override them when you need different timing thresholds or a different classification prompt.

llm (LLM | str)

LLM used for greeting classification. Accepts an LLM instance or a LiveKit Inference model ID string. If not set, AMD uses google/gemini-3.1-flash-lite via LiveKit Inference when available, and otherwise falls back to the session's own LLM. See recommended models for the evaluated set.

stt (STT | str)

STT used to transcribe the greeting. Accepts an STT instance or a LiveKit Inference model ID string. If not set, AMD uses cartesia/ink-whisper via LiveKit Inference when available, and otherwise reuses the session's existing STT transcripts. AMD runs its own STT pipeline so it can listen even when the session uses a realtime model with no separate STT.

interrupt_on_machine (bool, default: True)

Interrupt any pending agent speech when a machine is detected.

participant_identity (str)

Identity of the SIP participant whose audio AMD should listen to. When omitted, AMD attaches to the first remote audio track in the room. Set this when the room might have other participants so AMD timers don't start on the wrong track.

ivr_detection (bool, default: True)

Automatically start IVR navigation when the result is machine-ivr. When False, AMD returns the machine-ivr result without starting navigation, and your agent decides how to handle it.

detection_options (DetectionOptions)

Override the default timing thresholds and classification prompt. Pass a dict with any of the following keys: human_speech_threshold (default 2.5), human_silence_threshold (default 0.5), machine_silence_threshold (default 1.5), no_speech_threshold (default 10.0), timeout (default 20.0), or prompt. All thresholds are in seconds. Values not provided fall back to library defaults.

suppress_compatibility_warning (bool, default: False)

Silence the warning that fires when llm or stt isn't among the evaluated models. Has no effect on classification behavior.

The Node.js SDK doesn't support IVR navigation, so treat machine-ivr results as a human conversation and let the main agent respond.

llm (LLM | string)

LLM used for greeting classification. Accepts an LLM instance or a LiveKit Inference model ID string. If not set, AMD uses google/gemini-3.1-flash-lite via LiveKit Inference when available, and otherwise falls back to the session's own LLM. See recommended models for the evaluated set.

stt (STT | string)

STT used to transcribe the greeting. Accepts an STT instance or a LiveKit Inference model ID string. If not set, AMD uses cartesia/ink-whisper via LiveKit Inference when available, and otherwise listens to session-level transcripts instead. AMD runs its own STT pipeline so it can listen even when the session uses a realtime model with no separate STT.

interruptOnMachine (boolean, default: true)

Interrupt any pending agent speech when a machine is detected.

participantIdentity (string)

Identity of the SIP participant whose audio AMD should listen to. When omitted, AMD attaches to the session's linked participant or the first remote audio track in the room. Set this when the room might have other participants so AMD timers don't start on the wrong track.

noSpeechTimeoutMs (number, default: 10000)

Maximum time in milliseconds to wait for a transcript before AMD gives up. When this elapses with no speech detected, AMD settles as machine-unavailable.

detectionTimeoutMs (number, default: 20000)

Maximum time in milliseconds for the entire detection. When this elapses, AMD settles with whatever evidence is available.

suppressCompatibilityWarning (boolean, default: false)

Silence the warning that fires when llm or stt isn't among the evaluated models. Has no effect on classification behavior.

Additional resources