
Answering machine detection

Classify whether a real person, voicemail, or IVR system answered an outbound call.

Overview

An outbound call can reach a person, voicemail, an IVR menu, or a number that can't accept messages. Answering machine detection (AMD) listens to the start of the call, classifies it with an LLM, and returns a result so your agent can respond appropriately.

How AMD works

AMD runs once at the start of the call, on the first user utterance. It doesn't monitor continuously. While AMD is running, the agent's speech is paused so it doesn't talk over a voicemail greeting before classification completes.

AMD classifies the call into one of five categories. Your agent uses the result to decide the next step: continue the conversation, leave a voicemail, navigate an IVR, or hang up.

AMD runs two paths in parallel: a fast-path heuristic for short greetings followed by silence, and an LLM classifier for transcripts that need more reasoning. The first path to reach a conclusion produces the result.

AMD classification flow: short speech and transcript inputs feed a fast-path heuristic and an LLM classifier, which together emit one of five categories: human, machine-ivr, machine-vm, machine-unavailable, or uncertain.
  • human: A real person answered. Proceed with normal conversation.
  • machine-ivr: An IVR or DTMF menu was detected. In Python, the session automatically starts IVR navigation when ivr_detection is enabled (the default). The Node.js SDK doesn't support IVR navigation, so the agent should handle machine-ivr the same as human and let the main agent respond.
  • machine-vm: A voicemail greeting where leaving a message is possible.
  • machine-unavailable: The mailbox is full, not set up, or the callee is unreachable. Leaving a message isn't possible.
  • uncertain: The greeting can't be classified with confidence. Treat as a human and proceed with normal conversation.
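Condensed, the categories above map to a next step per call. The action names in this sketch are placeholders for your own handlers, not SDK APIs:

```python
# Illustrative next-step dispatch for each AMD category.
# Action names are placeholders, not SDK APIs.
NEXT_ACTION = {
    "human": "converse",
    "uncertain": "converse",          # treat as human
    "machine-ivr": "navigate_ivr",    # Python SDK only; Node.js treats this as human
    "machine-vm": "leave_voicemail",
    "machine-unavailable": "hang_up",
}


def next_action(category: str) -> str:
    # Fall back to a normal conversation for anything unexpected.
    return NEXT_ACTION.get(category, "converse")
```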

Usage

Initialize AMD before creating the SIP participant so detection is ready before audio starts arriving. The detector pauses agent speech until a result is available.

Open the async context manager, then create the SIP participant inside it. Pass participant_identity so AMD's timers wait for that specific participant's audio track:

import logging

from livekit import api
from livekit.agents import AMD

logger = logging.getLogger("outbound-caller")

async with AMD(session, participant_identity=participant_identity) as detector:
    await ctx.api.sip.create_sip_participant(
        api.CreateSIPParticipantRequest(
            room_name=ctx.room.name,
            sip_trunk_id=outbound_trunk_id,
            sip_call_to=phone_number,
            participant_identity=participant_identity,
            wait_until_answered=True,
        )
    )
    await ctx.wait_for_participant(identity=participant_identity)

    result = await detector.execute()
    if result.category in ("human", "uncertain"):
        logger.info(
            "human answered the call or amd is uncertain, proceeding with normal conversation",
            extra={"transcript": result.transcript},
        )
    elif result.category == "machine-ivr":
        logger.info("ivr menu detected, starting navigation")
    elif result.category == "machine-vm":
        logger.info("voicemail detected, leaving a message")
        speech_handle = session.generate_reply(
            instructions=(
                "You've reached voicemail. Leave a brief message asking "
                "the customer to call back."
            ),
        )
        await speech_handle.wait_for_playout()
        ctx.shutdown("voicemail detected")
    elif result.category == "machine-unavailable":
        logger.info("mailbox unavailable, ending call")
        ctx.shutdown("mailbox unavailable")

Instantiate the detector before creating the SIP participant. Pass participantIdentity so AMD's timers wait for that participant's audio track. Wrap the run in try/finally so detector.aclose() runs even on error:

import { voice } from '@livekit/agents';
import { SipClient } from 'livekit-server-sdk';

session._roomIO.setParticipant(participantIdentity);

const detector = new voice.AMD(session, { participantIdentity });
try {
  const sip = new SipClient(
    process.env.LIVEKIT_URL,
    process.env.LIVEKIT_API_KEY,
    process.env.LIVEKIT_API_SECRET,
  );
  await sip.createSipParticipant(outboundTrunkId, phoneNumber, ctx.room.name, {
    participantIdentity,
    waitUntilAnswered: true,
  });
  await ctx.waitForParticipant(participantIdentity);

  const result = await detector.execute();
  if (
    result.category === voice.AMDCategory.HUMAN ||
    result.category === voice.AMDCategory.UNCERTAIN ||
    result.category === voice.AMDCategory.MACHINE_IVR
  ) {
    logger.info(
      { amd: result },
      'human or ivr menu detected, proceeding with normal conversation',
    );
  } else if (result.category === voice.AMDCategory.MACHINE_VM) {
    logger.info({ amd: result }, 'voicemail detected, leaving a message');
    const speechHandle = session.generateReply({
      instructions:
        "You've reached voicemail. Leave a brief message asking the customer to call back.",
    });
    await speechHandle.waitForPlayout();
    session.shutdown({ reason: 'amd:machine-vm' });
  } else if (result.category === voice.AMDCategory.MACHINE_UNAVAILABLE) {
    logger.info({ amd: result }, 'mailbox unavailable, ending call');
    session.shutdown({ reason: 'amd:machine-unavailable' });
  }
} finally {
  await detector.aclose();
}

Recommended models

AMD has been evaluated against a small set of LLMs and STT models on LiveKit Inference.

Behavior on unevaluated models isn't guaranteed, so AMD logs a compatibility warning when you pass one. Once you've validated your own choice, set suppress_compatibility_warning=True (Python) or suppressCompatibilityWarning: true (Node.js) to silence the warning.
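The warning behavior amounts to a simple membership check. This sketch uses Python's warnings module for illustration (the SDK logs rather than warns, and the evaluated set shown is abbreviated):

```python
import warnings

# Abbreviated evaluated set; see the full lists below.
EVALUATED_LLMS = {
    "google/gemini-3.1-flash-lite",
    "openai/gpt-4.1-mini",
}


def check_model_compat(model: str, suppress_compatibility_warning: bool = False) -> None:
    # Non-fatal: classification still runs, but behavior isn't guaranteed.
    if model not in EVALUATED_LLMS and not suppress_compatibility_warning:
        warnings.warn(
            f"AMD has not been evaluated with {model}; behavior is not guaranteed"
        )
```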

Evaluated LLMs

  • google/gemini-3.1-flash-lite (default)
  • google/gemini-3-flash-preview
  • google/gemini-2.5-flash-lite
  • openai/gpt-4o
  • openai/gpt-4.1
  • openai/gpt-4.1-mini
  • openai/gpt-4.1-nano
  • openai/gpt-5.1
  • openai/gpt-5.1-chat-latest
  • openai/gpt-5.2
  • openai/gpt-5.2-chat-latest
  • openai/gpt-5.4

Evaluated STT models

  • cartesia/ink-whisper (default)
  • assemblyai/universal-streaming-multilingual
  • deepgram/nova-3

Parameters

Defaults are calibrated for typical outbound calls. Override them when you need different timing thresholds or a different classification prompt.

llm (LLM | str)

LLM used for greeting classification. Accepts an LLM instance or a LiveKit Inference model ID string. If not set, AMD uses google/gemini-3.1-flash-lite via LiveKit Inference when available, and otherwise falls back to the session's own LLM. See recommended models for the evaluated set.

stt (STT | str)

STT used to transcribe the greeting. Accepts an STT instance or a LiveKit Inference model ID string. If not set, AMD uses cartesia/ink-whisper via LiveKit Inference when available, and otherwise reuses the session's existing STT transcripts. AMD runs its own STT pipeline so it can listen even when the session uses a realtime model with no separate STT.

interrupt_on_machine (bool, default: True)

Interrupt any pending agent speech when a machine is detected.

participant_identity (str)

Identity of the SIP participant whose audio AMD should listen to. When omitted, AMD attaches to the first remote audio track in the room. Set this when the room might have other participants so AMD timers don't start on the wrong track.

ivr_detection (bool, default: True)

Automatically start IVR navigation when the result is machine-ivr. When False, AMD returns the machine-ivr result without starting navigation, and your agent decides how to handle it.

detection_options (DetectionOptions)

Override the default timing thresholds and classification prompt. Pass a dict with any of the following keys: human_speech_threshold (default 2.5), human_silence_threshold (default 0.5), machine_silence_threshold (default 1.5), no_speech_threshold (default 10.0), timeout (default 20.0), or prompt. All thresholds are in seconds. Values not provided fall back to library defaults.

suppress_compatibility_warning (bool, default: False)

Silence the warning that fires when llm or stt isn't among the evaluated models. Has no effect on classification behavior.

The Node.js SDK doesn't support IVR navigation, so treat machine-ivr results as a human conversation and let the main agent respond.

llm (LLM | string)

LLM used for greeting classification. Accepts an LLM instance or a LiveKit Inference model ID string. If not set, AMD uses google/gemini-3.1-flash-lite via LiveKit Inference when available, and otherwise falls back to the session's own LLM. See recommended models for the evaluated set.

stt (STT | string)

STT used to transcribe the greeting. Accepts an STT instance or a LiveKit Inference model ID string. If not set, AMD uses cartesia/ink-whisper via LiveKit Inference when available, and otherwise listens to session-level transcripts instead. AMD runs its own STT pipeline so it can listen even when the session uses a realtime model with no separate STT.

interruptOnMachine (boolean, default: true)

Interrupt any pending agent speech when a machine is detected.

participantIdentity (string)

Identity of the SIP participant whose audio AMD should listen to. When omitted, AMD attaches to the session's linked participant or the first remote audio track in the room. Set this when the room might have other participants so AMD timers don't start on the wrong track.

noSpeechTimeoutMs (number, default: 10000)

Maximum time in milliseconds to wait for a transcript before AMD gives up. When this elapses with no speech detected, AMD settles as machine-unavailable.

detectionTimeoutMs (number, default: 20000)

Maximum time in milliseconds for the entire detection. When this elapses, AMD settles with whatever evidence is available.

suppressCompatibilityWarning (boolean, default: false)

Silence the warning that fires when llm or stt isn't among the evaluated models. Has no effect on classification behavior.

Additional resources