Overview
An outbound call can reach a person, voicemail, an IVR menu, or a number that can't accept messages. Answering machine detection (AMD) listens to the start of the call, classifies it with an LLM, and returns a result so your agent can respond appropriately.
How AMD works
AMD runs once at the start of the call, on the first user utterance. It doesn't monitor continuously. While AMD is running, the agent's speech is paused so it doesn't talk over a voicemail greeting before classification completes.
AMD classifies the call into one of five categories. Your agent uses the result to decide the next step: continue the conversation, leave a voicemail, navigate an IVR, or hang up.
AMD runs two paths in parallel: a fast-path heuristic for short greetings followed by silence, and an LLM classifier for transcripts that need more reasoning. The first path to reach a conclusion produces the result.
| Category | Description |
|---|---|
| human | A real person answered. Proceed with normal conversation. |
| machine-ivr | An IVR or DTMF menu was detected. In Python, the session automatically starts IVR navigation when ivr_detection is enabled (the default). The Node.js SDK doesn't support IVR navigation, so the agent should handle machine-ivr the same as human and let the main agent respond. |
| machine-vm | A voicemail greeting where leaving a message is possible. |
| machine-unavailable | The mailbox is full, not set up, or the callee is unreachable. Leaving a message isn't possible. |
| uncertain | The greeting can't be classified with confidence. Treat as a human and proceed with normal conversation. |
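The branching above can be sketched as a simple dispatch (the action names and the next_step helper are illustrative, not part of the SDK; the category strings are the ones in the table):

```python
def next_step(category: str) -> str:
    """Map an AMD category to an agent action (illustrative helper)."""
    actions = {
        "human": "converse",
        "uncertain": "converse",           # treat uncertain as a human
        "machine-ivr": "navigate_ivr",     # Python can auto-start IVR navigation
        "machine-vm": "leave_voicemail",
        "machine-unavailable": "hang_up",  # leaving a message isn't possible
    }
    return actions[category]
```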
Usage
Initialize AMD before creating the SIP participant so detection is ready before audio starts arriving. The detector pauses agent speech until a result is available.
Open the async context manager, then create the SIP participant inside it. Pass participant_identity so AMD's timers wait for that specific participant's audio track:
```python
from livekit import api
from livekit.agents import AMD

# Start AMD before the SIP participant so detection is armed when audio arrives.
async with AMD(session, participant_identity=participant_identity) as detector:
    await ctx.api.sip.create_sip_participant(
        api.CreateSIPParticipantRequest(
            room_name=ctx.room.name,
            sip_trunk_id=outbound_trunk_id,
            sip_call_to=phone_number,
            participant_identity=participant_identity,
            wait_until_answered=True,
        )
    )
    await ctx.wait_for_participant(identity=participant_identity)

    # Blocks until the greeting is classified.
    result = await detector.execute()

    if result.category == "human" or result.category == "uncertain":
        logger.info(
            "human answered the call or amd is uncertain, proceeding with normal conversation",
            extra={"transcript": result.transcript},
        )
    elif result.category == "machine-ivr":
        logger.info("ivr menu detected, starting navigation")
    elif result.category == "machine-vm":
        logger.info("voicemail detected, leaving a message")
        speech_handle = session.generate_reply(
            instructions=(
                "You've reached voicemail. Leave a brief message asking "
                "the customer to call back."
            ),
        )
        await speech_handle.wait_for_playout()
        ctx.shutdown("voicemail detected")
    elif result.category == "machine-unavailable":
        logger.info("mailbox unavailable, ending call")
        ctx.shutdown("mailbox unavailable")
```
Instantiate the detector before creating the SIP participant. Pass participantIdentity so AMD's timers wait for that participant's audio track. Wrap the run in try/finally so detector.aclose() runs even on error:
```typescript
import { voice } from '@livekit/agents';
import { SipClient } from 'livekit-server-sdk';

// Point the session's room I/O at the outbound participant, then start AMD
// before placing the call so detection is armed when audio arrives.
session._roomIO.setParticipant(participantIdentity);
const detector = new voice.AMD(session, { participantIdentity });
try {
  const sip = new SipClient(
    process.env.LIVEKIT_URL,
    process.env.LIVEKIT_API_KEY,
    process.env.LIVEKIT_API_SECRET,
  );
  await sip.createSipParticipant(outboundTrunkId, phoneNumber, ctx.room.name, {
    participantIdentity,
    waitUntilAnswered: true,
  });
  await ctx.waitForParticipant(participantIdentity);

  // Resolves once the greeting is classified.
  const result = await detector.execute();
  if (
    result.category === voice.AMDCategory.HUMAN ||
    result.category === voice.AMDCategory.UNCERTAIN ||
    result.category === voice.AMDCategory.MACHINE_IVR
  ) {
    logger.info(
      { amd: result },
      'human or ivr menu detected, proceeding with normal conversation',
    );
  } else if (result.category === voice.AMDCategory.MACHINE_VM) {
    logger.info({ amd: result }, 'voicemail detected, leaving a message');
    const speechHandle = session.generateReply({
      instructions:
        "You've reached voicemail. Leave a brief message asking the customer to call back.",
    });
    await speechHandle.waitForPlayout();
    session.shutdown({ reason: 'amd:machine-vm' });
  } else if (result.category === voice.AMDCategory.MACHINE_UNAVAILABLE) {
    logger.info({ amd: result }, 'mailbox unavailable, ending call');
    session.shutdown({ reason: 'amd:machine-unavailable' });
  }
} finally {
  // Always release the detector, even if the call setup fails.
  await detector.aclose();
}
```
Recommended models
AMD has been evaluated against a small set of LLMs and STT models on LiveKit Inference.
Behavior on models outside the evaluated set isn't guaranteed, so AMD logs a compatibility warning when you pass one. Once you've validated your own choice, set suppress_compatibility_warning=True (Python) or suppressCompatibilityWarning: true (Node.js) to silence the warning.
Evaluated LLMs
- google/gemini-3.1-flash-lite (default)
- google/gemini-3-flash-preview
- google/gemini-2.5-flash-lite
- openai/gpt-4o
- openai/gpt-4.1
- openai/gpt-4.1-mini
- openai/gpt-4.1-nano
- openai/gpt-5.1
- openai/gpt-5.1-chat-latest
- openai/gpt-5.2
- openai/gpt-5.2-chat-latest
- openai/gpt-5.4
Evaluated STT models
- cartesia/ink-whisper (default)
- assemblyai/universal-streaming-multilingual
- deepgram/nova-3
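The warning behavior can be sketched like this (maybe_warn and the abbreviated model set are illustrative; the real check lives inside AMD):

```python
import warnings

# Abbreviated from the evaluated lists above; illustrative, not the library's code.
EVALUATED_MODELS = {
    "google/gemini-3.1-flash-lite",
    "openai/gpt-4.1-mini",
    "cartesia/ink-whisper",
}


def maybe_warn(model_id: str, suppress_compatibility_warning: bool = False) -> bool:
    """Return True if a compatibility warning fires for this model choice."""
    if suppress_compatibility_warning or model_id in EVALUATED_MODELS:
        return False
    warnings.warn(f"{model_id} has not been evaluated with AMD")
    return True
```

The flag silences the warning only; classification runs the same either way.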
Parameters
Defaults are calibrated for typical outbound calls. Override them when you need different timing thresholds or a different classification prompt.
- llm (LLM | str): LLM used for greeting classification. Accepts an LLM instance or a LiveKit Inference model ID string. If not set, AMD uses google/gemini-3.1-flash-lite via LiveKit Inference when available, and otherwise falls back to the session's own LLM. See recommended models for the evaluated set.
- stt (STT | str): STT used to transcribe the greeting. Accepts an STT instance or a LiveKit Inference model ID string. If not set, AMD uses cartesia/ink-whisper via LiveKit Inference when available, and otherwise reuses the session's existing STT transcripts. AMD runs its own STT pipeline so it can listen even when the session uses a realtime model with no separate STT.
- interrupt_on_machine (bool, default True): Interrupt any pending agent speech when a machine is detected.
- participant_identity (str): Identity of the SIP participant whose audio AMD should listen to. When omitted, AMD attaches to the first remote audio track in the room. Set this when the room might have other participants so AMD timers don't start on the wrong track.
- ivr_detection (bool, default True): Automatically start IVR navigation when the result is machine-ivr. When False, AMD returns the machine-ivr result without starting navigation, and your agent decides how to handle it.
- detection_options (DetectionOptions): Override the default timing thresholds and classification prompt. Pass a dict with any of the following keys: human_speech_threshold (default 2.5), human_silence_threshold (default 0.5), machine_silence_threshold (default 1.5), no_speech_threshold (default 10.0), timeout (default 20.0), or prompt. All thresholds are in seconds. Values not provided fall back to library defaults.
- suppress_compatibility_warning (bool, default False): Silence the warning that fires when llm or stt isn't among the evaluated models. Has no effect on classification behavior.
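For example, to give slow answerers more time while leaving the other thresholds at their defaults (a sketch: only the keys shown are overridden, and the dict is passed to the AMD constructor as detection_options):

```python
# Only the keys you set are overridden; everything else keeps library defaults.
detection_options = {
    "no_speech_threshold": 15.0,  # wait up to 15 s for any speech (default 10.0)
    "timeout": 30.0,              # allow 30 s for the whole detection (default 20.0)
}
# Passed as: AMD(session, detection_options=detection_options, ...)
```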
The Node.js SDK doesn't support IVR navigation, so treat machine-ivr results as a human conversation and let the main agent respond.
- llm (LLM | string): LLM used for greeting classification. Accepts an LLM instance or a LiveKit Inference model ID string. If not set, AMD uses google/gemini-3.1-flash-lite via LiveKit Inference when available, and otherwise falls back to the session's own LLM. See recommended models for the evaluated set.
- stt (STT | string): STT used to transcribe the greeting. Accepts an STT instance or a LiveKit Inference model ID string. If not set, AMD uses cartesia/ink-whisper via LiveKit Inference when available, and otherwise listens to session-level transcripts instead. AMD runs its own STT pipeline so it can listen even when the session uses a realtime model with no separate STT.
- interruptOnMachine (boolean, default true): Interrupt any pending agent speech when a machine is detected.
- participantIdentity (string): Identity of the SIP participant whose audio AMD should listen to. When omitted, AMD attaches to the session's linked participant or the first remote audio track in the room. Set this when the room might have other participants so AMD timers don't start on the wrong track.
- noSpeechTimeoutMs (number, default 10000): Maximum time in milliseconds to wait for a transcript before AMD gives up. When this elapses with no speech detected, AMD settles as machine-unavailable.
- detectionTimeoutMs (number, default 20000): Maximum time in milliseconds for the entire detection. When this elapses, AMD settles with whatever evidence is available.
- suppressCompatibilityWarning (boolean, default false): Silence the warning that fires when llm or stt isn't among the evaluated models. Has no effect on classification behavior.
Additional resources
AMD example (Python)
Outbound voice agent that runs AMD before responding and branches on the classification result.
AMD example (Node.js)
Outbound voice agent that runs AMD before responding and branches on the classification result.
DTMF and IVR navigation
Send and receive DTMF tones, and navigate IVR systems after AMD detection.
Outbound calls
Create SIP participants and place outbound calls that AMD can classify.