Overview
An outbound call can reach a person, voicemail, an IVR menu, or a number that can't accept messages. Answering machine detection (AMD) listens to the start of the call, classifies it with an LLM, and returns a result so your agent can respond appropriately.
How AMD works
AMD runs once at the start of the call, on the first user utterance. It doesn't monitor continuously. While AMD is running, the agent's speech is paused so it doesn't talk over a voicemail greeting before classification completes.
AMD classifies the call into one of five categories. Your agent uses the result to decide the next step: continue the conversation, leave a voicemail, navigate an IVR, or hang up.
AMD runs two paths in parallel: a fast-path heuristic for short greetings followed by silence, and an LLM classifier for transcripts that need more reasoning. The first path to reach a conclusion produces the result.
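The "first path to reach a conclusion wins" behavior can be pictured as a race between two coroutines. The following is a minimal sketch of that pattern, not AMD's internal implementation: `fast_path`, `llm_path`, and `first_result` are hypothetical stand-ins, and the delays are artificial.

```python
import asyncio


async def fast_path(delay: float) -> str:
    # Hypothetical stand-in for the short-greeting heuristic.
    await asyncio.sleep(delay)
    return "human"


async def llm_path(delay: float) -> str:
    # Hypothetical stand-in for the LLM classifier.
    await asyncio.sleep(delay)
    return "machine-vm"


async def first_result(fast_delay: float, llm_delay: float) -> str:
    """Run both paths concurrently and return whichever finishes first."""
    tasks = [
        asyncio.create_task(fast_path(fast_delay)),
        asyncio.create_task(llm_path(llm_delay)),
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # the slower path is no longer needed
    # Let the cancelled task finish unwinding before returning.
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()
```

With a quick greeting the heuristic would typically settle first. With a long greeting the classifier's verdict is the one returned.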
| Category | Description |
|---|---|
| human | A real person answered. Proceed with normal conversation. |
| machine-ivr | An IVR or DTMF menu was detected. In Python, the session automatically starts IVR navigation when ivr_detection is enabled (the default). The Node.js SDK doesn't support IVR navigation, so the agent should handle machine-ivr the same as human and let the main agent respond. |
| machine-vm | A voicemail greeting where leaving a message is possible. |
| machine-unavailable | The mailbox is full, not set up, or the callee is unreachable. Leaving a message isn't possible. |
| uncertain | The greeting can't be classified with confidence, or no speech is detected at all (for example, the callee answers but stays silent). Treat as a human and proceed with normal conversation. |
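The table maps naturally onto a small dispatch function. This is a sketch under assumptions: the action names (`converse`, `navigate_ivr`, `leave_voicemail`, `hang_up`) and the `next_step` helper are hypothetical labels for illustration, not part of the SDK. Only the category strings come from the table above.

```python
def next_step(category: str, ivr_navigation_supported: bool = True) -> str:
    """Map an AMD category to a (hypothetical) next action for the agent."""
    if category in ("human", "uncertain"):
        # Uncertain results are treated like a human answer.
        return "converse"
    if category == "machine-ivr":
        # Without IVR navigation support, fall back to normal conversation.
        return "navigate_ivr" if ivr_navigation_supported else "converse"
    if category == "machine-vm":
        return "leave_voicemail"
    if category == "machine-unavailable":
        return "hang_up"
    raise ValueError(f"unknown AMD category: {category!r}")
```

Raising on an unknown category keeps a typo in a category string from silently falling through to the conversation path.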
Usage
Initialize AMD before creating the SIP participant so detection is ready before audio starts arriving. The detector pauses agent speech until a result is available.
In Python, open the async context manager, then create the SIP participant inside it. Pass participant_identity so AMD's timers wait for that specific participant's audio track:

    import os

    from livekit import api
    from livekit.agents import AMD
    from livekit.protocol.sip import SIPOutboundConfig

    # ctx, session, logger, phone_number, and participant_identity come from
    # the surrounding agent entrypoint.
    async with AMD(session, participant_identity=participant_identity) as detector:
        # Place the call only after the detector is ready.
        await ctx.api.sip.create_sip_participant(
            api.CreateSIPParticipantRequest(
                trunk=SIPOutboundConfig(
                    hostname=os.getenv("SIP_TRUNK_HOSTNAME"),
                    auth_username=os.getenv("SIP_AUTH_USERNAME"),
                    auth_password=os.getenv("SIP_AUTH_PASSWORD"),
                ),
                sip_number="<SIP provider number>",
                room_name=ctx.room.name,
                sip_call_to=phone_number,
                participant_identity=participant_identity,
                wait_until_answered=True,
            )
        )
        await ctx.wait_for_participant(identity=participant_identity)

        # Wait for AMD to settle on a category.
        result = await detector.execute()

        if result.category in ("human", "uncertain"):
            logger.info(
                "human answered the call or amd is uncertain, proceeding with normal conversation",
                extra={"transcript": result.transcript},
            )
        elif result.category == "machine-ivr":
            logger.info("ivr menu detected, starting navigation")
        elif result.category == "machine-vm":
            logger.info("voicemail detected, leaving a message")
            speech_handle = session.generate_reply(
                instructions=(
                    "You've reached voicemail. Leave a brief message asking "
                    "the customer to call back."
                ),
            )
            # Let the message finish playing before ending the call.
            await speech_handle.wait_for_playout()
            ctx.shutdown("voicemail detected")
        elif result.category == "machine-unavailable":
            logger.info("mailbox unavailable, ending call")
            ctx.shutdown("mailbox unavailable")

In Node.js, instantiate the detector before creating the SIP participant. Pass participantIdentity so AMD's timers wait for that participant's audio track. Wrap the run in try/finally so detector.aclose() runs even on error:

    import { voice } from '@livekit/agents';
    import { SipClient } from 'livekit-server-sdk';

    session._roomIO.setParticipant(participantIdentity);
    const detector = new voice.AMD(session, { participantIdentity });

    try {
      const sip = new SipClient(
        process.env.LIVEKIT_URL,
        process.env.LIVEKIT_API_KEY,
        process.env.LIVEKIT_API_SECRET,
      );

      await sip.createSipParticipant(
        '', // Empty string when using inline trunk config
        phoneNumber,
        ctx.room.name,
        {
          participantIdentity,
          fromNumber: '<SIP provider number>',
          waitUntilAnswered: true,
        },
        {
          // Inline trunk configuration
          hostname: process.env.SIP_TRUNK_HOSTNAME,
          authUsername: process.env.SIP_AUTH_USERNAME,
          authPassword: process.env.SIP_AUTH_PASSWORD,
        },
      );
      await ctx.waitForParticipant(participantIdentity);

      const result = await detector.execute();

      if (
        result.category === voice.AMDCategory.HUMAN ||
        result.category === voice.AMDCategory.UNCERTAIN ||
        result.category === voice.AMDCategory.MACHINE_IVR
      ) {
        logger.info(
          { amd: result },
          'human or ivr menu detected, proceeding with normal conversation',
        );
      } else if (result.category === voice.AMDCategory.MACHINE_VM) {
        logger.info({ amd: result }, 'voicemail detected, leaving a message');
        const speechHandle = session.generateReply({
          instructions:
            "You've reached voicemail. Leave a brief message asking the customer to call back.",
        });
        await speechHandle.waitForPlayout();
        session.shutdown({ reason: 'amd:machine-vm' });
      } else if (result.category === voice.AMDCategory.MACHINE_UNAVAILABLE) {
        logger.info({ amd: result }, 'mailbox unavailable, ending call');
        session.shutdown({ reason: 'amd:machine-unavailable' });
      }
    } finally {
      await detector.aclose();
    }

You can also use a stored outbound trunk by passing sip_trunk_id (Python) or sipTrunkId (Node.js) instead of inline trunk configuration. For details, see Outbound trunk.
Recommended models
AMD has been evaluated against a small set of LLMs and STT models on LiveKit Inference.
Behavior on unevaluated models isn't guaranteed, so AMD logs a compatibility warning when you pass an unevaluated model. Once you've validated your own choice, set suppress_compatibility_warning=True (Python) or suppressCompatibilityWarning: true (Node.js) to silence the warning.
Evaluated LLMs
- google/gemini-3.1-flash-lite (default)
- google/gemini-3-flash-preview
- google/gemini-2.5-flash-lite
- openai/gpt-4o
- openai/gpt-4.1
- openai/gpt-4.1-mini
- openai/gpt-4.1-nano
- openai/gpt-5.1
- openai/gpt-5.1-chat-latest
- openai/gpt-5.2
- openai/gpt-5.2-chat-latest
- openai/gpt-5.4
Evaluated STT models
- cartesia/ink-whisper (default)
- assemblyai/universal-streaming-multilingual
- deepgram/nova-3
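The compatibility warning amounts to a membership check against the evaluated sets. The sketch below only illustrates that idea: `should_warn` and the two set names are hypothetical, and the library's actual check may differ, for example when you pass a model instance rather than an ID string.

```python
EVALUATED_LLMS = {
    "google/gemini-3.1-flash-lite",
    "google/gemini-3-flash-preview",
    "google/gemini-2.5-flash-lite",
    "openai/gpt-4o",
    "openai/gpt-4.1",
    "openai/gpt-4.1-mini",
    "openai/gpt-4.1-nano",
    "openai/gpt-5.1",
    "openai/gpt-5.1-chat-latest",
    "openai/gpt-5.2",
    "openai/gpt-5.2-chat-latest",
    "openai/gpt-5.4",
}

EVALUATED_STTS = {
    "cartesia/ink-whisper",
    "assemblyai/universal-streaming-multilingual",
    "deepgram/nova-3",
}


def should_warn(model_id: str, evaluated: set, suppress: bool = False) -> bool:
    """Return True when a warning would be expected for this model ID."""
    # Suppression only silences the warning; it doesn't change classification.
    return not suppress and model_id not in evaluated
```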
Parameters
Defaults are calibrated for typical outbound calls. Override them when you need different timing thresholds or a different classification prompt.
Python:
- llm (LLM | str): LLM used for greeting classification. Accepts an LLM instance or a LiveKit Inference model ID string. If not set, AMD uses google/gemini-3.1-flash-lite via LiveKit Inference when available, and otherwise falls back to the session's own LLM. See recommended models for the evaluated set.
- stt (STT | str): STT used to transcribe the greeting. Accepts an STT instance or a LiveKit Inference model ID string. If not set, AMD uses cartesia/ink-whisper via LiveKit Inference when available, and otherwise reuses the session's existing STT transcripts. AMD runs its own STT pipeline so it can listen even when the session uses a realtime model with no separate STT.
- interrupt_on_machine (bool, default: True): Interrupt any pending agent speech when a machine is detected.
- participant_identity (str): Identity of the SIP participant whose audio AMD should listen to. When omitted, AMD attaches to the first remote audio track in the room. Set this when the room might have other participants so AMD timers don't start on the wrong track.
- ivr_detection (bool, default: True): Automatically start IVR navigation when the result is machine-ivr. When False, AMD returns the machine-ivr result without starting navigation, and your agent decides how to handle it.
- detection_options (DetectionOptions): Override the default timing thresholds and classification prompt. Pass a dict with any of the following keys: human_speech_threshold (default 2.5), human_silence_threshold (default 0.5), machine_silence_threshold (default 1.5), no_speech_threshold (default 10.0), timeout (default 20.0), or prompt. All thresholds are in seconds. Values not provided fall back to library defaults. The no_speech_threshold and timeout budgets start when the call is answered, not when AMD starts, so ringback and early media don't count against them.
- wait_until_finished (bool, default: False): When True, AMD doesn't force a classification after the detection timeout if it detects speech. Instead, it waits for end-of-turn detection and post-speech silence before emitting a result. Use this for voicemail greetings, where responding before the greeting ends could interrupt playback. The no-speech timeout is unaffected and still triggers when no speech is detected.
- suppress_compatibility_warning (bool, default: False): Silence the warning that fires when llm or stt isn't among the evaluated models. Has no effect on classification behavior.
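The "values not provided fall back to library defaults" rule for detection_options can be sketched as a dict merge. The helper below is hypothetical and exists only to show the effect. AMD performs its own merge internally, and you would normally just pass the partial dict as detection_options. The default values are the ones listed above.

```python
from typing import Any, Dict, Optional

# Defaults as documented for detection_options (all thresholds in seconds).
DEFAULT_DETECTION_OPTIONS: Dict[str, Any] = {
    "human_speech_threshold": 2.5,
    "human_silence_threshold": 0.5,
    "machine_silence_threshold": 1.5,
    "no_speech_threshold": 10.0,
    "timeout": 20.0,
}


def resolve_detection_options(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical helper: merge partial overrides over the documented defaults."""
    overrides = overrides or {}
    allowed = set(DEFAULT_DETECTION_OPTIONS) | {"prompt"}
    unknown = set(overrides) - allowed
    if unknown:
        raise ValueError(f"unknown detection option(s): {sorted(unknown)}")
    return {**DEFAULT_DETECTION_OPTIONS, **overrides}


# Only the overridden keys change; everything else keeps its default.
options = resolve_detection_options({"human_silence_threshold": 0.8, "timeout": 15.0})
```

In an agent, the equivalent would be passing `detection_options={"human_silence_threshold": 0.8, "timeout": 15.0}` when constructing AMD.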
Node.js:
The Node.js SDK doesn't support IVR navigation, so treat machine-ivr results as a human conversation and let the main agent respond.
- llm (LLM | string): LLM used for greeting classification. Accepts an LLM instance or a LiveKit Inference model ID string. If not set, AMD uses google/gemini-3.1-flash-lite via LiveKit Inference when available, and otherwise falls back to the session's own LLM. See recommended models for the evaluated set.
- stt (STT | string): STT used to transcribe the greeting. Accepts an STT instance or a LiveKit Inference model ID string. If not set, AMD uses cartesia/ink-whisper via LiveKit Inference when available, and otherwise listens to session-level transcripts instead. AMD runs its own STT pipeline so it can listen even when the session uses a realtime model with no separate STT.
- interruptOnMachine (boolean, default: true): Interrupt any pending agent speech when a machine is detected.
- participantIdentity (string): Identity of the SIP participant whose audio AMD should listen to. When omitted, AMD attaches to the session's linked participant or the first remote audio track in the room. Set this when the room might have other participants so AMD timers don't start on the wrong track.
- humanSpeechThresholdMs (number, default: 2500): Maximum length in milliseconds of a "short greeting." Speech shorter than this triggers the fast-path human heuristic; speech longer is treated as machine-like and defers to the LLM classifier.
- humanSilenceThresholdMs (number, default: 500): Silence in milliseconds after a short greeting before AMD settles as human. Shorter values commit to human faster on quick "Hello?" greetings.
- machineSilenceThresholdMs (number, default: 1500): Silence in milliseconds after machine-like speech before AMD opens the silence gate and emits a verdict. Longer values give the LLM more time to review the transcript.
- noSpeechTimeoutMs (number, default: 10000): Maximum time in milliseconds to wait for a transcript before AMD gives up. When this elapses with no speech detected, AMD settles as uncertain. The timer starts when the call is answered, not when AMD starts, so ringback and early media don't count against it.
- detectionTimeoutMs (number, default: 20000): Maximum time in milliseconds for the entire detection. When this elapses, AMD settles with whatever evidence is available.
- waitUntilFinished (boolean, default: false): When true and speech has been heard, detectionTimeoutMs no longer forces a verdict. AMD waits for post-speech silence and a confirmed end of turn before emitting. Use this for voicemail flows where replying before the greeting finishes would talk over it. noSpeechTimeoutMs still fires normally, since silence has nothing to wait for.
- maxEndpointingDelayMs (number): Fallback end-of-turn delay used when waitUntilFinished gates a verdict. If the turn detector never commits a turn, this backstop (armed when speech ends) marks the end of turn so the gated verdict can still emit. Defaults to the session's endpointing maxDelay, or 3000 when no session activity is available.
- prompt (string): Override the default classification prompt passed to the LLM. Use this to bias detection toward your domain (for example, recognizing region-specific voicemail phrasing) or to translate the prompt into another language.
- suppressCompatibilityWarning (boolean, default: false): Silence the warning that fires when llm or stt isn't among the evaluated models. Has no effect on classification behavior.
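To make the interaction between the speech and silence thresholds concrete, here is a simplified model of the timing rules described above. It is a sketch under assumptions, not the SDK's code: `Thresholds`, `timing_decision`, and the `defer-to-llm` label are hypothetical, and the exact boundary behavior (strict versus inclusive comparisons) is a guess.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    # Defaults mirror the documented millisecond values.
    human_speech_ms: int = 2500
    human_silence_ms: int = 500
    machine_silence_ms: int = 1500
    no_speech_timeout_ms: int = 10000


def timing_decision(
    speech_ms: int,
    silence_ms: int,
    since_answer_ms: int,
    t: Thresholds = Thresholds(),
) -> Optional[str]:
    """Return a verdict label, or None to keep listening.

    speech_ms: length of speech heard so far
    silence_ms: silence elapsed since that speech ended
    since_answer_ms: time elapsed since the call was answered
    """
    if speech_ms == 0:
        # Nothing heard yet: give up as uncertain once the no-speech budget runs out.
        return "uncertain" if since_answer_ms >= t.no_speech_timeout_ms else None
    if speech_ms < t.human_speech_ms:
        # Short greeting: settle as human after a brief pause.
        return "human" if silence_ms >= t.human_silence_ms else None
    # Longer, machine-like speech: wait for a longer pause, then use the LLM verdict.
    return "defer-to-llm" if silence_ms >= t.machine_silence_ms else None
```

For example, a 0.8-second "Hello?" followed by 0.6 seconds of silence settles as human, while a 4-second greeting waits for 1.5 seconds of silence before the classifier's verdict is used.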
Additional resources
- AMD example (Python): Outbound voice agent that runs AMD before responding and branches on the classification result.
- AMD example (Node.js): Outbound voice agent that runs AMD before responding and branches on the classification result.
- DTMF and IVR navigation: Send and receive DTMF tones, and navigate IVR systems after AMD detection.
- Outbound calls: Create SIP participants and place outbound calls that AMD can classify.