Overview
Tools that take more than a few seconds block the conversation until they return. The agent stops talking, the user hears silence, and a regular tool can't send progress updates, be cancelled, or stop the LLM from calling the same tool twice.
Use async tools for anything that takes more than a few seconds, such as booking a flight, running a web search, or processing a document.
Async tool agent
Full example of a travel assistant that handles long-running tools with both progress updates and acoustic fillers.
Updating the user
Inside any @function_tool, RunContext provides two ways to send progress to the user. They're independent and you can use both in the same tool:
- ctx.update(message) adds a status to the chat context. The LLM reads it, voices something natural to the user, and the conversation continues. Use this for information the LLM should know about, such as a partial result or a phase change.
- ctx.with_filler(source) plays audio directly through session.say(), bypassing the LLM. Use this for filler like "hang on a sec" or "still working on it" during work the LLM doesn't need to track.
Progress updates
Define a regular @function_tool on an Agent. Inside, call ctx.update(message) whenever you want to share progress, and return the final result when the tool is done:
    from livekit.agents import Agent, RunContext, function_tool


    class TravelAgent(Agent):
        def __init__(self):
            super().__init__(instructions="You are a travel assistant.")

        @function_tool()
        async def book_flight(self, ctx: RunContext, origin: str, destination: str, date: str) -> str:
            """Book a flight for the user.

            Args:
                origin: Departure city or airport code.
                destination: Arrival city or airport code.
                date: Travel date (YYYY-MM-DD).
            """
            await ctx.update(f"Searching flights from {origin} to {destination} on {date}.")
            # agent says: "Sure, let me look up flights from New York to Tokyo on April 15th."

            flights = await search_flights(origin, destination, date)
            await ctx.update(f"Found {len(flights)} options. Booking the best one now.")
            # agent says: "I found 3 options. Booking the best one for you now."

            booking = await confirm_booking(flights[0])
            return f"Booked! Confirmation number: {booking.id}"
            # agent says: "All set. Your booking confirmation number is FL-847293."
The agent waits for the first ctx.update() from each tool that calls it, so the user hears acknowledgement immediately. Tools that never call ctx.update() behave like regular synchronous tools. Later updates are added to the agent's chat context as they arrive, and the agent generates a new reply once it's idle.
Filler speech
ctx.with_filler() is an async context manager. Open it around a long-running operation, and the filler plays once the session has been continuously idle for delay seconds. Fillers only play during quiet pauses, so they don't talk over the user or pile up behind other agent speech.
    from livekit.agents import Agent, RunContext, function_tool


    class TravelAgent(Agent):
        def __init__(self):
            super().__init__(instructions="You are a travel assistant.")

        @function_tool()
        async def book_flight(self, ctx: RunContext, origin: str, destination: str, date: str) -> str:
            """Book a flight."""
            # Plays "Still searching..." once the session has been idle for 5 seconds.
            async with ctx.with_filler("Still searching, hang on a sec.", delay=5):
                return await book_flight_api(origin, destination, date)
The following parameters are available on with_filler:
- source (str | Callable): The filler to play. Pass a string for a fixed line, or a callable (step: int) -> SpeechHandle | str | None that receives the iteration count. A callable returning None skips that round and retries on the next interval. The step counter only advances when audio plays, so a series of None returns doesn't count against max_steps.
- delay (float, default: 0): Seconds of continuous session-idle required before each play. Use 0 to play as soon as the session is next idle.
- interval (float | None, default: None): Seconds between plays. None plays at most once.
- max_steps (int | None, default: None): Maximum number of times the filler plays. None means no limit.
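A callable source can be sketched without the framework at all, since it is just a function from a step count to a line or None. The helper below is hypothetical (`rotating_filler` is not part of the library), and it assumes `step` starts at 0 and increases by one per play, which the parameter list does not state explicitly.

```python
from typing import Callable, Optional, Sequence


def rotating_filler(lines: Sequence[str]) -> Callable[[int], Optional[str]]:
    """Build a filler source that plays each line once, in order.

    Assumption of this sketch: `step` is 0-based.
    """

    def source(step: int) -> Optional[str]:
        # Return None once the lines run out, which skips the round
        # instead of raising IndexError.
        if 0 <= step < len(lines):
            return lines[step]
        return None

    return source


pick = rotating_filler(["Almost there.", "Still working on it."])
```

Under those assumptions, `pick` could be passed as the source, for example `ctx.with_filler(pick, delay=5, interval=10)`. Returning None past the end keeps the filler quiet even when max_steps is left unset.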
Combining both
Most long-running tools use both channels: ctx.update() for key events (start, phase change, final result) and ctx.with_filler() for the gaps between them. The following example uses both channels in a single tool:
    from livekit.agents import Agent, RunContext, function_tool


    class TravelAgent(Agent):
        def __init__(self):
            super().__init__(instructions="You are a travel assistant.")

        @function_tool()
        async def book_flight(self, ctx: RunContext, origin: str, destination: str, date: str) -> str:
            """Book a flight."""
            # One real update. The LLM voices a natural intro to the user.
            await ctx.update(
                f"Searching flights from {origin} to {destination} on {date}. "
                "This will take a couple of minutes."
            )

            # Phase 1: searching. Single acoustic filler if the user stays quiet for 5s.
            async with ctx.with_filler("Still searching, hang on a sec.", delay=5):
                flights = await search_flights(origin, destination, date)

            # Phase 2: confirming. Rotating fillers, up to 3 plays with 10s between them.
            followups = [
                "Almost there, just confirming.",
                "Still working on it, won't be long.",
                "Hang tight, almost done.",
            ]
            async with ctx.with_filler(
                lambda step: followups[step], delay=5, interval=10, max_steps=len(followups)
            ):
                booking = await confirm_booking(flights[0])

            # The final return is voiced as a follow-up reply when the agent is
            # next idle. No extra ctx.update() needed.
            return f"Booked! Confirmation number: {booking.id}"
The two channels stay separate. ctx.update() adds to the chat context (the LLM reads it on its next turn). ctx.with_filler() plays audio directly without going through the chat context. The LLM keeps full context for the events that matter, and the user keeps hearing the agent during long operations.
Cancellation
By default, async tools finish what they're doing regardless of what the user does. To let the LLM cancel a running tool, opt in with the CANCELLABLE flag:
    from livekit.agents import RunContext, function_tool
    from livekit.agents.llm import ToolFlag


    @function_tool(flags=ToolFlag.CANCELLABLE)
    async def book_flight(ctx: RunContext, origin: str, destination: str, date: str) -> str:
        return ""  # implementation
When any cancellable tool is registered, two companion tools are automatically exposed to the LLM:
- get_running_tasks() returns the cancellable calls that are currently running.
- cancel_task(call_id) cancels one of them by ID, which raises asyncio.CancelledError inside the tool.
Cancellation is opt-in because most tools (orders, writes, payments) aren't safe to interrupt partway through. Make sure cancellable tools can be safely stopped at any point.
If a cancellable tool calls ctx.disallow_interruptions(), calling cancel_task on it raises ToolError instead of cancelling the tool.
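Because cancellation arrives as asyncio.CancelledError inside the tool, a cancellable tool body can undo partial work in an except block and then re-raise. The following is a minimal sketch using only asyncio: `hold_seat_then_book` and the `released` list are hypothetical stand-ins, and `task.cancel()` plays the role that cancel_task(call_id) would play in a real session.

```python
import asyncio

released = []  # records holds released during cleanup (illustration only)


async def hold_seat_then_book(flight_id: str) -> str:
    """Hypothetical tool body: place a hold, do slow work, then confirm."""
    hold = f"hold-{flight_id}"
    try:
        await asyncio.sleep(60)  # stands in for the slow booking call
        return f"Booked {flight_id}"
    except asyncio.CancelledError:
        # Undo partial work, then re-raise so the caller sees the cancel.
        released.append(hold)
        raise


async def demo() -> bool:
    task = asyncio.create_task(hold_seat_then_book("FL-1"))
    await asyncio.sleep(0)  # let the tool start running
    task.cancel()           # stands in for cancel_task(call_id)
    try:
        await task
    except asyncio.CancelledError:
        return True
    return False


cancelled = asyncio.run(demo())
```

Re-raising matters here: swallowing CancelledError would make the call look like it finished normally even though it was stopped partway.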
Duplicate-call handling
When the LLM calls a tool that's already running, the framework handles the duplicate based on the on_duplicate argument to @function_tool. Duplicates are detected by tool name only, not by arguments.
| Mode | Description |
|---|---|
| allow | Default. Runs the duplicate without restriction. |
| reject | Rejects the duplicate and tells the LLM to cancel via cancel_task instead. |
| replace | Cancels the running call and starts a new one. Requires the running tool to opt into cancellation, otherwise the duplicate call raises a ToolError. |
| confirm | Sends the name and arguments of the running call back to the LLM and asks it to re-call with explicit confirmation if a duplicate is needed. |
For example, to require LLM confirmation before a duplicate runs:
    from livekit.agents import RunContext, function_tool


    @function_tool(on_duplicate="confirm")
    async def book_flight(ctx: RunContext, origin: str, destination: str, date: str) -> str:
        return ""  # implementation
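The four modes can be read as a small decision table. The function below is a toy model of that table for illustration only; it is not the framework's implementation, and `resolve_duplicate` and its return labels are hypothetical names. It matches on tool name alone, mirroring the rule that arguments are not compared.

```python
from typing import Set


def resolve_duplicate(
    mode: str, tool_name: str, running: Set[str], cancellable: bool = False
) -> str:
    """Return a label for what happens to a new call (toy model)."""
    # Duplicates are matched by name only, so arguments never appear here.
    if tool_name not in running:
        return "run"
    if mode == "allow":
        return "run"
    if mode == "reject":
        return "reject"
    if mode == "replace":
        # Replacing needs the running call to be cancellable.
        return "cancel_then_run" if cancellable else "tool_error"
    if mode == "confirm":
        return "ask_llm_to_confirm"
    raise ValueError(f"unknown on_duplicate mode: {mode!r}")


running_now = {"book_flight"}
outcome = resolve_duplicate("replace", "book_flight", running_now)
```

In this model, a second book_flight call for a different destination still counts as a duplicate, which is why confirm or reject may suit tools that are legitimately called with varying arguments.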
Agent handoffs
By default, async tools belong to the Agent they're attached to. Tools placed on Agent(tools=...) (or bound as @function_tool methods on the agent class) belong to that agent, and any pending updates from them are dropped during an agent handoff.
To keep a tool running across handoffs, so its final result and any updates go to whichever agent is active when the tool finishes, bundle it into an AsyncToolset and pass that to the AgentSession:
    from livekit.agents import AgentSession, RunContext, function_tool
    from livekit.agents.llm.async_toolset import AsyncToolset


    @function_tool()
    async def book_flight(ctx: RunContext, origin: str, destination: str, date: str) -> str:
        return ""  # implementation


    session = AgentSession(
        # ... stt, llm, tts, etc.
        tools=[AsyncToolset(id="booking", tools=[book_flight])],
    )
An AsyncToolset keeps its tools alive across handoffs, including any pending updates from tools that are still running. Plain @function_tools passed directly to AgentSession(tools=[...]) aren't carried across handoffs on their own. Only tools wrapped inside an AsyncToolset are.
Prompt templates
The framework sends the LLM a short instruction template around each async tool event: a ctx.update() call, a duplicate rejection, or a follow-up reply after a tool finishes. The defaults are tuned for natural agent responses, but you can override any of them by passing a tool_handling mapping with an async_options block.
    from livekit.agents import AgentSession

    session = AgentSession(
        # ... stt, llm, tts, etc.
        tool_handling={
            "async_options": {
                "update_template": (
                    "Background tool `{function_name}` reports: {message}. "
                    "Acknowledge briefly. Don't summarize results that aren't in the message."
                ),
            },
        },
    )
The available async_options keys are:
| Key | Sent to the LLM when |
|---|---|
| update_template | A ctx.update(message) call is being delivered to the LLM. |
| duplicate_reject_template | A duplicate call is blocked by on_duplicate="reject". |
| duplicate_confirm_template | A duplicate call needs LLM confirmation under on_duplicate="confirm". |
| reply_at_tail_template | A follow-up reply runs while the pending update is still the latest chat item. |
| reply_maybe_covered_template | A follow-up reply runs after newer messages have arrived in the chat context. |
Unspecified keys fall back to defaults. Each value can be a str.format() template string or a callable. Both forms receive the same named variables for that template. Set tool_handling on an AsyncToolset, on an Agent, or on an AgentSession. The framework resolves templates from AsyncToolset first, then the Agent, then the AgentSession, falling back to defaults for any key you don't override.
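The two template forms and the lookup order can be modeled in plain Python. This is a sketch, not the framework's code: `render`, `resolve`, and the `DEFAULTS` placeholder are hypothetical, the variable names `function_name` and `message` are taken from the update_template example above, and passing them to a callable as keyword arguments is an assumption.

```python
from collections import ChainMap
from typing import Callable, Dict, Union

Template = Union[str, Callable[..., str]]

# Placeholder default for illustration; not the library's real default text.
DEFAULTS: Dict[str, Template] = {
    "update_template": "Tool {function_name} says: {message}",
}


def render(template: Template, **variables: str) -> str:
    """Render a template given as a str.format() string or a callable."""
    if callable(template):
        return template(**variables)
    return template.format(**variables)


def resolve(
    key: str,
    toolset: Dict[str, Template],
    agent: Dict[str, Template],
    session: Dict[str, Template],
) -> Template:
    """Look up a key: AsyncToolset first, then Agent, AgentSession, defaults."""
    return ChainMap(toolset, agent, session, DEFAULTS)[key]


session_opts: Dict[str, Template] = {"update_template": "Session: {message}"}
agent_opts: Dict[str, Template] = {
    "update_template": lambda function_name, message: f"Agent saw {function_name}: {message}",
}

chosen = resolve("update_template", {}, agent_opts, session_opts)
text = render(chosen, function_name="book_flight", message="Found 3 options")
```

Here the agent-level callable wins over the session-level string because the toolset mapping is empty. A callable is handy when the wording should depend on the values, for example shortening long messages before they reach the LLM.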
Additional resources
For more information on concepts covered in this topic, see the following related topics: