Async tools

Handle long-running tools so agents can keep talking.

Available in Python only

Overview

Tools that take more than a few seconds block the conversation until they return. The agent stops talking, the user hears silence, and a regular tool can't send progress updates, be cancelled, or stop the LLM from calling the same tool twice.

Use async tools for anything that takes more than a few seconds, such as booking a flight, running a web search, or processing a document.

Async tool agent

Full example of a travel assistant that handles long-running tools with both progress updates and acoustic fillers.

Updating the user

Inside any @function_tool, RunContext provides two ways to send progress to the user. They're independent and you can use both in the same tool:

  • ctx.update(message) adds a status to the chat context. The LLM reads it, voices something natural to the user, and the conversation continues. Use this for information the LLM should know about, such as a partial result or a phase change.
  • ctx.with_filler(source) plays audio directly through session.say(), bypassing the LLM. Use this for filler like "hang on a sec" or "still working on it" during work the LLM doesn't need to track.

Progress updates

Define a regular @function_tool on an Agent. Inside, call ctx.update(message) whenever you want to share progress, and return the final result when the tool is done:

from livekit.agents import Agent, RunContext, function_tool


class TravelAgent(Agent):
    def __init__(self):
        super().__init__(instructions="You are a travel assistant.")

    @function_tool()
    async def book_flight(
        self, ctx: RunContext, origin: str, destination: str, date: str
    ) -> str:
        """Book a flight for the user.

        Args:
            origin: Departure city or airport code.
            destination: Arrival city or airport code.
            date: Travel date (YYYY-MM-DD).
        """
        # search_flights and confirm_booking stand in for your own async helpers.
        await ctx.update(f"Searching flights from {origin} to {destination} on {date}.")
        # agent says: "Sure, let me look up flights from New York to Tokyo on April 15th."
        flights = await search_flights(origin, destination, date)
        await ctx.update(f"Found {len(flights)} options. Booking the best one now.")
        # agent says: "I found 3 options. Booking the best one for you now."
        booking = await confirm_booking(flights[0])
        return f"Booked! Confirmation number: {booking.id}"
        # agent says: "All set. Your booking confirmation number is FL-847293."

The agent waits for the first ctx.update() from each tool that calls it, so the user hears an acknowledgement right away. Tools that never call ctx.update() behave like regular synchronous tools. Later updates are added to the agent's chat context as they arrive, and the agent generates a new reply once it's idle.

Filler speech

ctx.with_filler() is an async context manager. Open it around a long-running operation, and the filler plays once the session has been continuously idle for delay seconds. Fillers only play during quiet pauses, so they don't talk over the user or pile up behind other agent speech.

from livekit.agents import Agent, RunContext, function_tool


class TravelAgent(Agent):
    def __init__(self):
        super().__init__(instructions="You are a travel assistant.")

    @function_tool()
    async def book_flight(
        self, ctx: RunContext, origin: str, destination: str, date: str
    ) -> str:
        """Book a flight."""
        # Plays "Still searching..." once the session has been idle for 5 seconds.
        async with ctx.with_filler("Still searching, hang on a sec.", delay=5):
            return await book_flight_api(origin, destination, date)

The following parameters are available on with_filler:

source (str | Callable, required)

The filler to play. Pass a string for a fixed line, or a callable (step: int) -> SpeechHandle | str | None that receives the iteration count. A callable returning None skips that round and retries on the next interval. The step counter only advances when audio plays, so a series of None returns doesn't count against max_steps.

delay (float, default: 0)

Seconds of continuous session-idle required before each play. Use 0 to play as soon as the session is next idle.

interval (float | None, default: None)

Seconds between plays. None plays at most once.

max_steps (int | None, default: None)

Maximum number of times the filler plays. None means no limit.
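
The callable form of source can be built with plain Python. The following is a minimal sketch, not part of the LiveKit API: rotating_filler and skip_when are hypothetical helper names, and the sketch only relies on the (step: int) -> str | None shape described above. Whether the step counter starts at 0 or 1 isn't stated here, so the lookup wraps with modulo to stay valid either way.

```python
from typing import Callable, Optional, Sequence

FillerSource = Callable[[int], Optional[str]]


def rotating_filler(lines: Sequence[str]) -> FillerSource:
    """Build a filler callable that cycles through `lines` (hypothetical helper)."""
    if not lines:
        raise ValueError("lines must not be empty")

    def source(step: int) -> Optional[str]:
        # Modulo keeps the index in range whatever value the step counter has.
        return lines[step % len(lines)]

    return source


def skip_when(predicate: Callable[[], bool], source: FillerSource) -> FillerSource:
    """Wrap a filler callable so it returns None (skip this round) while predicate() is true."""

    def wrapped(step: int) -> Optional[str]:
        if predicate():
            return None
        return source(step)

    return wrapped
```

A callable built this way could then be passed as the first argument, for example ctx.with_filler(rotating_filler(followups), delay=5, interval=10, max_steps=3). Because a None return doesn't advance the step counter, skipped rounds don't use up max_steps.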

Combining both

Most long-running tools use both channels: ctx.update() for key events (start, phase change, final result) and ctx.with_filler() for the gaps between them. The following example uses both channels in a single tool:

from livekit.agents import Agent, RunContext, function_tool


class TravelAgent(Agent):
    def __init__(self):
        super().__init__(instructions="You are a travel assistant.")

    @function_tool()
    async def book_flight(
        self, ctx: RunContext, origin: str, destination: str, date: str
    ) -> str:
        """Book a flight."""
        # One real update. The LLM voices a natural intro to the user.
        await ctx.update(
            f"Searching flights from {origin} to {destination} on {date}. "
            "This will take a couple of minutes."
        )

        # Phase 1: searching. Single acoustic filler once the session has been idle for 5s.
        async with ctx.with_filler("Still searching, hang on a sec.", delay=5):
            flights = await search_flights(origin, destination, date)

        # Phase 2: confirming. Rotating fillers, up to 3 plays with 10s between them.
        followups = [
            "Almost there, just confirming.",
            "Still working on it, won't be long.",
            "Hang tight, almost done.",
        ]
        async with ctx.with_filler(
            lambda step: followups[step], delay=5, interval=10, max_steps=len(followups)
        ):
            booking = await confirm_booking(flights[0])

        # The final return is voiced as a follow-up reply when the agent is
        # next idle. No extra ctx.update() needed.
        return f"Booked! Confirmation number: {booking.id}"

The two channels stay separate. ctx.update() adds to the chat context (the LLM reads it on its next turn). ctx.with_filler() plays audio directly without going through the chat context. The LLM keeps full context for the events that matter, and the user keeps hearing the agent during long operations.

Cancellation

By default, async tools finish what they're doing regardless of what the user does. To let the LLM cancel a running tool, opt in with the CANCELLABLE flag:

from livekit.agents import RunContext, function_tool
from livekit.agents.llm import ToolFlag


@function_tool(flags=ToolFlag.CANCELLABLE)
async def book_flight(ctx: RunContext, origin: str, destination: str, date: str) -> str:
    return ""  # implementation

When any cancellable tool is registered, two companion tools are automatically exposed to the LLM:

  • get_running_tasks() returns the cancellable calls that are currently running.
  • cancel_task(call_id) cancels one of them by ID, which raises asyncio.CancelledError inside the tool.

Cancellation is opt-in because most tools (orders, writes, payments) aren't safe to interrupt partway through. Make sure cancellable tools can be safely stopped at any point.

If a cancellable tool calls ctx.disallow_interruptions(), calling cancel_task on it raises ToolError instead of cancelling the tool.
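
Since cancellation arrives as asyncio.CancelledError inside the tool, a cancellable tool body can catch it to undo partial work before re-raising. The following is a minimal sketch using only asyncio: hold_then_book is a hypothetical stand-in for a tool body, and task.cancel() stands in for what cancel_task(call_id) is described as triggering. It doesn't call any LiveKit API.

```python
import asyncio


async def hold_then_book(log: list[str]) -> str:
    """Stand-in for a cancellable tool body (hypothetical; no real booking API)."""
    log.append("seat held")
    try:
        await asyncio.sleep(60)  # simulates the slow confirmation step
        log.append("booked")
        return "Booked!"
    except asyncio.CancelledError:
        # Undo partial work, then re-raise so the caller sees the cancellation.
        log.append("seat released")
        raise


async def demo() -> list[str]:
    log: list[str] = []
    task = asyncio.create_task(hold_then_book(log))
    await asyncio.sleep(0)  # let the task start and reach its first await
    task.cancel()  # stands in for the LLM calling cancel_task(call_id)
    try:
        await task
    except asyncio.CancelledError:
        log.append("cancelled")
    return log
```

Running asyncio.run(demo()) records the hold, the release, and the cancellation in that order. Re-raising matters: swallowing CancelledError would make the call look like it finished normally.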

Duplicate-call handling

When the LLM calls a tool that's already running, the framework handles the duplicate based on the on_duplicate argument to @function_tool. Duplicates are detected by tool name only, not by arguments.

Mode      Description
allow     Default. Runs the duplicate without restriction.
reject    Rejects the duplicate and tells the LLM to cancel via cancel_task instead.
replace   Cancels the running call and starts a new one. Requires the running tool to opt into cancellation, otherwise the duplicate call raises a ToolError.
confirm   Sends the name and arguments of the running call back to the LLM and asks it to re-call with explicit confirmation if a duplicate is needed.

For example, to require LLM confirmation before a duplicate runs:

from livekit.agents import RunContext, function_tool


@function_tool(on_duplicate="confirm")
async def book_flight(ctx: RunContext, origin: str, destination: str, date: str) -> str:
    return ""  # implementation
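
The four modes can be read as a small decision table. The following is a toy model of that table in plain Python, not the framework's implementation: resolve_duplicate, its parameters, and the returned action names are all made up for illustration, and the confirmed flag is an assumption about how an explicit confirmation might be represented.

```python
from typing import Literal

Mode = Literal["allow", "reject", "replace", "confirm"]


def resolve_duplicate(
    mode: Mode,
    *,
    already_running: bool,
    running_is_cancellable: bool = False,
    confirmed: bool = False,
) -> str:
    """Return the action the mode table describes for a new call to a tool.

    Toy model only: the action names are invented for this sketch.
    """
    if not already_running or mode == "allow":
        return "run"
    if mode == "reject":
        return "reject"
    if mode == "replace":
        # Replacing needs the running call to have opted into cancellation.
        return "cancel_and_run" if running_is_cancellable else "tool_error"
    if mode == "confirm":
        return "run" if confirmed else "ask_llm_to_confirm"
    raise ValueError(f"unknown mode: {mode!r}")
```

Note that already_running would be decided by tool name alone, so two calls to the same tool with different arguments still count as duplicates.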

Agent handoffs

By default, an async tool belongs to the Agent it's attached to, whether it's passed through Agent(tools=...) or defined as a @function_tool method on the agent class. Any pending updates from that tool are dropped during an agent handoff.

To keep a tool running across handoffs, so its final result and any updates go to whichever agent is active when the tool finishes, bundle it into an AsyncToolset and pass that to the AgentSession:

from livekit.agents import AgentSession, RunContext, function_tool
from livekit.agents.llm.async_toolset import AsyncToolset


@function_tool()
async def book_flight(ctx: RunContext, origin: str, destination: str, date: str) -> str:
    return ""  # implementation


session = AgentSession(
    # ... stt, llm, tts, etc.
    tools=[AsyncToolset(id="booking", tools=[book_flight])],
)

An AsyncToolset keeps its tools alive across handoffs, including any pending updates from tools that are still running. Plain @function_tools passed directly to AgentSession(tools=[...]) aren't carried across handoffs on their own. Only tools wrapped inside an AsyncToolset are.

Prompt templates

The framework sends the LLM a short instruction template around each async tool event: a ctx.update() call, a duplicate rejection, or a follow-up reply after a tool finishes. The defaults are tuned for natural agent responses, but you can override any of them by passing a tool_handling mapping with an async_options block.

from livekit.agents import AgentSession

session = AgentSession(
    # ... stt, llm, tts, etc.
    tool_handling={
        "async_options": {
            "update_template": (
                "Background tool `{function_name}` reports: {message}. "
                "Acknowledge briefly. Don't summarize results that aren't in the message."
            ),
        },
    },
)

The available async_options keys are:

Key                            Sent to the LLM when
update_template                A ctx.update(message) call is being delivered to the LLM.
duplicate_reject_template      A duplicate call is blocked by on_duplicate="reject".
duplicate_confirm_template     A duplicate call needs LLM confirmation under on_duplicate="confirm".
reply_at_tail_template         A follow-up reply runs while the pending update is still the latest chat item.
reply_maybe_covered_template   A follow-up reply runs after newer messages have arrived in the chat context.

Unspecified keys fall back to defaults. Each value can be a str.format() template string or a callable. Both forms receive the same named variables for that template. Set tool_handling on an AsyncToolset, on an Agent, or on an AgentSession. The framework resolves templates from AsyncToolset first, then the Agent, then the AgentSession, falling back to defaults for any key you don't override.
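
The per-key lookup order and the two template forms can be sketched in plain Python. This is an illustration of the behavior described above, not the framework's code: resolve_template, render, and the DEFAULTS text are hypothetical, and the variable names function_name and message are taken from the update_template example.

```python
from typing import Callable, Mapping, Optional, Union

Template = Union[str, Callable[..., str]]

# Placeholder default text, not the framework's real default.
DEFAULTS: dict[str, Template] = {
    "update_template": "Tool `{function_name}` reports: {message}",
}


def resolve_template(
    key: str,
    toolset: Optional[Mapping[str, Template]] = None,
    agent: Optional[Mapping[str, Template]] = None,
    session: Optional[Mapping[str, Template]] = None,
) -> Template:
    """Pick the first level that defines `key`: toolset, agent, session, then defaults."""
    for level in (toolset, agent, session, DEFAULTS):
        if level and key in level:
            return level[key]
    raise KeyError(key)


def render(template: Template, **variables: str) -> str:
    """Render either form with the same named variables."""
    if callable(template):
        return template(**variables)
    return template.format(**variables)
```

For example, if the agent level supplies a callable and the session level supplies a string for update_template, the agent's callable wins; removing it lets the session's string take over, and removing both falls back to the default.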

Additional resources

For more information on concepts covered in this topic, see the following related topics: