
Tool loop design

Design tool loops that pick the right tool, return useful results, and stay fast under voice latency constraints.

Overview

LiveKit agents with tools run an iterative loop: the LLM reasons about what to do, calls a tool, observes the results, then either calls another tool or replies. The framework runs this loop automatically, so most developers don't think about it until something goes wrong: the agent picks the wrong tool, responds incorrectly, or takes too long. This guide covers how to design tool loops that pick the right tool, return useful results, and stay fast.

Focus the toolset

The model picks from your tool list on every turn and the list is part of every prompt. Long lists increase token usage, slow the loop, and can lead to incorrect tool selection. Aim for 5 to 10 tools per agent. Beyond 10 tools, incorrect selections become more common, and past 20, the model often struggles to choose reliably.

Use the following approaches to keep the set focused:

  • Consolidate overlapping tools. Two tools that share a purpose perform better as one tool with a parameter. search_customer(query, kind: 'name' | 'id') beats search_customer_by_name plus search_customer_by_id. One clear tool with a parameter is easier to pick than two with similar names.
  • Expose actions, not endpoints. API endpoints expose CRUD; tools should expose capabilities. Prefer search_contact(query) over list_contacts(). The first returns the one record the agent wants. The second hands it 50 entries to reason through as text. Filter, sort, and format in code. Hand the agent only what it needs.
  • Namespace tools by service, then resource. When several tools cover the same domain, prefix them by service (linear_search, asana_search) and then by resource (linear_issues_search, linear_teams_search). Consistent prefixes signal scope to the model, making it easier to pick the right tool from a long list.
  • Filter MCP server tools. A single MCP server can add 30 or more tools at once. Use server-level filtering or toolset-level filtering to narrow what's exposed to the agent on any given turn.
  • Dynamically register tools. For very large or rarely used capability sets, register tools at runtime instead of statically. See Adding tools dynamically and Toolsets.
  • Split across agents past the limit. Past ~10 tools per agent, divide responsibility across agents using handoffs or the supervisor pattern instead of overloading one agent.
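As a rough illustration of the first two approaches, the sketch below folds the by-name and by-ID lookups into one function with a kind parameter and does the filtering in code, so the caller gets one short string instead of a list to reason through. The CUSTOMERS records and the return wording are hypothetical, and the function is plain Python rather than a registered LiveKit tool.

```python
from typing import Literal

# Hypothetical in-memory records standing in for a real customer API.
CUSTOMERS = [
    {"id": "C-1001", "name": "Ada Lovelace", "tier": "gold"},
    {"id": "C-1002", "name": "Alan Turing", "tier": "silver"},
]


def search_customer(query: str, kind: Literal["name", "id"]) -> str:
    """Find one customer by name or ID and return a short summary."""
    needle = query.strip().lower()
    if kind == "id":
        matches = [c for c in CUSTOMERS if c["id"].lower() == needle]
    else:
        matches = [c for c in CUSTOMERS if needle in c["name"].lower()]

    if not matches:
        return f"No customer found for {kind} '{query}'."
    if len(matches) > 1:
        # Hand back only the names, not the full records.
        names = ", ".join(c["name"] for c in matches)
        return f"{len(matches)} customers match: {names}. Ask which one."
    customer = matches[0]
    return f"{customer['name']} (ID {customer['id']}, {customer['tier']} tier)."
```

The ambiguous case returns a short prompt to disambiguate rather than every matching record, which keeps the result small enough to speak.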

Design tools for the model

Every tool you expose is text the model reads: the description as it picks the tool, the parameter list as it fills in arguments, and the return as it composes its next reply. The model can't see the implementation. Treat what it can see (descriptions, parameter names, return formats) the way you'd treat onboarding docs for a new hire: state what the tool does, when to call it, what to pass, and what to expect back.

Write descriptions the model can act on

A good tool description states what the tool does, when to call it, and when not to call it. Be explicit about constraints, provide examples, and call out boundaries with related tools.

# Avoid: vague description, no parameter guidance, no return contract.
@function_tool
async def get_data(date: str) -> str:
    """Gets data."""
    ...

# Prefer: specific trigger, parameter formats, return shape, exclusion rule.
@function_tool
async def check_availability(date: str, party_size: int) -> str:
    """Check open reservation slots for a given date.

    Call this when the user asks about reservations. Don't call it
    until you have both the date and the party size.

    Args:
        date: Reservation date in YYYY-MM-DD format.
        party_size: Number of guests, between 1 and 12.

    Returns:
        A speech-ready summary of available times.
    """
    ...

// Avoid: vague description, no parameter guidance, no return contract.
getData: llm.tool({
  description: 'Gets data.',
  parameters: z.object({ date: z.string() }),
  execute: async ({ date }) => { /* ... */ },
}),

// Prefer: specific trigger, parameter formats, return shape, exclusion rule.
checkAvailability: llm.tool({
  description:
    "Check open reservation slots for a given date. Call this when the user " +
    "asks about reservations. Don't call it until you have both the date " +
    "and the party size. Returns a speech-ready summary of available times.",
  parameters: z.object({
    date: z.string().describe('Reservation date in YYYY-MM-DD format.'),
    partySize: z.number().int().min(1).max(12)
      .describe('Number of guests, between 1 and 12.'),
  }),
  execute: async ({ date, partySize }) => {
    // Return a speech-ready summary.
  },
}),

Pin down parameter values

State valid values in the description, not just the type. Even when the type is enforceable (Literal["lunch", "dinner"], z.enum(['lunch', 'dinner'])), the model reads the description more carefully than the schema, so spelling out 'Meal must be "lunch" or "dinner"' in the description prevents a bad call instead of relying on the framework to reject one. For formats a type can't express (date strings, currency codes, casing rules), prose is the only signal, so give the format and an example.
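The description is the first line of defense, but a cheap check in code can back it up. The sketch below is one possible shape for that backstop, not part of the LiveKit API: validate_request is a hypothetical helper that returns a message worded like the description, so the model sees the same rule again if it guesses wrong.

```python
import re
from datetime import date
from typing import Literal, Optional, get_args

Meal = Literal["lunch", "dinner"]


def validate_request(date_str: str, meal: str) -> Optional[str]:
    """Return None when the arguments are valid, else a message naming the fix."""
    date_hint = 'date must be in YYYY-MM-DD format, for example "2025-03-14".'
    # The regex pins the exact shape; fromisoformat rejects impossible dates.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", date_str):
        return date_hint
    try:
        date.fromisoformat(date_str)
    except ValueError:
        return date_hint
    if meal not in get_args(Meal):
        return 'meal must be "lunch" or "dinner".'
    return None
```

A tool could call this before touching the backend and return or raise the message when it is not None.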

Shape returns for the agent, not the API

Tools return data to the LLM, not to a frontend. In voice especially, the model often incorporates the return value into its spoken response with minimal rewording, so format for speech rather than for screens. Return values the model can incorporate into its next response without a second formatting pass:

  • Return speech-ready strings, not raw payloads. Prefer "3 slots available at 1 PM, 2:30 PM, and 4 PM" over [{time: "13:00"}, ...]. A raw payload forces extra reasoning before the agent can speak.
  • Return semantic identifiers, not opaque ones. LLMs are trained on human language, so they reason far better over "Reservation #R-1842 (Friday 7 PM, party of 3)" than "550e8400-e29b-41d4-a716-446655440000". Use names, dates, and short codes the agent can repeat back to a user.
  • Return only high-value information. A tool isn't an API endpoint, so don't return the full record just because you can. If the agent needs the party size and time, return those, not the full reservation.
  • Offer a verbosity parameter when both lengths are useful. A response_format: "concise" | "detailed" parameter lets the agent ask for what it needs. For a reservation lookup, concise returns "Friday at 7:45 PM, party of 3" (speech-ready); detailed returns party size, time, special requests, and notes (useful when the model needs to reason about whether to suggest changes). See Anthropic's Writing tools for agents guide for the pattern.
  • Keep return values small. Paginate, summarize, or hand the result ID to a follow-up tool when the data is large. Long returns bloat the context window, slow inference, and degrade reasoning quality.
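To make the first point concrete, here is a minimal sketch of turning raw slot times into the kind of sentence quoted above. The helper names (speak_time, join_spoken, summarize_slots) are hypothetical and the wording is only one reasonable choice.

```python
from datetime import time
from typing import List


def speak_time(t: time) -> str:
    """Format a time for speech, dropping ':00' on the hour."""
    hour = t.hour % 12 or 12
    suffix = "AM" if t.hour < 12 else "PM"
    if t.minute == 0:
        return f"{hour} {suffix}"
    return f"{hour}:{t.minute:02d} {suffix}"


def join_spoken(items: List[str]) -> str:
    """Join items the way a person would say them aloud."""
    if len(items) <= 2:
        return " and ".join(items)
    return ", ".join(items[:-1]) + f", and {items[-1]}"


def summarize_slots(slots: List[time]) -> str:
    """Turn a list of slot times into one speech-ready sentence."""
    if not slots:
        return "No slots available."
    noun = "slot" if len(slots) == 1 else "slots"
    spoken = join_spoken([speak_time(s) for s in slots])
    return f"{len(slots)} {noun} available at {spoken}."
```

For example, summarize_slots([time(13, 0), time(14, 30), time(16, 0)]) returns "3 slots available at 1 PM, 2:30 PM, and 4 PM."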

Control the loop from code

The model is non-deterministic. Push correctness into your code rather than the prompt.

Bound the loop

Set a hard limit on consecutive tool calls per LLM turn with max_tool_steps on AgentSession. The default is 3. When the loop hits this limit, the framework makes one final LLM call with tool use disabled and the agent replies with whatever context it has. Increase the limit for agents that legitimately chain calls (for example, a lookup followed by an action). Decrease it for agents whose tools should rarely fire more than once per turn.

session = AgentSession(
    stt=...,
    llm=...,
    tts=...,
    max_tool_steps=5,  # default is 3
)

const session = new voice.AgentSession({
  stt: ...,
  llm: ...,
  tts: ...,
  voiceOptions: {
    maxToolSteps: 5, // default is 3
  },
});
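The step limit is easier to reason about with a toy model of the loop. The sketch below is not LiveKit's implementation; it is a plain-Python approximation of the behavior described above, with a hypothetical fake model (eager_model) that calls a tool for as long as it is allowed to.

```python
from typing import Callable, Dict, List, Tuple

# A fake "model": given the tool results so far and whether tools are
# allowed, it returns ("tool", tool_name) or ("reply", text).
ModelFn = Callable[[List[str], bool], Tuple[str, str]]


def run_turn(
    model: ModelFn,
    tools: Dict[str, Callable[[], str]],
    max_tool_steps: int = 3,
) -> str:
    results: List[str] = []
    for _ in range(max_tool_steps):
        kind, value = model(results, True)
        if kind == "reply":
            return value
        results.append(tools[value]())
    # Step limit reached: one final call with tool use disabled.
    _, value = model(results, False)
    return value


def eager_model(results: List[str], tools_allowed: bool) -> Tuple[str, str]:
    # Hypothetical model that never stops calling tools on its own.
    if tools_allowed:
        return ("tool", "lookup")
    return ("reply", f"Answering with {len(results)} tool results.")
```

With the default limit of 3, this model's turn ends after three lookups and a forced reply; raising the limit to 5 lets it chain five.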

Most LLM providers also let the model propose multiple tool calls in a single step, and those calls run concurrently. Concurrency is faster when the calls are independent (looking up two records, for example). When one tool's result must feed the next, disable parallelism so the loop runs serially:

from livekit.agents import inference

llm = inference.LLM(
    model="openai/gpt-4.1-mini",
    extra_kwargs={"parallel_tool_calls": False},
)

import { openai } from '@livekit/agents-plugin-openai';

const llm = new openai.LLM({
  model: 'gpt-4.1-mini',
  parallelToolCalls: false,
});
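The difference between independent and dependent calls can be seen with plain asyncio. This is only an analogy for what the provider setting controls, and lookup_customer and lookup_orders are hypothetical stand-ins for real tools.

```python
import asyncio
from typing import List


async def lookup_customer(name: str) -> str:
    await asyncio.sleep(0.05)  # stands in for a network call
    return f"customer:{name}"


async def lookup_orders(customer: str) -> str:
    await asyncio.sleep(0.05)  # stands in for a network call
    return f"orders-for-{customer}"


async def independent() -> List[str]:
    # Neither call needs the other's result, so they can overlap.
    return list(
        await asyncio.gather(lookup_customer("ada"), lookup_customer("alan"))
    )


async def dependent() -> str:
    # The second call consumes the first result, so it has to wait.
    customer = await lookup_customer("ada")
    return await lookup_orders(customer)
```

If the model proposed both calls of dependent() in one parallel step, the second would have to guess its argument, which is the case serial execution avoids.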

Raise actionable errors

A tool error message becomes part of the next prompt the model sees, and in voice the agent often reads it back to the user. A bare exception becomes "the tool failed" in the model's view, which leads to retries or apologies that aren't grounded in the actual problem. Raise ToolError with a reason the model can communicate or recover from.

from livekit.agents import function_tool
from livekit.agents.llm import ToolError

@function_tool
async def lookup_reservation(self, confirmation: str) -> str:
    """Look up a reservation by its confirmation code."""
    reservation = await reservations_api.get(confirmation)
    if reservation is None:
        raise ToolError(
            "No reservation matches that confirmation code. "
            "Ask the user to double-check the code or look in their email."
        )
    return f"Reservation #{reservation.id} for {reservation.party_size} on {reservation.date} at {reservation.time}."

import { llm } from '@livekit/agents';
import { z } from 'zod';

lookupReservation: llm.tool({
  description: 'Look up a reservation by its confirmation code.',
  parameters: z.object({ confirmation: z.string() }),
  execute: async ({ confirmation }) => {
    const reservation = await reservationsApi.get(confirmation);
    if (!reservation) {
      throw new llm.ToolError(
        'No reservation matches that confirmation code. ' +
          'Ask the user to double-check the code or look in their email.',
      );
    }
    return `Reservation #${reservation.id} for ${reservation.partySize} on ${reservation.date} at ${reservation.time}.`;
  },
}),

Gate critical actions

When a turn must end with a specific action, such as confirming a booking, completing a task, or recording consent, don't trust the model to fire the right tool at the right time. Track state in code and require the model to confirm before the action runs.

A self-reporting parameter makes the model's intent visible so your code can enforce it. For a booking confirmation, the agent should always read the details back to the user before the reservation is written:

@function_tool
async def confirm_reservation(
    self,
    date: str,
    time: str,
    party_size: int,
    read_back: bool,
) -> str:
    """Book the reservation. Only call this after reading the details back to the user.

    Args:
        date: Reservation date in YYYY-MM-DD format.
        time: Reservation time in 24-hour HH:MM format.
        party_size: Number of guests, between 1 and 12.
        read_back: Set to True only after you have read the date, time,
            and party size back to the user and they have confirmed.
    """
    if not read_back:
        return "Read the date, time, and party size back to the user first, then call this tool again."
    booking = await reservations_api.book(date, time, party_size)
    return f"Booked. Confirmation code is {booking.confirmation}."

confirmReservation: llm.tool({
  description:
    'Book the reservation. Only call this after reading the details back to the user.',
  parameters: z.object({
    date: z.string().describe('Reservation date in YYYY-MM-DD format.'),
    time: z.string().describe('Reservation time in 24-hour HH:MM format.'),
    partySize: z.number().int().min(1).max(12)
      .describe('Number of guests, between 1 and 12.'),
    readBack: z.boolean().describe(
      'Set to true only after you have read the date, time, and party size ' +
        'back to the user and they have confirmed.',
    ),
  }),
  execute: async ({ date, time, partySize, readBack }) => {
    if (!readBack) {
      return 'Read the date, time, and party size back to the user first, then call this tool again.';
    }
    const booking = await reservationsApi.book(date, time, partySize);
    return `Booked. Confirmation code is ${booking.confirmation}.`;
  },
}),

This pattern prevents the agent from writing the reservation before the user has actually confirmed the details.
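A self-reported flag still depends on the model telling the truth. For the "track state in code" half of the advice, one possible complement is to record what was read back and compare it against what the model later tries to book. The sketch below assumes a hypothetical BookingGate held on the agent; none of these names come from LiveKit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Details:
    date: str
    time: str
    party_size: int


class BookingGate:
    """Remembers which details were read back so a write can be checked."""

    def __init__(self) -> None:
        self._read_back: Optional[Details] = None

    def record_read_back(self, details: Details) -> str:
        # Called from a read-back tool; the return value is what gets spoken.
        self._read_back = details
        return (
            f"Party of {details.party_size} on {details.date} "
            f"at {details.time}. Is that right?"
        )

    def may_book(self, details: Details) -> bool:
        # Only the exact details that were read back may be written.
        return self._read_back == details
```

A booking tool would call may_book() first and return the read-back reminder when it is False, so a changed time or party size forces another confirmation.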

Disable interruptions on writes

By default, user speech can interrupt a running tool. For read-only tools that's fine. For tools that write data (placing an order, sending a message, charging a card), an interruption can leave the operation half-done. Call run_ctx.disallow_interruptions() (Python) or set ctx.speechHandle.allowInterruptions = false (Node.js) at the start of any mutating tool. See Interruptions for the full API.

Manage loop latency

Even an optimized loop takes time, and in voice that time results in silence on the user's end. The techniques below help you mask dead air, bound how slow a tool can be, and skip a handoff you don't need.

Speak during long tool calls

If a tool can take more than a second, start speaking before it finishes. Use session.say() inside the tool to play a short, predetermined filler line.

@function_tool
async def find_alternative_times(
    self,
    run_ctx: RunContext,
    date: str,
    party_size: int,
) -> str:
    """Find available reservation times on a given date when the user's first choice is full."""
    run_ctx.session.say("Let me see what else is open.")
    result = await reservations_api.search_times(date, party_size)
    return result.speech_summary

findAlternativeTimes: llm.tool({
  description:
    "Find available reservation times on a given date when the user's first choice is full.",
  parameters: z.object({
    date: z.string().describe('Reservation date in YYYY-MM-DD format.'),
    partySize: z.number().int().min(1).max(12),
  }),
  execute: async ({ date, partySize }, { ctx }) => {
    ctx.session.say('Let me see what else is open.');
    const result = await reservationsApi.searchTimes(date, partySize);
    return result.speechSummary;
  },
}),

For repeat-use fillers, pre-render the audio. See Cached TTS in tools.

Bound tools with a timeout

Tools that call external systems should have a timeout. A backend that hangs blocks the session: the close callback doesn't run and the next turn never starts.

import asyncio

from livekit.agents import function_tool
from livekit.agents.llm import ToolError

@function_tool
async def lookup_reservation_status(self, confirmation: str) -> str:
    """Look up the status of a reservation by its confirmation code."""
    try:
        async with asyncio.timeout(5):
            return await reservations_api.get_status(confirmation)
    except asyncio.TimeoutError as err:
        raise ToolError(
            "Reservation lookup is slow right now. Ask the user to try again in a moment."
        ) from err

lookupReservationStatus: llm.tool({
  description: 'Look up the status of a reservation by its confirmation code.',
  parameters: z.object({ confirmation: z.string() }),
  execute: async ({ confirmation }) => {
    // Abort the request if the backend hasn't answered within 5 seconds.
    const controller = new AbortController();
    const timeoutId = setTimeout(() => controller.abort(), 5000);
    try {
      return await reservationsApi.getStatus(confirmation, { signal: controller.signal });
    } catch (err) {
      throw new llm.ToolError(
        'Reservation lookup is slow right now. Ask the user to try again in a moment.',
      );
    } finally {
      clearTimeout(timeoutId);
    }
  },
}),
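asyncio.timeout() was added in Python 3.11. On earlier versions, asyncio.wait_for gives the same bound. The sketch below wraps it in a hypothetical with_timeout helper; ToolError is redefined locally only so the snippet runs on its own, and in an agent you would import it from livekit.agents.llm as in the example above.

```python
import asyncio
from typing import Any, Awaitable


class ToolError(Exception):
    """Local stand-in for livekit.agents.llm.ToolError."""


async def with_timeout(awaitable: Awaitable[Any], seconds: float, message: str) -> Any:
    """Await something, converting a timeout into an actionable ToolError."""
    try:
        return await asyncio.wait_for(awaitable, timeout=seconds)
    except asyncio.TimeoutError as err:
        raise ToolError(message) from err


async def slow_backend() -> str:
    await asyncio.sleep(1)  # hypothetical backend that takes too long
    return "confirmed"


async def fast_backend() -> str:
    return "confirmed"
```

Putting the timeout in one helper keeps the recovery message next to the call site while the mechanics stay in a single place.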

Update in place instead of handing off

A full agent handoff adds a reasoning step before the new agent speaks. When the only thing changing is the prompt or available tools, mutate the current agent instead:

  • Update instructions: agent.update_instructions(new_text). Python only.
  • Update tools: agent.update_tools(new_tool_list) in Python, agent.updateTools(newTools) in Node.js. See Adding tools dynamically for the full API.

For example, after a reservation agent verifies a caller's account, it can grant access to authenticated tools without handing off:

@function_tool
async def verify_user(self, email: str) -> str:
    """Verify the user so they can manage their account."""
    user = await users_api.lookup(email)
    if user is None:
        return "No account found with that email."
    await self.update_tools(
        self.tools + [lookup_my_reservations, cancel_my_reservation]
    )
    await self.update_instructions(
        self.instructions + f" The user is verified as {user.name}."
    )
    return f"Got it, {user.name}. How can I help with your account?"

verifyUser: llm.tool({
  description: 'Verify the user so they can manage their account.',
  parameters: z.object({ email: z.string() }),
  execute: async ({ email }, { ctx }) => {
    const user = await usersApi.lookup(email);
    if (!user) return 'No account found with that email.';
    const agent = ctx.session.currentAgent;
    await agent.updateTools({
      ...agent.toolCtx,
      lookupMyReservations,
      cancelMyReservation,
    });
    return `Got it, ${user.name}. How can I help with your account?`;
  },
}),

Use agent handoffs when the conversational role changes, not for configuration-only changes.

Example: a reservation agent

The agent below showcases the principles from this guide. It exposes three tools (check_availability, find_alternatives, and book_reservation) and uses them to take a restaurant reservation end to end.

import asyncio
from typing import Literal

from livekit.agents import Agent, AgentSession, RunContext, function_tool
from livekit.agents.llm import ToolError


class ReservationAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions=(
                "You take restaurant reservations. Confirm the date, time, "
                "and party size with the user before booking. Always read "
                "the details back before calling book_reservation."
            ),
        )

    @function_tool
    async def check_availability(
        self,
        date: str,
        party_size: int,
        meal: Literal["lunch", "dinner"],
    ) -> str:
        """Check open reservation slots for a given date and meal.

        Call this when the user asks about availability. Don't call it
        until you have the date, party size, and meal (must be "lunch"
        or "dinner").

        Args:
            date: Reservation date in YYYY-MM-DD format.
            party_size: Number of guests, between 1 and 12.
            meal: Either "lunch" or "dinner".

        Returns:
            A speech-ready summary of available times.
        """
        try:
            async with asyncio.timeout(5):
                slots = await reservations_api.check(date, party_size, meal)
        except asyncio.TimeoutError as err:
            raise ToolError(
                "Availability lookup is slow. Ask the user to try again in a moment."
            ) from err
        if not slots:
            return f"No {meal} availability on {date} for a party of {party_size}."
        return f"{len(slots)} {meal} slots open on {date}: {', '.join(slots)}."

    @function_tool
    async def find_alternatives(
        self,
        run_ctx: RunContext,
        date: str,
        party_size: int,
    ) -> str:
        """Find nearby dates with availability when the requested date is full."""
        run_ctx.session.say("Let me check what else is open.")
        try:
            async with asyncio.timeout(8):
                result = await reservations_api.search_nearby(date, party_size)
        except asyncio.TimeoutError as err:
            raise ToolError(
                "Alternative search is slow. Ask the user to try again."
            ) from err
        return result.speech_summary

    @function_tool
    async def book_reservation(
        self,
        run_ctx: RunContext,
        date: str,
        time: str,
        party_size: int,
        read_back: bool,
    ) -> str:
        """Book a reservation. Only call after reading the details back to the user.

        Args:
            date: Reservation date in YYYY-MM-DD format.
            time: Reservation time in 24-hour HH:MM format.
            party_size: Number of guests, between 1 and 12.
            read_back: Set to True only after you have read the date, time,
                and party size back to the user and they have confirmed.
        """
        if not read_back:
            return (
                "Read the date, time, and party size back to the user first, "
                "then call this tool again."
            )
        run_ctx.disallow_interruptions()
        booking = await reservations_api.book(date, time, party_size)
        return (
            f"Booked for {party_size} on {date} at {time}. "
            f"Confirmation #{booking.code}."
        )


session = AgentSession(
    stt=...,
    llm=...,
    tts=...,
    max_tool_steps=5,
)

import { llm, voice } from '@livekit/agents';
import { z } from 'zod';

class ReservationAgent extends voice.Agent {
  constructor() {
    super({
      instructions:
        'You take restaurant reservations. Confirm the date, time, and ' +
        'party size with the user before booking. Always read the details ' +
        'back before calling bookReservation.',
      tools: {
        checkAvailability: llm.tool({
          description:
            'Check open reservation slots for a given date and meal. Call ' +
            "this when the user asks about availability. Don't call it until " +
            'you have the date, party size, and meal (must be "lunch" or "dinner"). ' +
            'Returns a speech-ready summary of available times.',
          parameters: z.object({
            date: z.string().describe('Reservation date in YYYY-MM-DD format.'),
            partySize: z.number().int().min(1).max(12)
              .describe('Number of guests, between 1 and 12.'),
            meal: z.enum(['lunch', 'dinner'])
              .describe('Either "lunch" or "dinner".'),
          }),
          execute: async ({ date, partySize, meal }) => {
            const controller = new AbortController();
            const timeoutId = setTimeout(() => controller.abort(), 5000);
            try {
              const slots = await reservationsApi.check(date, partySize, meal, {
                signal: controller.signal,
              });
              if (slots.length === 0) {
                return `No ${meal} availability on ${date} for a party of ${partySize}.`;
              }
              return `${slots.length} ${meal} slots open on ${date}: ${slots.join(', ')}.`;
            } catch {
              throw new llm.ToolError(
                'Availability lookup is slow. Ask the user to try again in a moment.',
              );
            } finally {
              clearTimeout(timeoutId);
            }
          },
        }),
        findAlternatives: llm.tool({
          description:
            'Find nearby dates with availability when the requested date is full.',
          parameters: z.object({
            date: z.string().describe('Reservation date in YYYY-MM-DD format.'),
            partySize: z.number().int().min(1).max(12),
          }),
          execute: async ({ date, partySize }, { ctx }) => {
            ctx.session.say('Let me check what else is open.');
            const result = await reservationsApi.searchNearby(date, partySize);
            return result.speechSummary;
          },
        }),
        bookReservation: llm.tool({
          description:
            'Book a reservation. Only call after reading the details back to the user.',
          parameters: z.object({
            date: z.string().describe('Reservation date in YYYY-MM-DD format.'),
            time: z.string().describe('Reservation time in 24-hour HH:MM format.'),
            partySize: z.number().int().min(1).max(12)
              .describe('Number of guests, between 1 and 12.'),
            readBack: z.boolean().describe(
              'Set to true only after you have read the date, time, and ' +
                'party size back to the user and they have confirmed.',
            ),
          }),
          execute: async ({ date, time, partySize, readBack }, { ctx }) => {
            if (!readBack) {
              return 'Read the date, time, and party size back to the user first, then call this tool again.';
            }
            ctx.speechHandle.allowInterruptions = false;
            const booking = await reservationsApi.book(date, time, partySize);
            return `Booked for ${partySize} on ${date} at ${time}. Confirmation #${booking.code}.`;
          },
        }),
      },
    });
  }
}

const session = new voice.AgentSession({
  stt: ...,
  llm: ...,
  tts: ...,
  voiceOptions: {
    maxToolSteps: 5,
  },
});

The agent integrates each principle from this guide:

  • Focused tool set. Three tools, well under the 5 to 10 target.
  • Pinned parameter values. meal is constrained in both the type (Literal/z.enum) and the docstring. The schema lets the framework reject an invalid value; the prose stops the model from inventing one. See Design tools for the model.
  • Bounded external calls. In the Python version, the availability and alternatives lookups are wrapped in asyncio.timeout and raise a ToolError with a recovery message the agent can pass to the user. See Control the loop from code and Manage loop latency.
  • Confirmation gate. book_reservation takes a read_back parameter the model must set to True only after speaking the details aloud. If read_back is false, the tool returns a reminder. See Control the loop from code.
  • Interruption block on writes. disallow_interruptions() guards the booking call so a barge-in can't leave a half-finished write. See Control the loop from code.
  • Masked latency. find_alternatives uses session.say() so the user hears something while the search runs. See Manage loop latency.
  • Step limit. The session sets max_tool_steps=5, capping how many tool calls can chain per turn. See Control the loop from code.

Test and debug

Run the agent to verify your tools work well. You can surface issues by running evaluations and reviewing sessions:

  • Run evaluations: write a small set of input-output pairs that capture the tool calls you expect. For example, "Do you have a table for 3 on Friday?" should produce a check_availability call with party_size=3. Use real, varied data rather than a synthetic happy path. Run the set on every change. See Testing and evaluation.
  • Watch real sessions: use the Agents Console during development and Agent Observability in production. Look for turns where the model called a tool with the wrong arguments, chained more tool calls than the typical depth, or read a hold message after the tool had already returned.
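An evaluation set can start as a handful of expected tool calls checked by a small function. The sketch below shows only the comparison step; ToolCall, Case, and check_case are hypothetical names, and capturing the calls from a real run is left to the framework's testing utilities (see Testing and evaluation).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToolCall:
    name: str
    args: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Case:
    utterance: str
    expected_tool: str
    expected_args: Dict[str, Any]


def check_case(case: Case, calls: List[ToolCall]) -> List[str]:
    """Return human-readable failures; an empty list means the case passed."""
    match = next((c for c in calls if c.name == case.expected_tool), None)
    if match is None:
        return [f"{case.utterance!r}: expected a {case.expected_tool} call"]
    # Only the listed arguments are compared, so extra arguments are allowed.
    return [
        f"{case.utterance!r}: {key} was {match.args.get(key)!r}, expected {want!r}"
        for key, want in case.expected_args.items()
        if match.args.get(key) != want
    ]
```

Checking only the arguments that matter for the case keeps the set stable when unrelated parameters change.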

When you find a failure, the kind of failure usually tells you what to fix:

  • Redundant tool calls typically mean the return value carries too little or too much data: either the agent didn't get what it needed the first time, or it couldn't find it in the noise, so it calls the tool again. Trim the returned data or split the tool so each call delivers something concrete.
  • Invalid arguments typically mean the parameter description isn't clear enough and forces the agent to guess. Spell out the format and enumerate valid values in the description, not just the type.
  • Wrong tool selection typically means descriptions overlap or boundaries aren't explicit. Tighten the "when to call" and "when not to call" lines.

Additional resources

These resources cover what the loop can call, how multiple agents compose, and how to validate the result. For broader context on the pattern this guide builds on, see the LiveKit blog post on the ReAct pattern in voice agents.