Overview
The LiveKit Agents framework lets you build sophisticated voice AI apps with multiple personas, conversation phases, or specialized capabilities using agents, handoffs, and tasks.
Core constructs
An agent session is the main orchestrator of your voice AI app and can be composed of one or more agents. Agents are one of the core building blocks of a workflow, alongside tools, tasks, and task groups. Each plays a distinct role in creating a flexible, maintainable system:
- Agents hold long-lived control of a session. They define instructions, reasoning behavior, and tools, and can transfer control to another agent when different rules or capabilities are required.
- Tools are user-defined functions callable by the model. They allow the agent to perform actions beyond generative text, such as reading from or writing to external systems. Tool invocations are model-driven: the LLM chooses to call them based on context, and the returned results are fed back to the model for continued reasoning. Tools can also trigger agent handoffs.
- Tasks are short-lived units of work that run to completion and return a typed result. Unlike agents, tasks do not persist; they take temporary control only while executing. Tasks can include tool definitions used to complete their objectives.
- Task groups run sequences of tasks for multi-step operations. They allow users to revisit earlier steps if corrections are needed, and all tasks in a group share conversation context. The summarized result is returned to the controlling agent when the group finishes.
This architecture makes workflows explicit and predictable: agents manage ongoing conversational control, tasks encapsulate discrete operations, tools execute side effects and enable handoffs, and task groups coordinate ordered multi-step flows while letting users return to earlier steps. Together, these constructs form a testable and maintainable execution model for non-trivial voice AI systems.
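To make the division of roles concrete, here is a minimal, framework-agnostic sketch in plain Python. The `Agent` and `Session` classes, the tool names, and `run_consent_task` are hypothetical stand-ins invented for illustration; they do not mirror the LiveKit Agents API. The sketch assumes a tool triggers a handoff by returning another agent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Agent:
    """Hypothetical stand-in for a long-lived agent: instructions plus tools."""
    name: str
    instructions: str
    tools: Dict[str, Callable[..., object]] = field(default_factory=dict)


@dataclass
class Session:
    """Hypothetical orchestrator that tracks which agent is in control."""
    current: Agent

    def call_tool(self, tool_name: str, **kwargs) -> object:
        result = self.current.tools[tool_name](**kwargs)
        # Assumption for this sketch: a tool that returns an Agent triggers a handoff.
        if isinstance(result, Agent):
            self.current = result
            return f"handed off to {result.name}"
        return result


def run_consent_task(answer: str) -> bool:
    """Short-lived task: runs to completion and returns a typed result."""
    return answer.strip().lower() in {"yes", "y"}


billing = Agent("billing", "Handle payments only.")
intake = Agent(
    "intake",
    "Greet the caller and collect details.",
    tools={
        # Ordinary tool: returns data that is fed back for further reasoning.
        "lookup_booking": lambda booking_id: {"id": booking_id, "status": "confirmed"},
        # Handoff tool: returns the agent that should take control.
        "transfer_to_billing": lambda: billing,
    },
)
session = Session(current=intake)
```

In this model the session always has exactly one agent in control, while a task is just a call that returns a typed value and leaves control where it was.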
Choosing a pattern
Start with a single agent and a small set of tools. A single agent can handle multi-step flows by updating its instructions or changing available tools between conversation phases. For example, you might use one set of tools during booking lookup and another during confirmation.
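The phase-based approach can be sketched as follows. This is plain Python under stated assumptions, not LiveKit code: `SingleAgent`, `PHASES`, and both tool functions are hypothetical names chosen to show one agent swapping its instructions and tool set between a lookup phase and a confirmation phase.

```python
from typing import Callable, Dict


def find_booking(booking_id: str) -> str:
    """Hypothetical lookup tool."""
    return f"booking {booking_id} found"


def confirm_booking(booking_id: str) -> str:
    """Hypothetical confirmation tool."""
    return f"booking {booking_id} confirmed"


# Each phase pairs its own instructions with the tools allowed in that phase.
PHASES: Dict[str, dict] = {
    "lookup": {
        "instructions": "Find the caller's booking.",
        "tools": {"find_booking": find_booking},
    },
    "confirmation": {
        "instructions": "Confirm the booking details with the caller.",
        "tools": {"confirm_booking": confirm_booking},
    },
}


class SingleAgent:
    """One agent that stays in control and changes behavior per phase."""

    def __init__(self) -> None:
        self.enter_phase("lookup")

    def enter_phase(self, phase: str) -> None:
        config = PHASES[phase]
        self.phase = phase
        self.instructions: str = config["instructions"]
        self.tools: Dict[str, Callable[..., str]] = dict(config["tools"])

    def call(self, tool_name: str, **kwargs) -> str:
        if tool_name not in self.tools:
            raise PermissionError(f"{tool_name} is not available in phase {self.phase}")
        return self.tools[tool_name](**kwargs)


agent = SingleAgent()
lookup_result = agent.call("find_booking", booking_id="A1")
agent.enter_phase("confirmation")
confirm_result = agent.call("confirm_booking", booking_id="A1")
```

Because the tool set is replaced rather than extended, a tool from an earlier phase is no longer callable once the agent moves on.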
Each pattern you layer on top of a single agent adds complexity, latency, or context management overhead. Split workflows only when you encounter a concrete limitation:
- Instruction bloat: The system prompt becomes large enough that the model starts to underperform on its primary task.
- Conflicting tool access: Different phases of the conversation require different tools or permissions.
- Multi-turn data collection: A workflow step requires its own LLM loop to gather and validate structured input over several conversational turns.
- Backtracking: Users need to revisit earlier steps to correct previously provided information.
When one of those signals is present, choose the simplest construct that addresses it:
| Pattern | Session control | Context | Latency cost | Correction handling | Best for |
|---|---|---|---|---|---|
| Single agent + tools | One agent stays in control for the entire session. | Full conversation context is retained. | None. | Manual. Re-prompt or re-ask. | Simple flows with few tools and no distinct conversation phases. |
| Supervisor pattern | One agent stays in control; tasks take temporary control and return a typed result. | Supervisor keeps full context. Tasks receive a scoped copy. | Minimal. Task runs within the same session. | Re-run the task from the supervisor. | One agent coordinates focused, reusable sub-operations such as data collection or verification. |
| Agent handoffs | Control transfers fully to a new agent. The original agent doesn't participate afterward. | Explicit: pass chat_ctx, summarize, or start fresh. | Handoff overhead per transition. | Manual. Hand off back to the previous agent. | Distinct roles, model specialization, or permission boundaries between conversation phases. |
| Task groups | TaskGroup orchestrates an ordered sequence of tasks. | Shared within the group. Summarized on completion. | Minimal. Sequential within the same session. | Built-in. Users can regress to earlier completed steps. | Ordered multi-step data collection where users might need to revisit earlier steps. |
These patterns aren't mutually exclusive. Different phases of a conversation can use different patterns. For example, an intake supervisor can hand off to a billing agent that uses its own supervisor pattern. For a deeper comparison, see "When to use the supervisor pattern."
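As a rough aid, the signals and the table can be folded into a small decision helper. The mapping below is one possible reading of the table, not an official rule, and `Pattern` and `suggest_pattern` are hypothetical names. In particular, sending instruction bloat and conflicting tool access to handoffs is an assumption; a supervisor with scoped tasks may address those signals equally well.

```python
from enum import Enum


class Pattern(Enum):
    SINGLE_AGENT = "Single agent + tools"
    SUPERVISOR = "Supervisor pattern"
    HANDOFF = "Agent handoffs"
    TASK_GROUP = "Task groups"


def suggest_pattern(
    *,
    instruction_bloat: bool = False,
    conflicting_tools: bool = False,
    multi_turn_collection: bool = False,
    backtracking: bool = False,
) -> Pattern:
    """Return a starting point for one phase of the conversation."""
    if backtracking:
        # Built-in correction handling is specific to task groups in the table.
        return Pattern.TASK_GROUP
    if multi_turn_collection:
        # A focused sub-operation with its own loop fits a task under a supervisor.
        return Pattern.SUPERVISOR
    if instruction_bloat or conflicting_tools:
        # Assumption: separate roles or permission boundaries suggest a handoff.
        return Pattern.HANDOFF
    # No concrete limitation yet: stay with the simplest construct.
    return Pattern.SINGLE_AGENT
```

Calling the helper once per conversation phase reflects the point that different phases can use different patterns.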
Example: booking flow
Consider an appointment agent that handles booking, rescheduling, and cancellation. All three intents share an initial lookup step. A single agent with tools works well here if the instructions stay concise and the tools don't conflict.
When the intents diverge, the supervisor pattern is a better fit. Booking might require multi-turn address collection, rescheduling needs calendar-specific tools, and cancellation requires a separate consent flow. One supervisor routes to focused specialist tasks and stays in control of the session, so it can handle mid-conversation intent changes if a user starts booking but decides to reschedule instead.
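A minimal sketch of that routing, in plain Python rather than the LiveKit API: `Supervisor`, `TaskResult`, and the two task functions are hypothetical. Each task takes temporary control, returns a typed result, and control returns to the supervisor, which can then reroute if the caller changes intent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TaskResult:
    """Typed result handed back to the supervisor when a task completes."""
    intent: str
    summary: str


def booking_task(details: str) -> TaskResult:
    """Hypothetical specialist task for the booking intent."""
    return TaskResult("book", f"collected address: {details}")


def reschedule_task(details: str) -> TaskResult:
    """Hypothetical specialist task for the rescheduling intent."""
    return TaskResult("reschedule", f"new slot: {details}")


class Supervisor:
    """Stays in control of the session and delegates to specialist tasks."""

    def __init__(self, tasks: Dict[str, Callable[[str], TaskResult]]) -> None:
        self.tasks = tasks
        self.history: List[TaskResult] = []

    def handle(self, intent: str, details: str) -> TaskResult:
        result = self.tasks[intent](details)  # the task takes temporary control
        self.history.append(result)           # control returns to the supervisor
        return result


supervisor = Supervisor({"book": booking_task, "reschedule": reschedule_task})
first = supervisor.handle("book", "12 Main St")
# The caller changes their mind; the supervisor is still in control and reroutes.
second = supervisor.handle("reschedule", "Friday 3pm")
```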
If each intent also requires a strict ordered sequence of sub-steps with backtracking, a task group within the supervisor is appropriate.
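The ordered-steps-with-backtracking idea can be sketched like this. `TaskGroupSketch` is a hypothetical plain-Python model and is not the framework's task group implementation; it assumes that after a correction the flow resumes at the first step that still has no answer.

```python
from typing import Dict, List


class TaskGroupSketch:
    """Ordered steps with shared results and support for revisiting a step."""

    def __init__(self, steps: List[str]) -> None:
        self.steps = steps
        self.results: Dict[str, str] = {}  # shared across all steps in the group
        self.index = 0

    @property
    def done(self) -> bool:
        return self.index >= len(self.steps)

    def answer(self, value: str) -> None:
        """Record the answer for the current step and move forward."""
        self.results[self.steps[self.index]] = value
        # Resume at the first step that still has no answer.
        self.index = next(
            (i for i, step in enumerate(self.steps) if step not in self.results),
            len(self.steps),
        )

    def go_back_to(self, step: str) -> None:
        """Return to an earlier, already completed step to correct it."""
        if step not in self.results:
            raise ValueError(f"{step} has not been completed yet")
        self.index = self.steps.index(step)

    def summary(self) -> str:
        """Summarized result returned to the controlling agent."""
        return ", ".join(f"{step}={self.results[step]}" for step in self.steps)


group = TaskGroupSketch(["name", "date", "time"])
group.answer("Ada")
group.answer("Monday")
group.go_back_to("name")      # the caller corrects an earlier answer
group.answer("Ada Lovelace")
group.answer("10:00")         # flow resumes at the first unanswered step
```

Keeping later answers when an earlier step is corrected is a design choice of this sketch; a stricter flow could clear every step after the corrected one and ask again.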
If the appointment flow ends with a payment step where the agent needs different instructions, tools, and access controls, an agent handoff is appropriate: the supervisor hands off to a dedicated billing agent once the appointment is confirmed.
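When handing off, the new agent's starting context has to be chosen explicitly. The sketch below illustrates the three options named in the comparison table (pass the full context, summarize, or start fresh) using a plain list of strings. `context_for_handoff` is a hypothetical helper, and the summary shown is a naive placeholder for what would normally be an LLM-generated summary.

```python
from typing import List, Literal

Strategy = Literal["full", "summary", "fresh"]


def context_for_handoff(history: List[str], strategy: Strategy) -> List[str]:
    """Build the message list the next agent starts with."""
    if strategy == "full":
        # Full continuity: the next agent sees every earlier message.
        return list(history)
    if strategy == "summary":
        if not history:
            return []
        # Placeholder summary; a real system would summarize with a model.
        return [f"Summary of {len(history)} earlier messages; last: {history[-1]}"]
    if strategy == "fresh":
        # Clean slate: nothing carries over.
        return []
    raise ValueError(f"unknown strategy: {strategy}")


history = ["Caller wants to book.", "Appointment confirmed for Monday 10:00."]
billing_context = context_for_handoff(history, "summary")
```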
Best practices
Before building your workflow, map out the conversation phases, identify where different personas or capabilities are needed, and determine which operations are short-lived versus continuous. The following guidelines help you choose the right pattern for each part of your workflow:
- Create separate agents when you need distinct reasoning behavior or tool access.
- Use tasks for discrete operations that must complete before continuing the conversation (for example, consent collection, data capture, or verification).
- Expose external actions through tools with clear purpose and meaningful return values that contribute to reasoning.
- Plan how conversation context is preserved or reset across agents. Some transitions require full continuity; others benefit from a clean slate.
- Use a task group for ordered multi-step processes that might need to revisit earlier steps.
- Build workflows incrementally. Add tests and evals to verify tool, task, and agent behavior.
- Design for user experience: announce handoffs, preserve relevant context to avoid repetition, and handle correction paths cleanly.
Following these patterns keeps complex workflows predictable, testable, and extensible.
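As an illustration of two of these guidelines (meaningful tool return values and incremental testing), here is a small plain-Python sketch. `ToolResult`, `cancel_appointment`, and the test functions are hypothetical; the tests are written as ordinary functions with assertions, which a test runner such as pytest could collect, and they exercise only the tool logic, not a live agent.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ToolResult:
    """Structured return value the model can reason about."""
    ok: bool
    message: str


def cancel_appointment(appointments: Dict[str, str], appointment_id: str) -> ToolResult:
    """Hypothetical tool: cancel an appointment and report what happened."""
    if appointment_id not in appointments:
        return ToolResult(False, f"No appointment with id {appointment_id}.")
    del appointments[appointment_id]
    return ToolResult(True, f"Appointment {appointment_id} cancelled.")


def test_cancel_existing_appointment() -> None:
    appointments = {"A1": "Monday 10:00"}
    result = cancel_appointment(appointments, "A1")
    assert result.ok
    assert "A1" not in appointments


def test_cancel_missing_appointment() -> None:
    appointments = {"A1": "Monday 10:00"}
    result = cancel_appointment(appointments, "B2")
    assert not result.ok
    assert appointments == {"A1": "Monday 10:00"}
```

Returning an explicit failure message, rather than raising or returning nothing, gives the model something concrete to relay or act on.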
Additional resources
For more information on building voice AI workflows, see the following topics:
- Supervisor pattern: Route work to specialist tasks while one agent stays in control.
- Agents and handoffs: Define agents and agent handoffs to build multi-agent voice AI workflows.
- Tasks & task groups: Use tasks and task groups to execute discrete operations and build complex workflows.
- Prompting guide: Complete guide to writing good instructions for your agents.
- Tool definition and use: Use tools to call external services, inject custom logic, trigger agent handoffs, and more.
- Testing & evaluation: Test every aspect of your agents with a custom test suite.
- Agent-assisted warm transfer: Transfer calls to a human operator while providing a contextual summary.
- Call forwarding (cold transfer): Forward calls to another number or SIP endpoint using SIP REFER.