
Test framework

Set up tests, navigate results, write assertions, and test multi-turn conversations.

Overview

This guide covers the full testing API for LiveKit Agents, including test setup, result navigation, assertions, mocking, and multi-turn conversation testing. The examples use pytest for Python and Vitest for Node.js, but are adaptable to other testing frameworks.

Project structure and deployment

When restructuring your project to add tests, ensure you update your Dockerfile too if you move your agent entrypoint file. The default template assumes src/agent.py for Python projects. See Builds and Dockerfiles for details.

Installation

For Python, you must install both the pytest and pytest-asyncio packages to write tests for your agent:

uv add pytest pytest-asyncio

For Node.js, you must install vitest:

pnpm add -D vitest

Suppress CLI output

Always call initializeLogger({ pretty: false, level: 'warn' }) at the top of your test files to suppress verbose CLI output.

Test setup

Each test typically follows the same pattern:

import pytest
from livekit.agents import AgentSession, inference

# Import your agent class
from agent import Assistant

@pytest.mark.asyncio  # Or your async testing framework of choice
async def test_your_agent() -> None:
    async with (
        # You must create an LLM instance for the `judge` method
        inference.LLM(model="openai/gpt-5.3-chat-latest") as llm,
        # Create a session for the life of this test.
        # LLM is not required - it will use the agent's LLM if you don't provide one here
        AgentSession(llm=llm) as session,
    ):
        # Start the agent in the session
        await session.start(Assistant())

        # Run a single conversation turn based on the given user input
        result = await session.run(user_input="Hello")

        # ...your assertions go here...
import { inference, initializeLogger, voice } from '@livekit/agents';
import { describe, it, beforeAll, afterAll } from 'vitest';

// Import your agent class
import { Agent } from './agent';

// Initialize logger to suppress CLI output
initializeLogger({ pretty: false, level: 'warn' });

const { AgentSession } = voice;

describe('YourAgent', () => {
  let session: voice.AgentSession;
  let llm: inference.LLM;

  beforeAll(async () => {
    // You must create an LLM instance for the `judge` method
    llm = new inference.LLM({ model: 'openai/gpt-5.3-chat-latest' });

    // Create a session for the life of this test.
    // LLM is not required - it will use the agent's LLM if you don't provide one here
    session = new AgentSession({ llm });

    // Start the agent in the session
    await session.start({ agent: new Agent() });
  });

  afterAll(async () => {
    await session?.close();
  });

  it('should test your agent', async () => {
    // Run a single conversation turn based on the given user input
    const result = await session.run({ userInput: 'Hello' }).wait();

    // ...your assertions go here...
  });
});

Result structure

The run method executes a single conversation turn and returns a RunResult, which contains each of the events that occurred during the turn, in order, and offers a fluent assertion API.

A simple turn with no tool calls produces a single event: one assistant message. A more complex turn, however, may contain tool calls, tool outputs, handoffs, and one or more messages.

To validate these multi-part turns, you can use any of the following approaches.

Sequential navigation

  • Step through events one at a time with next_event().
  • Validate each event with is_* assertions like is_message().
  • Call no_more_events() at the end to assert no unexpected events remain.

For example, to validate that the agent responds with a friendly greeting, you can use the following code:

result.expect.next_event().is_message(role="assistant")
result.expect.nextEvent().isMessage({ role: 'assistant' });
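As a mental model, sequential navigation is a cursor over the turn's ordered event list. The following plain-Python sketch is illustrative only, not the LiveKit implementation (`EventCursor` is an invented name); it shows the semantics of next_event() and no_more_events():

```python
# Illustrative sketch only: models the cursor semantics, not LiveKit's code.
class EventCursor:
    def __init__(self, events: list[dict]):
        self._events = list(events)
        self._pos = 0

    def next_event(self) -> dict:
        # Advance the cursor and return the event it passed over.
        assert self._pos < len(self._events), "expected another event, found none"
        event = self._events[self._pos]
        self._pos += 1
        return event

    def no_more_events(self) -> None:
        # Fail if any events were left unconsumed.
        remaining = len(self._events) - self._pos
        assert remaining == 0, f"{remaining} unexpected event(s) remain"


cursor = EventCursor([
    {"type": "function_call", "name": "lookup_weather"},
    {"type": "function_call_output"},
    {"type": "message", "role": "assistant"},
])
assert cursor.next_event()["type"] == "function_call"
assert cursor.next_event()["type"] == "function_call_output"
assert cursor.next_event()["role"] == "assistant"
cursor.no_more_events()
```

The real assertions raise descriptive errors in the same way when the next event is not what the test expects, or when events remain after the last expected one.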

Skipping events

You can also skip events without validation:

  • skip_next(n): Skip one or more events. Defaults to 1.
  • skip_next_event_if(type, ...): Skip the next event only if it matches the given type and optional filters (for example, role for messages, name for function calls). Returns the matching Assert, or None if the next event doesn't match.
  • next_event(type=...): Advance to the next event of the given type, skipping everything else. Raises an assertion error if no match is found.

Example:

result.expect.skip_next() # skips one event
result.expect.skip_next(2) # skips two events
result.expect.skip_next_event_if(type="message", role="assistant") # Skips the next event if it's an assistant message
result.expect.skip_next_event_if(type="function_call", name="lookup_weather") # Skips the next event if it's a call to lookup_weather
result.expect.next_event(type="function_call") # Advances to the next function call, skipping non-function-call events. Raises an assertion error if not found.
result.expect.skipNext(); // skips one event
result.expect.skipNext(2); // skips two events
result.expect.skipNextEventIf({ type: 'message', role: 'assistant' }); // Skips the next event if it's an assistant message
result.expect.nextEvent({ type: 'message', role: 'assistant' }); // Advances to the next assistant message, skipping anything else. If no matching event is found, an assertion error is raised.
Return types for next_event(type=...)

Passing a type to next_event() returns a type-specific Assert (for example, FunctionCallAssert) that doesn't have is_* methods. Don't chain .is_function_call() after next_event(type="function_call").

To assert additional properties like function name, either omit type and chain the is_* method, or check the event directly:

# Option 1: chain is_function_call on a generic EventAssert
result.expect.next_event().is_function_call(name="lookup_weather")
# Option 2: advance to any function call, then check the name
fnc = result.expect.next_event(type="function_call")
assert fnc.event().item.name == "lookup_weather"

Indexed access

Access a specific event by index without advancing the cursor. You can use negative indices to access events from the end of the list. For example, -1 for the last event.

result.expect[0].is_message(role="assistant")
result.expect.at(0).isMessage({ role: 'assistant' });

Search for events regardless of position with contains_* methods like contains_message(). You can also search within a range using slices ([:] in Python, .range() in Node.js).

result.expect.contains_message(role="assistant")
result.expect[0:2].contains_message(role="assistant")
result.expect.containsMessage({ role: 'assistant' });
result.expect.range(0, 2).containsMessage({ role: 'assistant' });

Assertions

The test framework includes assertion helpers to validate messages, tool calls, and agent handoffs within each result. Use exact assertions like is_message() to check a specific event, or search assertions like contains_message() to find a match anywhere in a range of events.

Message assertions

Use is_message() and contains_message() to test individual messages. Both accept an optional role argument.

result.expect.next_event().is_message(role="assistant")
result.expect[0:2].contains_message(role="assistant")
result.expect.nextEvent().isMessage({ role: 'assistant' });
result.expect.range(0, 2).containsMessage({ role: 'assistant' });

Access additional properties with the event() method:

  • event().item.content - Message content
  • event().item.role - Message role

LLM-based judgment

Use judge() to evaluate whether a message matches a given intent. Pass an LLM instance and an intent string describing the expected content. The LLM judges the message against the intent without surrounding conversation context.

result = await session.run(user_input="Hello")
await (
    result.expect.next_event()
    .is_message(role="assistant")
    .judge(llm, intent="Offers a friendly introduction and offer of assistance.")
)
const result = await session.run({ userInput: 'Hello' }).wait();
await result.expect
  .nextEvent()
  .isMessage({ role: 'assistant' })
  .judge(llm, {
    intent: 'Offers a friendly introduction and offer of assistance.',
  });

The llm argument can be any LLM instance and does not need to be the same one used in the agent itself.
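Conceptually, judge() asks the LLM whether the message satisfies the intent and fails the assertion on a negative verdict. A rough, self-contained sketch of that pattern with a stubbed LLM call (illustrative only, not the framework's implementation; `judge_message`, `JudgeVerdict`, and `fake_llm` are invented names):

```python
from dataclasses import dataclass


@dataclass
class JudgeVerdict:
    passed: bool
    reason: str


def judge_message(ask_llm, message: str, intent: str) -> JudgeVerdict:
    # The real framework sends a chat completion request; here ask_llm is
    # any callable that takes a prompt string and returns the model's reply.
    prompt = (
        "Does the following assistant message satisfy the intent?\n"
        f"Intent: {intent}\n"
        f"Message: {message}\n"
        "Answer PASS or FAIL, followed by a short reason."
    )
    reply = ask_llm(prompt)
    return JudgeVerdict(passed=reply.strip().upper().startswith("PASS"), reason=reply)


# Stubbed LLM so the sketch runs without a model
def fake_llm(prompt: str) -> str:
    return "PASS: the message is a friendly greeting."


verdict = judge_message(fake_llm, "Hi! How can I help you today?", "Friendly greeting")
assert verdict.passed
```

Because the judge sees only the single message and the intent string, write intents that are self-contained rather than ones that depend on earlier turns.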

Tool call assertions

Test three aspects of tool use:

  1. Function calls: The agent calls the correct tool with the correct arguments.
  2. Function call outputs: The tool returns the expected output.
  3. Agent response: The agent responds appropriately based on the tool output.

The following example tests all three:

result = await session.run(user_input="What's the weather in Tokyo?")

# Test that the agent's first conversation item is a function call
fnc_call = result.expect.next_event().is_function_call(
    name="lookup_weather", arguments={"location": "Tokyo"}
)

# Test that the tool returned the expected output to the agent
result.expect.next_event().is_function_call_output(
    output="sunny with a temperature of 70 degrees."
)

# Test that the agent's response is appropriate based on the tool output
await (
    result.expect.next_event()
    .is_message(role="assistant")
    .judge(
        llm,
        intent="Informs the user that the weather in Tokyo is sunny with a temperature of 70 degrees.",
    )
)

# Verify the agent's turn is complete, with no additional messages or function calls
result.expect.no_more_events()
const result = await session
  .run({ userInput: "What's the weather in Tokyo?" })
  .wait();

// Test that the agent's first conversation item is a function call
result.expect
  .nextEvent()
  .isFunctionCall({ name: 'getWeather', args: { location: 'Tokyo' } });

// Test that the tool returned the expected output to the agent
result.expect.nextEvent().isFunctionCallOutput();

// Test that the agent's response is appropriate based on the tool output
await result.expect
  .nextEvent()
  .isMessage({ role: 'assistant' })
  .judge(llm, {
    intent: 'Informs the user that the weather in Tokyo is sunny with a temperature of 70 degrees.',
  });

// Verify the agent's turn is complete, with no additional messages or function calls
result.expect.noMoreEvents();

Access individual properties with the event() method:

  • is_function_call().event().item.name - Function name
  • is_function_call().event().item.arguments - Function arguments
  • is_function_call_output().event().item.output - Raw function output
  • is_function_call_output().event().item.is_error - Whether the output is an error
  • is_function_call_output().event().item.call_id - The function call ID

Agent handoff assertions

Use is_agent_handoff() and contains_agent_handoff() to test that the agent performs a handoff to a new agent.

# The next event must be an agent handoff to the specified agent
result.expect.next_event().is_agent_handoff(new_agent_type=MyAgent)
# A handoff must occur somewhere in the turn
result.expect.contains_agent_handoff(new_agent_type=MyAgent)
// The next event must be an agent handoff to the specified agent
result.expect.nextEvent().isAgentHandoff({ newAgentType: MyAgent });
// A handoff must occur somewhere in the turn
result.expect.containsAgentHandoff({ newAgentType: MyAgent });

Mocking tools

Available in Python only.

In many cases, you should mock your tools for testing. This is useful to easily test edge cases, such as errors or other unexpected behavior, or when the tool has a dependency on an external service that you don't need to test against.

Version requirement

mock_tools requires LiveKit Agents 1.2.6 or later.

Use the mock_tools helper in a with block to mock one or more tools for a specific Agent. To mock a tool that raises an error:

from livekit.agents import mock_tools

# Mock a tool error
with mock_tools(
    Assistant,
    {"lookup_weather": lambda: RuntimeError("Weather service is unavailable")},
):
    result = await session.run(user_input="What's the weather in Tokyo?")
    await result.expect.next_event(type="message").judge(
        llm,
        intent="Should inform the user that an error occurred while looking up the weather.",
    )

For more complex mocks, pass a function instead of a lambda:

def _mock_weather_tool(location: str) -> str:
    if location == "Tokyo":
        return "sunny with a temperature of 70 degrees."
    else:
        return "UNSUPPORTED_LOCATION"

# Mock a specific tool response
with mock_tools(Assistant, {"lookup_weather": _mock_weather_tool}):
    result = await session.run(user_input="What's the weather in Tokyo?")
    await result.expect.next_event(type="message").judge(
        llm,
        intent="Should indicate the weather in Tokyo is sunny with a temperature of 70 degrees.",
    )

    result = await session.run(user_input="What's the weather in Paris?")
    await result.expect.next_event(type="message").judge(
        llm,
        intent="Should indicate that weather lookups in Paris are not supported.",
    )

Testing multiple turns

You can test multiple turns of a conversation by executing the run method multiple times. The conversation history builds automatically across turns.

# First turn
result1 = await session.run(user_input="Hello")
await result1.expect.next_event().is_message(role="assistant").judge(
    llm, intent="Friendly greeting"
)

# Second turn builds on conversation history
result2 = await session.run(user_input="What's the weather like in Tokyo?")
result2.expect.next_event().is_function_call(name="lookup_weather")
result2.expect.next_event().is_function_call_output()
await result2.expect.next_event().is_message(role="assistant").judge(
    llm, intent="Provides weather information"
)
// First turn
const result1 = await session.run({ userInput: 'Hello' }).wait();
await result1.expect
  .nextEvent()
  .isMessage({ role: 'assistant' })
  .judge(llm, { intent: 'Friendly greeting' });

// Second turn builds on conversation history
const result2 = await session
  .run({ userInput: "What's the weather like in Tokyo?" })
  .wait();
result2.expect.nextEvent().isFunctionCall({ name: 'getWeather' });
result2.expect.nextEvent().isFunctionCallOutput();
await result2.expect
  .nextEvent()
  .isMessage({ role: 'assistant' })
  .judge(llm, { intent: 'Provides weather information' });

Loading conversation history

To load conversation history manually, use the ChatContext class just as in your agent code:

from livekit.agents import ChatContext

agent = Assistant()
await session.start(agent)

# update_chat_ctx is on the Agent instance, not the session.
# In tests where you don't hold a reference, use session.current_agent.
chat_ctx = ChatContext()
chat_ctx.add_message(role="user", content="My name is Alice")
chat_ctx.add_message(role="assistant", content="Nice to meet you, Alice!")
await agent.update_chat_ctx(chat_ctx)

# Test that the agent remembers the context
result = await session.run(user_input="What's my name?")
await result.expect.next_event().is_message(role="assistant").judge(
    llm, intent="Should remember and mention the user's name is Alice"
)
// Alias the import so it doesn't shadow the `llm` inference.LLM instance
import { llm as llmApi } from '@livekit/agents';

const { ChatContext } = llmApi;

const agent = new Assistant();
await session.start({ agent });

// updateChatCtx is on the Agent instance, not the session.
// In tests where you don't hold a reference, use session.currentAgent.
const chatCtx = new ChatContext();
chatCtx.addMessage({ role: 'user', content: 'My name is Alice' });
chatCtx.addMessage({ role: 'assistant', content: 'Nice to meet you, Alice!' });
await agent.updateChatCtx(chatCtx);

// Test that the agent remembers the context
const result = await session.run({ userInput: "What's my name?" }).wait();
await result.expect
  .nextEvent()
  .isMessage({ role: 'assistant' })
  .judge(llm, {
    intent: "Should remember and mention the user's name is Alice",
  });