External data and RAG

Best practices for adding context and taking external actions.

Overview

Your agent can connect to external data sources to retrieve information, store data, or take other actions. In general, you can install any Python package or add custom code to the agent to use any database or API that you need.

For instance, your agent might need to:

  • Load a user's profile information from a database before starting a conversation.
  • Search a private knowledge base for information to accurately answer user queries.
  • Perform read/write/update operations on a database or service such as a calendar.
  • Store conversation history or other data to a remote server.

This guide covers best practices for job initialization, retrieval-augmented generation (RAG), tool calls, and other techniques for connecting your agent to external data sources and systems.

Initial context

By default, each AgentSession begins with an empty chat context. You can load user or task-specific data into the agent's context before connecting to the room and starting the session. For instance, this agent greets the user by name based on the job metadata.

import json

from livekit import agents
from livekit.agents import Agent, AgentSession, ChatContext


class Assistant(Agent):
    def __init__(self, chat_ctx: ChatContext) -> None:
        super().__init__(chat_ctx=chat_ctx, instructions="You are a helpful voice AI assistant.")


async def entrypoint(ctx: agents.JobContext):
    # Simple lookup, but you could use a database or API here if needed
    metadata = json.loads(ctx.job.metadata)
    user_name = metadata["user_name"]

    await ctx.connect()

    session = AgentSession(
        # ... stt, llm, tts, vad, turn_detection, etc.
    )

    initial_ctx = ChatContext()
    initial_ctx.add_message(role="assistant", content=f"The user's name is {user_name}.")

    await session.start(
        room=ctx.room,
        agent=Assistant(chat_ctx=initial_ctx),
        # ... room_input_options, etc.
    )

    await session.generate_reply(
        instructions="Greet the user by name and offer your assistance."
    )

Load time optimizations

If your agent requires external data in order to start, the following tips can help minimize the impact on the user experience:

  1. For static data that isn't user-specific, load it in the prewarm function, as shown in the sketch after this list.
  2. Send user-specific data in the job metadata, room metadata, or participant attributes rather than loading it in the entrypoint.
  3. If you must make a network call in the entrypoint, do so before ctx.connect(). This ensures your frontend doesn't show the agent participant before it is listening to incoming audio.
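
For example, the prewarm function runs once per worker process and can stash shared resources in the process userdata for every subsequent job to reuse. The following is a minimal sketch; load_static_kb is a hypothetical loader standing in for however you fetch your static data.

from livekit import agents
from livekit.agents import JobProcess


def prewarm(proc: JobProcess):
    # Runs once per worker process, before any job is assigned.
    # load_static_kb is a hypothetical loader for static, non-user-specific data.
    proc.userdata["kb"] = load_static_kb()


async def entrypoint(ctx: agents.JobContext):
    # Reuse the preloaded data without any per-job network calls
    kb = ctx.proc.userdata["kb"]

    await ctx.connect()
    # ... create and start the AgentSession as usual


if __name__ == "__main__":
    agents.cli.run_app(
        agents.WorkerOptions(entrypoint_fnc=entrypoint, prewarm_fnc=prewarm)
    )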

Tool calls

To achieve the highest degree of precision or take external actions, you can offer the LLM a choice of tools to use in its response. These tools can be as generic or as specific as needed for your use case.

For instance, define tools for search_calendar, create_event, update_event, and delete_event to give the LLM complete access to the user's calendar. Use participant attributes or job metadata to pass the user's calendar ID and access tokens to the agent.
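
As a sketch of this pattern, the tool below creates a calendar event. The create_event_api helper and the calendar_id and access_token metadata fields are assumptions standing in for your own calendar backend; only the tool definition and metadata access follow the LiveKit Agents API.

import json

from livekit.agents import RunContext, function_tool, get_job_context


@function_tool()
async def create_event(
    self,
    context: RunContext,
    title: str,
    start_time: str,
) -> str:
    """Create a calendar event with the given title at the given ISO-8601 start time."""
    # calendar_id and access_token are hypothetical fields in the job metadata
    metadata = json.loads(get_job_context().job.metadata)

    # create_event_api is a hypothetical wrapper around your calendar service
    event = await create_event_api(
        calendar_id=metadata["calendar_id"],
        token=metadata["access_token"],
        title=title,
        start_time=start_time,
    )
    return f"Created event {event['id']}"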

Tool definition and use

Guide to defining and using custom tools in LiveKit Agents.

Add context during conversation

You can use the on_user_turn_completed node to perform a RAG lookup based on the user's most recent turn, prior to the LLM generating a response. This method can be highly performant as it avoids the extra round-trips involved in tool calls, but it's only available for STT-LLM-TTS pipelines that have access to the user's turn in text form. Additionally, the results are only as good as the accuracy of the search function you implement.

For instance, you can use vector search to retrieve additional context relevant to the user's query and inject it into the chat context for the next LLM generation. Here is a simple example:

from livekit.agents import ChatContext, ChatMessage


async def on_user_turn_completed(
    self, turn_ctx: ChatContext, new_message: ChatMessage,
) -> None:
    # RAG function definition omitted for brevity
    rag_content = await my_rag_lookup(new_message.text_content())
    turn_ctx.add_message(
        role="assistant",
        content=f"Additional information relevant to the user's next message: {rag_content}",
    )

User feedback

It's important to provide users with direct feedback on the status of an operation, for example to explain a delay or a failure. Here are a few example use cases:

  • When an operation takes more than a few hundred milliseconds.
  • When performing write operations such as sending an email or scheduling a meeting.
  • When the agent is unable to perform an operation.

The following section describes various techniques to provide this feedback to the user.

Verbal status updates

Use Agent speech to provide verbal feedback to the user during a long-running tool call or other operation.

In the following example, the agent speaks a status update only if the call takes longer than a specified timeout. The update is dynamically generated based on the query, and could be extended to include an estimate of the remaining time or other information.

import asyncio

from livekit.agents import RunContext, function_tool


@function_tool()
async def search_knowledge_base(
    self,
    context: RunContext,
    query: str,
) -> str:
    # Send a verbal status update to the user after a short delay
    async def _speak_status_update(delay: float = 0.5):
        await asyncio.sleep(delay)
        await context.session.generate_reply(instructions=f"""
            You are searching the knowledge base for "{query}" but it is taking a little while.
            Update the user on your progress, but be very brief.
        """)

    status_update_task = asyncio.create_task(_speak_status_update(0.5))

    # Perform search (function definition omitted for brevity)
    result = await _perform_search(query)

    # Cancel status update if search completed before timeout
    status_update_task.cancel()
    return result

For more information, see the following article:

Agent speech

Explore the speech capabilities and features of LiveKit Agents.

"Thinking" sounds

Add background audio to play a "thinking" sound automatically when tool calls are ongoing. This can be useful to provide a more natural feel to the agent's responses.

from livekit import agents
from livekit.agents import AgentSession, AudioConfig, BackgroundAudioPlayer, BuiltinAudioClip


async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()

    session = AgentSession(
        # ... stt, llm, tts, vad, turn_detection, etc.
    )

    await session.start(
        room=ctx.room,
        # ... agent, etc.
    )

    background_audio = BackgroundAudioPlayer(
        thinking_sound=[
            AudioConfig(BuiltinAudioClip.KEYBOARD_TYPING, volume=0.8),
            AudioConfig(BuiltinAudioClip.KEYBOARD_TYPING2, volume=0.7),
        ],
    )
    await background_audio.start(room=ctx.room, agent_session=session)

Frontend UI

If your app includes a frontend, you can add custom UI to represent the status of the agent's operations. For instance, present a popup for a long-running operation that the user can optionally cancel:

import asyncio
import json

from livekit.agents import RunContext, function_tool, get_job_context


@function_tool()
async def perform_deep_search(
    self,
    context: RunContext,
    summary: str,
    query: str,
) -> str:
    """
    Initiate a deep internet search that will reference many external sources to answer the given query. This may take 1-5 minutes to complete.

    Summary: A user-friendly summary of the query
    Query: the full query to be answered
    """
    async def _notify_frontend(query: str):
        room = get_job_context().room
        response = await room.local_participant.perform_rpc(
            destination_identity=next(iter(room.remote_participants)),
            # Frontend method that shows a cancellable popup
            # (method definition omitted for brevity, see RPC docs)
            method="start_deep_search",
            payload=json.dumps({
                "summary": summary,
                "estimated_completion_time": 300,
            }),
            # Allow the frontend a long time to return a response
            response_timeout=500,
        )
        # In this example the frontend has a Cancel button that returns
        # "cancelled" to stop the task
        if response == "cancelled":
            deep_search_task.cancel()

    # Start the search before notifying the frontend, so deep_search_task is
    # defined by the time a "cancelled" response can arrive
    # (search function definition omitted for brevity)
    deep_search_task = asyncio.create_task(_perform_deep_search(query))
    notify_frontend_task = asyncio.create_task(_notify_frontend(query))

    try:
        result = await deep_search_task
    except asyncio.CancelledError:
        result = "Search cancelled by user"
    finally:
        notify_frontend_task.cancel()

    return result

Fine-tuned models

Sometimes the best way to get the most relevant results is to fine-tune a model for your specific use case. You can explore the available LLM integrations to find a provider that supports fine-tuning, or use Ollama to integrate a custom model.
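
For example, a fine-tuned model served through Ollama can be plugged into the session via its OpenAI-compatible endpoint. This is a minimal sketch; the model name and base URL are placeholders for your own deployment.

from livekit.agents import AgentSession
from livekit.plugins import openai

session = AgentSession(
    # "my-finetuned-model" and the local URL are placeholders for your deployment
    llm=openai.LLM.with_ollama(
        model="my-finetuned-model",
        base_url="http://localhost:11434/v1",
    ),
    # ... stt, tts, vad, turn_detection, etc.
)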

RAG providers and services

You can integrate with any RAG provider or tool of your choice to enhance your agent with additional context. Suggested providers and tools include:

  • LlamaIndex - Framework for connecting custom data to LLMs.
  • Mem0 - Memory layer for AI assistants.
  • TurboPuffer - Fast serverless vector search built on object storage.
  • Pinecone - Managed vector database for AI applications.
  • Annoy - Open source Python library from Spotify for nearest neighbor search.
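
As one concrete possibility, here is a minimal sketch of the my_rag_lookup function used earlier, backed by a prebuilt Annoy index. The embed function, the index file, and the paragraph list are assumptions standing in for your own embedding and indexing pipeline.

from annoy import AnnoyIndex

EMBED_DIM = 512  # must match the dimension used when the index was built

# Hypothetical prebuilt artifacts: an Annoy index file plus the
# paragraphs it was built from, in the same order
index = AnnoyIndex(EMBED_DIM, "angular")
index.load("kb.annoy")
paragraphs = load_paragraphs("kb.json")  # hypothetical loader


async def my_rag_lookup(query: str) -> str:
    # embed is a hypothetical function returning a list[float] of length EMBED_DIM
    vector = await embed(query)

    # Retrieve the three nearest paragraphs and join them as added context
    ids = index.get_nns_by_vector(vector, 3)
    return "\n\n".join(paragraphs[i] for i in ids)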

Additional examples

The following examples show how to implement RAG and other techniques: