External data and RAG

Best practices for adding context and taking external actions.

Overview

Your agent can connect to external data sources to retrieve information, store data, or take other actions. In general, you can install any Python package or add custom code to the agent to use any database or API that you need.

For instance, your agent might need to:

  • Load a user's profile information from a database before starting a conversation.
  • Search a private knowledge base for information to accurately answer user queries.
  • Perform read/write/update operations on a database or service such as a calendar.
  • Store conversation history or other data to a remote server.

This guide covers best practices for job initialization, retrieval-augmented generation (RAG), tool calls, and other techniques for connecting your agent to external data sources and systems.

Initial context

By default, each AgentSession begins with an empty chat context. You can load user or task-specific data into the agent's context before connecting to the room and starting the session. For instance, this agent greets the user by name based on the job metadata.

import json

from livekit import agents
from livekit.agents import Agent, AgentSession, ChatContext


class Assistant(Agent):
    def __init__(self, chat_ctx: ChatContext) -> None:
        super().__init__(chat_ctx=chat_ctx, instructions="You are a helpful voice AI assistant.")


async def entrypoint(ctx: agents.JobContext):
    # Simple lookup, but you could use a database or API here if needed
    metadata = json.loads(ctx.job.metadata)
    user_name = metadata["user_name"]

    await ctx.connect()

    session = AgentSession(
        # ... stt, llm, tts, vad, turn_detection, etc.
    )

    initial_ctx = ChatContext()
    initial_ctx.add_message(role="assistant", content=f"The user's name is {user_name}.")

    await session.start(
        room=ctx.room,
        agent=Assistant(chat_ctx=initial_ctx),
        # ... room_input_options, etc.
    )

    await session.generate_reply(
        instructions="Greet the user by name and offer your assistance."
    )

Load time optimizations

If your agent requires external data in order to start, the following tips can help minimize the impact on the user experience:

  1. For static data that isn't user-specific, load it in the prewarm function, as shown in the sketch after this list.
  2. Send user-specific data in the job metadata, room metadata, or participant attributes rather than loading it in the entrypoint.
  3. If you must make a network call in the entrypoint, do so before ctx.connect(). This ensures your frontend doesn't show the agent participant before it is listening to incoming audio.
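
For example, the prewarm function runs once per worker process and can stash shared resources in the process userdata for every subsequent job to reuse. The following is a minimal sketch; load_static_kb is a hypothetical loader standing in for however you fetch your static data.

from livekit import agents
from livekit.agents import JobProcess


def prewarm(proc: JobProcess):
    # Runs once per worker process, before any job is assigned.
    # load_static_kb is a hypothetical loader for static, non-user-specific data.
    proc.userdata["kb"] = load_static_kb()


async def entrypoint(ctx: agents.JobContext):
    # Reuse the preloaded data without any per-job network calls
    kb = ctx.proc.userdata["kb"]

    await ctx.connect()
    # ... create and start the AgentSession as usual


if __name__ == "__main__":
    agents.cli.run_app(
        agents.WorkerOptions(entrypoint_fnc=entrypoint, prewarm_fnc=prewarm)
    )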

Tool calls

To achieve the highest degree of precision or take external actions, you can offer the LLM a choice of tools to use in its response. These tools can be as generic or as specific as needed for your use case.

For instance, define tools for search_calendar, create_event, update_event, and delete_event to give the LLM complete access to the user's calendar. Use participant attributes or job metadata to pass the user's calendar ID and access tokens to the agent.
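
As a sketch of this pattern, the tool below creates a calendar event. The create_event_api helper and the calendar_id and access_token metadata fields are assumptions standing in for your own calendar backend; only the tool definition and metadata access follow the LiveKit Agents API.

import json

from livekit.agents import RunContext, function_tool, get_job_context


@function_tool()
async def create_event(
    self,
    context: RunContext,
    title: str,
    start_time: str,
) -> str:
    """Create a calendar event with the given title at the given ISO-8601 start time."""
    # calendar_id and access_token are hypothetical fields in the job metadata
    metadata = json.loads(get_job_context().job.metadata)

    # create_event_api is a hypothetical wrapper around your calendar service
    event = await create_event_api(
        calendar_id=metadata["calendar_id"],
        token=metadata["access_token"],
        title=title,
        start_time=start_time,
    )
    return f"Created event {event['id']}"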

Tool definition and use

Guide to defining and using custom tools in LiveKit Agents.

Add context during conversation

You can use the on_user_turn_completed node to perform a RAG lookup based on the user's most recent turn, prior to the LLM generating a response. This method can be highly performant as it avoids the extra round-trips involved in tool calls, but it's only available for STT-LLM-TTS pipelines that have access to the user's turn in text form. Additionally, the results are only as good as the accuracy of the search function you implement.

For instance, you can use vector search to retrieve additional context relevant to the user's query and inject it into the chat context for the next LLM generation. Here is a simple example:

from livekit.agents import ChatContext, ChatMessage


async def on_user_turn_completed(
    self, turn_ctx: ChatContext, new_message: ChatMessage,
) -> None:
    # RAG function definition omitted for brevity
    rag_content = await my_rag_lookup(new_message.text_content())
    turn_ctx.add_message(
        role="assistant",
        content=f"Additional information relevant to the user's next message: {rag_content}",
    )

User feedback

It's important to provide users with direct feedback on the status of an operation, for example to explain a delay or a failure. Here are a few example use cases:

  • When an operation takes more than a few hundred milliseconds.
  • When performing write operations such as sending an email or scheduling a meeting.
  • When the agent is unable to perform an operation.

The following section describes various techniques to provide this feedback to the user.

Verbal status updates

Use Agent speech to provide verbal feedback to the user during a long-running tool call or other operation.

In the following example, the agent speaks a status update only if the call takes longer than a specified timeout. The update is dynamically generated based on the query, and could be extended to include an estimate of the remaining time or other information.

import asyncio

from livekit.agents import RunContext, function_tool


@function_tool()
async def search_knowledge_base(
    self,
    context: RunContext,
    query: str,
) -> str:
    # Send a verbal status update to the user after a short delay
    async def _speak_status_update(delay: float = 0.5):
        await asyncio.sleep(delay)
        await context.session.generate_reply(instructions=f"""
            You are searching the knowledge base for "{query}" but it is taking a little while.
            Update the user on your progress, but be very brief.
        """)

    status_update_task = asyncio.create_task(_speak_status_update(0.5))

    # Perform search (function definition omitted for brevity)
    result = await _perform_search(query)

    # Cancel status update if search completed before timeout
    status_update_task.cancel()
    return result

For more information, see the following article:

Agent speech

Explore the speech capabilities and features of LiveKit Agents.

"Thinking" sounds

Add background audio to play a "thinking" sound automatically when tool calls are ongoing. This can be useful to provide a more natural feel to the agent's responses.

from livekit import agents
from livekit.agents import AgentSession, AudioConfig, BackgroundAudioPlayer, BuiltinAudioClip


async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()

    session = AgentSession(
        # ... stt, llm, tts, vad, turn_detection, etc.
    )

    await session.start(
        room=ctx.room,
        # ... agent, etc.
    )

    background_audio = BackgroundAudioPlayer(
        thinking_sound=[
            AudioConfig(BuiltinAudioClip.KEYBOARD_TYPING, volume=0.8),
            AudioConfig(BuiltinAudioClip.KEYBOARD_TYPING2, volume=0.7),
        ],
    )
    await background_audio.start(room=ctx.room, agent_session=session)

Frontend UI

If your app includes a frontend, you can add custom UI to represent the status of the agent's operations. For instance, present a popup for a long-running operation that the user can optionally cancel:

import asyncio
import json

from livekit.agents import RunContext, function_tool, get_job_context


@function_tool()
async def perform_deep_search(
    self,
    context: RunContext,
    summary: str,
    query: str,
) -> str:
    """
    Initiate a deep internet search that will reference many external sources to answer the given query. This may take 1-5 minutes to complete.

    Summary: A user-friendly summary of the query
    Query: the full query to be answered
    """
    async def _notify_frontend(query: str):
        room = get_job_context().room
        response = await room.local_participant.perform_rpc(
            destination_identity=next(iter(room.remote_participants)),
            # Frontend method that shows a cancellable popup
            # (method definition omitted for brevity, see RPC docs)
            method="start_deep_search",
            payload=json.dumps({
                "summary": summary,
                "estimated_completion_time": 300,
            }),
            # Allow the frontend a long time to return a response
            response_timeout=500,
        )
        # In this example the frontend has a Cancel button that returns
        # "cancelled" to stop the task
        if response == "cancelled":
            deep_search_task.cancel()

    # Start the search before notifying the frontend, so deep_search_task is
    # defined by the time a "cancelled" response can arrive
    # (search function definition omitted for brevity)
    deep_search_task = asyncio.create_task(_perform_deep_search(query))
    notify_frontend_task = asyncio.create_task(_notify_frontend(query))

    try:
        result = await deep_search_task
    except asyncio.CancelledError:
        result = "Search cancelled by user"
    finally:
        notify_frontend_task.cancel()

    return result

Fine-tuned models

Sometimes the best way to get the most relevant results is to fine-tune a model for your specific use case. You can explore the available LLM integrations to find a provider that supports fine-tuning, or use Ollama to integrate a custom model.
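
For example, a fine-tuned model served through Ollama can be plugged into the session via its OpenAI-compatible endpoint. This is a minimal sketch; the model name and base URL are placeholders for your own deployment.

from livekit.agents import AgentSession
from livekit.plugins import openai

session = AgentSession(
    # "my-finetuned-model" and the local URL are placeholders for your deployment
    llm=openai.LLM.with_ollama(
        model="my-finetuned-model",
        base_url="http://localhost:11434/v1",
    ),
    # ... stt, tts, vad, turn_detection, etc.
)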

RAG providers and services

You can integrate with any RAG provider or tool of your choice to enhance your agent with additional context. Suggested providers and tools include:

  • LlamaIndex - Framework for connecting custom data to LLMs.
  • Mem0 - Memory layer for AI assistants.
  • TurboPuffer - Fast serverless vector search built on object storage.
  • Pinecone - Managed vector database for AI applications.
  • Annoy - Open source Python library from Spotify for nearest neighbor search.
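
As one concrete possibility, here is a minimal sketch of the my_rag_lookup function used earlier, backed by a prebuilt Annoy index. The embed function, the index file, and the paragraph list are assumptions standing in for your own embedding and indexing pipeline.

from annoy import AnnoyIndex

EMBED_DIM = 512  # must match the dimension used when the index was built

# Hypothetical prebuilt artifacts: an Annoy index file plus the
# paragraphs it was built from, in the same order
index = AnnoyIndex(EMBED_DIM, "angular")
index.load("kb.annoy")
paragraphs = load_paragraphs("kb.json")  # hypothetical loader


async def my_rag_lookup(query: str) -> str:
    # embed is a hypothetical function returning a list[float] of length EMBED_DIM
    vector = await embed(query)

    # Retrieve the three nearest paragraphs and join them as added context
    ids = index.get_nns_by_vector(vector, 3)
    return "\n\n".join(paragraphs[i] for i in ids)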

Additional examples

The following examples show how to implement RAG and other techniques: