Function calling with Voice Agents

LLM function calling allows a language model to trigger predefined functions based on user input. The model parses the input, identifies when a function is needed, and calls it with the appropriate parameters, enabling dynamic and programmable interactions.

VoicePipelineAgent and MultimodalAgent have built-in support for LLM function calling, making it possible for your functions to be called directly from voice input.

Usage

import logging
from typing import Annotated

import aiohttp

from livekit.agents import llm
from livekit.agents.multimodal import MultimodalAgent
from livekit.agents.pipeline import VoicePipelineAgent

logger = logging.getLogger("weather-demo")


# first, define a class that inherits from llm.FunctionContext
class AssistantFnc(llm.FunctionContext):
    # the llm.ai_callable decorator marks this function as a tool available to the LLM
    # by default, it uses the docstring as the function's description
    @llm.ai_callable()
    async def get_weather(
        self,
        # by using the Annotated type, the argument's description and type are available to the LLM
        location: Annotated[
            str, llm.TypeInfo(description="The location to get the weather for")
        ],
    ):
        """Called when the user asks about the weather. This function will return the weather for the given location."""
        logger.info(f"getting weather for {location}")
        url = f"https://wttr.in/{location}?format=%C+%t"
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                if response.status == 200:
                    weather_data = await response.text()
                    # the return value is sent back to the LLM as a tool response;
                    # the LLM's next reply will incorporate this data
                    return f"The weather in {location} is {weather_data}."
                else:
                    raise Exception(
                        f"Failed to get weather data, status code: {response.status}"
                    )


fnc_ctx = AssistantFnc()

# pass the function context to the agent
pipeline_agent = VoicePipelineAgent(
    ...
    fnc_ctx=fnc_ctx,
)

multimodal_agent = MultimodalAgent(
    ...
    fnc_ctx=fnc_ctx,
)
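
To see where the function context fits into a running agent, here is a minimal sketch of a worker entrypoint that connects to a room and starts a VoicePipelineAgent with the AssistantFnc defined above. The specific plugin choices (silero, deepgram, openai) and the greeting are illustrative assumptions, not requirements:

from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli
from livekit.agents.pipeline import VoicePipelineAgent

# assumes the corresponding plugin packages are installed,
# e.g. livekit-plugins-silero, livekit-plugins-deepgram, livekit-plugins-openai
from livekit.plugins import deepgram, openai, silero


async def entrypoint(ctx: JobContext):
    # connect to the room, subscribing to audio only, and wait for a participant
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    participant = await ctx.wait_for_participant()

    # AssistantFnc is the function context defined above
    fnc_ctx = AssistantFnc()

    agent = VoicePipelineAgent(
        vad=silero.VAD.load(),
        stt=deepgram.STT(),
        llm=openai.LLM(),
        tts=openai.TTS(),
        fnc_ctx=fnc_ctx,
    )
    agent.start(ctx.room, participant)

    # greet the user; the LLM will call get_weather when the user asks about the weather
    await agent.say("Hi there! Ask me about the weather.", allow_interruptions=True)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))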