Function calling with Voice Agents

LLM function calling allows a language model to trigger predefined functions based on user input. The model parses the input, identifies when a function is needed, and calls it with the appropriate parameters, enabling dynamic and programmable interactions.

VoicePipelineAgent and MultimodalAgent have built-in support for LLM function calling, making it possible for your functions to be called directly from voice input.

Usage

import logging
from typing import Annotated

import aiohttp

from livekit.agents import llm
from livekit.agents.multimodal import MultimodalAgent
from livekit.agents.pipeline import VoicePipelineAgent

logger = logging.getLogger("weather-demo")


# first, define a class that inherits from llm.FunctionContext
class AssistantFnc(llm.FunctionContext):
    # the llm.ai_callable decorator marks this function as a tool available to the LLM
    # by default, it uses the docstring as the function's description
    @llm.ai_callable()
    async def get_weather(
        self,
        # by using the Annotated type, the argument's description and type are available to the LLM
        location: Annotated[
            str, llm.TypeInfo(description="The location to get the weather for")
        ],
    ):
        """Called when the user asks about the weather. This function will return the weather for the given location."""
        logger.info(f"getting weather for {location}")
        url = f"https://wttr.in/{location}?format=%C+%t"
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                if response.status == 200:
                    weather_data = await response.text()
                    # the return value is sent back to the LLM as a tool response;
                    # the LLM's next reply will incorporate this data
                    return f"The weather in {location} is {weather_data}."
                else:
                    raise Exception(
                        f"Failed to get weather data, status code: {response.status}"
                    )


fnc_ctx = AssistantFnc()

# pass the function context to the agent
pipeline_agent = VoicePipelineAgent(
    ...
    fnc_ctx=fnc_ctx,
)

multimodal_agent = MultimodalAgent(
    ...
    fnc_ctx=fnc_ctx,
)
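
To see where the function context fits into a running agent, here is a minimal sketch of a worker entrypoint that connects to a room and starts a VoicePipelineAgent with the AssistantFnc defined above. The specific plugin choices (silero, deepgram, openai) and the greeting are illustrative assumptions, not requirements:

from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli
from livekit.agents.pipeline import VoicePipelineAgent

# assumes the corresponding plugin packages are installed,
# e.g. livekit-plugins-silero, livekit-plugins-deepgram, livekit-plugins-openai
from livekit.plugins import deepgram, openai, silero


async def entrypoint(ctx: JobContext):
    # connect to the room, subscribing to audio only, and wait for a participant
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    participant = await ctx.wait_for_participant()

    # AssistantFnc is the function context defined above
    fnc_ctx = AssistantFnc()

    agent = VoicePipelineAgent(
        vad=silero.VAD.load(),
        stt=deepgram.STT(),
        llm=openai.LLM(),
        tts=openai.TTS(),
        fnc_ctx=fnc_ctx,
    )
    agent.start(ctx.room, participant)

    # greet the user; the LLM will call get_weather when the user asks about the weather
    await agent.say("Hi there! Ask me about the weather.", allow_interruptions=True)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))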