Overview
The Agents framework allows you to fully customize your agent’s behavior at multiple nodes in the processing path. A node is a point in the path where one process transitions to another. In the case of STT, LLM, and TTS nodes, in addition to customizing the pre- and post-processing at the transition point from one node to the next, you can also entirely replace the default process with custom code.
These nodes are exposed on the `Agent` class and occur at the following points in the pipeline:
- `on_enter()`: Agent enters session.
- `on_exit()`: Agent exits session.
- `on_end_of_turn()`: User finishes speaking (pipeline only).
- `transcription_node()`: Agent transcription is available.
- `stt_node()`: Agent's STT processing step (pipeline only).
- `llm_node()`: Agent's LLM processing step (pipeline only).
- `tts_node()`: Agent's TTS processing step (pipeline only).
If you're using a realtime model, the only nodes available are `on_enter`, `on_exit`, and `transcription_node`.
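Taken together, the hooks look like this on an `Agent` subclass. The following is a minimal sketch using the signatures from the examples later on this page; the exact import paths are assumptions that can vary between SDK versions, so verify them against your installed release.

```python
from typing import AsyncIterable, Optional

from livekit import rtc
from livekit.agents import Agent, ModelSettings, llm, stt
from livekit.agents.llm import ChatContext, ChatMessage, FunctionTool


class MyAgent(Agent):
    # Lifecycle hooks: override to add behavior on entering or leaving a session
    async def on_enter(self) -> None: ...

    async def on_exit(self) -> None: ...

    # Called when the user finishes speaking (pipeline only)
    async def on_end_of_turn(
        self, chat_ctx: ChatContext, new_message: ChatMessage, generating_reply: bool
    ) -> None: ...

    # Pipeline processing steps (pipeline only); call super() to keep the
    # default plugin behavior, or replace it entirely
    async def stt_node(
        self, audio: AsyncIterable[rtc.AudioFrame]
    ) -> Optional[AsyncIterable[stt.SpeechEvent]]: ...

    async def llm_node(
        self,
        chat_ctx: llm.ChatContext,
        tools: list[FunctionTool],
        model_settings: ModelSettings,
    ) -> AsyncIterable[llm.ChatChunk]: ...

    async def tts_node(
        self, text: AsyncIterable[str], model_settings: ModelSettings
    ) -> AsyncIterable[rtc.AudioFrame]: ...

    # Forwarding path for agent transcriptions
    async def transcription_node(self, text: AsyncIterable[str]) -> AsyncIterable[str]: ...
```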
Pipeline and realtime agent differences
Realtime agents aren't componentized like pipeline agents and don't have nodes for STT, LLM, and TTS. Instead, realtime agents use a single model for the entire agent, and the agent processes user input in realtime. You can still customize the behavior of a realtime agent by overriding the transcription node, updating the agent's instructions, or adding to its chat context.
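For example, a realtime agent can still adjust its behavior when it enters the session. The following is a minimal sketch; it assumes `update_instructions()`, `update_chat_ctx()`, and `ChatContext.add_message()` are available as in recent livekit-agents releases, so check the names against your SDK version.

```python
from livekit.agents import Agent


class RealtimeAssistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful assistant.")

    async def on_enter(self) -> None:
        # Update the instructions the realtime model follows in this session
        await self.update_instructions(
            "You are a helpful assistant. Keep answers under two sentences."
        )

        # Add background information to the chat context
        # (add_message is an assumption; some versions append to chat_ctx.items instead)
        chat_ctx = self.chat_ctx.copy()
        chat_ctx.add_message(role="system", content="The caller is a returning customer.")
        await self.update_chat_ctx(chat_ctx)

        # Start the conversation
        self.session.generate_reply(instructions="Greet the user with a warm welcome")
```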
Agent with a voice pipeline
Processing path for a voice pipeline agent: user audio flows through the `stt_node`, `llm_node`, and `tts_node` steps in sequence, and `transcription_node` forwards the agent's transcriptions.
Agent with a realtime model
Processing path for a realtime agent: a single realtime model handles user audio end to end, and `transcription_node` forwards the agent's transcriptions.
Use cases for customization
The following are some examples of how you can customize your agent's behavior:
- Use a custom STT, LLM, or TTS provider without a plugin (see the sketch after this list).
- Generate a custom greeting when an agent enters a session.
- Modify STT output to remove filler words before sending it to the LLM.
- Modify LLM output before sending it to TTS to customize pronunciation.
- Update the user interface when an agent or user finishes speaking.
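To illustrate the first item, here is a hypothetical sketch of replacing `llm_node` entirely to call a custom model server with no plugin. The endpoint URL and response format are invented for illustration, and it assumes your livekit-agents version accepts plain string chunks yielded from `llm_node`; if yours doesn't, wrap each chunk in `llm.ChatChunk` instead.

```python
import aiohttp


async def llm_node(
    self,
    chat_ctx: llm.ChatContext,
    tools: list[FunctionTool],
    model_settings: ModelSettings,
) -> AsyncIterable[str]:
    # Flatten the chat context into a plain prompt for the custom server
    prompt = "\n".join(str(item) for item in chat_ctx.items)

    async with aiohttp.ClientSession() as http:
        # Hypothetical endpoint that streams one text chunk per line
        async with http.post(
            "http://localhost:8000/generate", json={"prompt": prompt}
        ) as resp:
            async for line in resp.content:
                yield line.decode("utf-8")
```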
Customizing node behavior
Each node is a step in the agent pipeline where processing takes place. By default, some nodes are stub methods, and other nodes (the STT, LLM, and TTS nodes) execute the code in the provider plugin. For these nodes, you can customize behavior by overriding the node and adding additional processing before, after, or instead of the default behavior.
Stub methods are provided to allow you to add functionality at specific points in the processing path.
On enter and exit nodes
The `on_enter` and `on_exit` nodes are called when the agent enters or leaves an agent session. When an agent enters a session, it takes control and handles all processing for the session until it exits. To learn more, see Workflows.
For example, initiate a conversation when an agent enters the session:
```python
async def on_enter(self):
    # Instruct the agent to greet the user when it's added to a session
    self.session.generate_reply(instructions="Greet the user with a warm welcome")
```
For a more comprehensive example of a handoff between agents and saving chat history in the `on_enter` node, see the restaurant ordering and reservations example.

You can override the `on_exit` method to say goodbye before the agent exits the session:
```python
async def on_exit(self):
    # Say goodbye
    self.session.generate_reply(instructions="Tell the user a friendly goodbye before you exit.")
```
On end of turn node
The `on_end_of_turn` node is called when the user finishes speaking. You can customize this node by overriding the `on_end_of_turn` method in your `Agent`. For example, manually update the chat context for a push-to-talk interface:
```python
async def on_end_of_turn(
    self, chat_ctx: ChatContext, new_message: ChatMessage, generating_reply: bool
) -> None:
    # Callback invoked when user input is transcribed
    chat_ctx = chat_ctx.copy()
    chat_ctx.items.append(new_message)
    await self.update_chat_ctx(chat_ctx)
    logger.info("add user message to chat context", extra={"content": new_message.content})
```
For a complete example, see the multi-user agent with push to talk example.
STT node
The STT node converts user audio into speech events for the rest of the pipeline. For example, you can add noise filtering to the STT node by overriding the `stt_node` method in your `Agent`:
```python
async def stt_node(
    self, audio: AsyncIterable[rtc.AudioFrame]
) -> Optional[AsyncIterable[stt.SpeechEvent]]:
    async def filtered_audio():
        async for frame in audio:
            # Apply some noise filtering logic here
            yield frame

    async for event in super().stt_node(filtered_audio()):
        yield event
```
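You can also modify the STT output rather than the input audio, for example to remove filler words before the transcript reaches the LLM. The following is a minimal sketch; it assumes each `stt.SpeechEvent` carries mutable `alternatives` entries with a `text` field, so confirm this against your SDK version.

```python
import re

# Common filler words to strip from transcripts
FILLER_WORDS = re.compile(r"\b(um|uh|er|like)\b,?\s*", re.IGNORECASE)


async def stt_node(
    self, audio: AsyncIterable[rtc.AudioFrame]
) -> Optional[AsyncIterable[stt.SpeechEvent]]:
    async for event in super().stt_node(audio):
        # Remove filler words from each transcript alternative
        for alternative in event.alternatives:
            alternative.text = FILLER_WORDS.sub("", alternative.text)
        yield event
```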
LLM node
The LLM node is responsible for generating the agent's response. You can customize it by overriding the `llm_node` method in your `Agent`. For example, you can update the LLM output before sending it to the TTS node:
```python
async def llm_node(
    self,
    chat_ctx: llm.ChatContext,
    tools: list[FunctionTool],
    model_settings: ModelSettings,
) -> AsyncIterable[llm.ChatChunk]:
    # Process with the base LLM implementation
    async for chunk in super().llm_node(chat_ctx, tools, model_settings):
        # Do something with the LLM output before sending it to the next node
        yield chunk
```
TTS node
The TTS node is responsible for converting the LLM output into audio. You can customize it by overriding the `tts_node` method in your `Agent`. For example, you can adjust the text before synthesis to customize pronunciation:
```python
import re


async def tts_node(
    self, text: AsyncIterable[str], model_settings: ModelSettings
) -> AsyncIterable[rtc.AudioFrame]:
    """Process text-to-speech with custom pronunciation rules before synthesis.

    Adjusts common technical terms and abbreviations for better pronunciation.
    """
    # Dictionary of pronunciation replacements.
    # Support for custom pronunciations depends on the TTS provider.
    # To learn more, see the Speech documentation:
    # https://docs.livekit.io/agents/build/speech/#customizing-pronunciation.
    pronunciations = {
        "API": "A P I",
        "REST": "rest",
        "SQL": "sequel",
        "kubectl": "kube control",
        "AWS": "A W S",
        "UI": "U I",
        "URL": "U R L",
        "npm": "N P M",
        "LiveKit": "Live Kit",
        "async": "a sink",
        "nginx": "engine x",
    }

    async def adjust_pronunciation(input_text: AsyncIterable[str]) -> AsyncIterable[str]:
        async for chunk in input_text:
            modified_chunk = chunk
            # Apply pronunciation rules
            for term, pronunciation in pronunciations.items():
                # Use word boundaries to avoid partial replacements
                modified_chunk = re.sub(
                    rf"\b{term}\b", pronunciation, modified_chunk, flags=re.IGNORECASE
                )
            yield modified_chunk

    # Process the modified text through the base TTS implementation
    async for frame in super().tts_node(adjust_pronunciation(text), model_settings):
        yield frame
```
Transcription node
The transcription node is part of the forwarding path for agent transcriptions. By default, the node simply passes the transcription to the task that forwards it to the designated output. You can customize this behavior by overriding the `transcription_node` method in your `Agent`. For example, you can store the transcription in a database:
```python
async def transcription_node(self, text: AsyncIterable[str]) -> AsyncIterable[str]:
    """Process the LLM output into transcriptions and store them in a database."""

    async def store_in_db(text_chunk: str):
        # Method to store the transcription in a database
        pass

    async for delta in text:
        # Store each chunk of text as it comes in
        await store_in_db(delta)
        # Forward the text chunk to the next node in the pipeline
        yield delta
```