Voice pipeline nodes

Learn how to customize the behavior of your agent by overriding nodes in the voice pipeline.

Overview

The Agents framework allows you to fully customize your agent’s behavior at multiple nodes in the processing path. A node is a point in the path where one process transitions to another. In the case of STT, LLM, and TTS nodes, in addition to customizing the pre- and post-processing at the transition point from one node to the next, you can also entirely replace the default process with custom code.

These nodes are exposed on the Agent class and occur at the following points in the pipeline:

  • on_enter(): Agent enters session.
  • on_exit(): Agent exits session.
  • on_end_of_turn(): User finishes speaking (pipeline only).
  • transcription_node(): Agent transcription is available.
  • stt_node(): Agent's STT processing step (pipeline only).
  • llm_node(): Agent's LLM processing step (pipeline only).
  • tts_node(): Agent's TTS processing step (pipeline only).
Note

If you're using a realtime model, the only nodes available are on_enter, on_exit, and transcription_node.
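
As a quick reference, the following is a minimal sketch of a pipeline Agent subclass showing where each of these overrides lives. The method signatures match the examples later on this page; the exact import paths may vary with your livekit-agents version.

from typing import AsyncIterable, Optional

from livekit import rtc
from livekit.agents import Agent, ChatContext, ChatMessage, ModelSettings, llm, stt
from livekit.agents.llm import FunctionTool

class MyAgent(Agent):
    async def on_enter(self) -> None: ...
    async def on_exit(self) -> None: ...
    async def on_end_of_turn(
        self, chat_ctx: ChatContext, new_message: ChatMessage, generating_reply: bool
    ) -> None: ...
    async def transcription_node(self, text: AsyncIterable[str]) -> AsyncIterable[str]: ...
    async def stt_node(
        self, audio: AsyncIterable[rtc.AudioFrame]
    ) -> Optional[AsyncIterable[stt.SpeechEvent]]: ...
    async def llm_node(
        self, chat_ctx: llm.ChatContext, tools: list[FunctionTool], model_settings: ModelSettings
    ) -> AsyncIterable[llm.ChatChunk]: ...
    async def tts_node(
        self, text: AsyncIterable[str], model_settings: ModelSettings
    ) -> AsyncIterable[rtc.AudioFrame]: ...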

Pipeline and realtime agent differences

Realtime agents aren't componentized like pipeline agents and don't have nodes for STT, LLM, and TTS. Instead, realtime agents use a single model for the entire agent, and the agent processes user input in realtime. You can still customize the behavior of a realtime agent by overriding the transcription node, updating the agent's instructions, or adding to its chat context.
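
For example, here is a rough sketch of a realtime agent customized only through those hooks. It assumes an update_instructions method is available on Agent in your SDK version; treat the details as illustrative rather than definitive.

class RealtimeAssistant(Agent):
    async def on_enter(self) -> None:
        # Adjust the agent's instructions for this session
        # (update_instructions is assumed to exist in your livekit-agents version)
        await self.update_instructions(
            "You are a concise voice assistant. Keep replies under two sentences."
        )
        self.session.generate_reply(instructions="Greet the user briefly")

    async def transcription_node(self, text: AsyncIterable[str]) -> AsyncIterable[str]:
        # Post-process the agent transcription before it's forwarded to the output
        async for delta in text:
            yield delta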

Agent with a voice pipeline

Processing path for a voice pipeline agent: user speech → STT → LLM → TTS → agent speech.

Agent with a realtime model

Processing path for a realtime agent: user speech → realtime model → agent speech.

Use cases for customization

The following are some examples of how you can customize your agent's behavior:

  • Use a custom STT, LLM, or TTS provider without a plugin.
  • Generate a custom greeting when an agent enters a session.
  • Modify STT output to remove filler words before sending it to the LLM.
  • Modify LLM output before sending it to TTS to customize pronunciation.
  • Update the user interface when an agent or user finishes speaking.

Customizing node behavior

Each node is a step in the agent pipeline where processing takes place. By default, some nodes are stub methods, and other nodes (the STT, LLM, and TTS nodes) execute the code in the provider plugin. For these nodes, you can customize behavior by overriding the node and adding additional processing before, after, or instead of the default behavior.

Stub methods are provided to allow you to add functionality at specific points in the processing path.
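
For the "instead of" case, such as using a custom TTS provider without a plugin, you can skip the call to super() and produce the node's output yourself. The following is a rough sketch, where synthesize_speech() is a hypothetical helper that calls your own TTS service and returns 16-bit mono PCM audio:

async def tts_node(
    self, text: AsyncIterable[str], model_settings: ModelSettings
) -> AsyncIterable[rtc.AudioFrame]:
    sample_rate = 24000
    async for chunk in text:
        # synthesize_speech() is hypothetical: call your own TTS service here
        pcm: bytes = await synthesize_speech(chunk, sample_rate=sample_rate)
        yield rtc.AudioFrame(
            data=pcm,
            sample_rate=sample_rate,
            num_channels=1,
            samples_per_channel=len(pcm) // 2,  # 2 bytes per 16-bit sample
        )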

On enter and exit nodes

The on_enter and on_exit nodes are called when the agent enters or leaves an agent session. When an agent enters a session, it takes control of the session and handles processing until it exits. To learn more, see Workflows.

For example, initiate a conversation when an agent enters the session:

async def on_enter(self):
    # Instruct the agent to greet the user when it's added to a session
    self.session.generate_reply(instructions="Greet the user with a warm welcome")

For a more comprehensive example of a handoff between agents, and saving chat history in the on_enter node, see the restaurant ordering and reservations example.

You can override the on_exit method to say goodbye before the agent exits the session:

async def on_exit(self):
    # Say goodbye
    self.session.generate_reply(instructions="Tell the user a friendly goodbye before you exit.")

On end of turn node

The on_end_of_turn node is called when the user finishes speaking. You can customize this node by overriding the on_end_of_turn method in your Agent. For example, manually update the chat context for a push-to-talk interface:

async def on_end_of_turn(
    self, chat_ctx: ChatContext, new_message: ChatMessage, generating_reply: bool
) -> None:
    # Callback when user input is transcribed
    chat_ctx = chat_ctx.copy()
    chat_ctx.items.append(new_message)
    await self.update_chat_ctx(chat_ctx)
    logger.info("add user message to chat context", extra={"content": new_message.content})

For a complete example, see the multi-user agent with push to talk example.

STT node

The STT node converts incoming user audio into the speech events consumed by the rest of the pipeline. You can customize it by overriding the stt_node method in your Agent. For example, you can add noise filtering to the audio before it reaches the STT model:

async def stt_node(
    self, audio: AsyncIterable[rtc.AudioFrame]
) -> Optional[AsyncIterable[stt.SpeechEvent]]:
    async def filtered_audio():
        async for frame in audio:
            # Apply some noise filtering logic here
            yield frame

    async for event in super().stt_node(filtered_audio()):
        yield event
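
You can also post-process the transcription itself rather than the audio. The following sketch strips common filler words from final transcripts before they reach the LLM; it assumes the SpeechEvent and SpeechData fields exposed by livekit-agents, so verify the attribute names against your SDK version.

FILLER_WORDS = {"um", "uh", "erm", "like"}

async def stt_node(
    self, audio: AsyncIterable[rtc.AudioFrame]
) -> Optional[AsyncIterable[stt.SpeechEvent]]:
    async for event in super().stt_node(audio):
        # Only rewrite final transcripts; interim events pass through unchanged
        if event.type == stt.SpeechEventType.FINAL_TRANSCRIPT and event.alternatives:
            words = event.alternatives[0].text.split()
            event.alternatives[0].text = " ".join(
                w for w in words if w.strip(".,!?").lower() not in FILLER_WORDS
            )
        yield event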

LLM node

The LLM node is responsible for generating the agent's response. You can customize the LLM node by overriding the llm_node method in your Agent. For example, you can update the LLM output before sending it to the TTS node as in the following example:

async def llm_node(
    self,
    chat_ctx: llm.ChatContext,
    tools: list[FunctionTool],
    model_settings: ModelSettings
) -> AsyncIterable[llm.ChatChunk]:
    # Process with base LLM implementation
    async for chunk in super().llm_node(chat_ctx, tools, model_settings):
        # Do something with the LLM output before sending it to the next node
        yield chunk

TTS node

The TTS node is responsible for converting the LLM output into audio. You can customize the TTS node by overriding the tts_node method in your Agent. For example, you can adjust the text before it's synthesized to customize pronunciation, as in the following example:

async def tts_node(
    self,
    text: AsyncIterable[str],
    model_settings: ModelSettings
) -> AsyncIterable[rtc.AudioFrame]:
    """
    Process text-to-speech with custom pronunciation rules before synthesis.
    Adjusts common technical terms and abbreviations for better pronunciation.
    """
    # Dictionary of pronunciation replacements.
    # Support for custom pronunciations depends on the TTS provider.
    # To learn more, see the Speech documentation:
    # https://docs.livekit.io/agents/build/speech/#customizing-pronunciation.
    pronunciations = {
        "API": "A P I",
        "REST": "rest",
        "SQL": "sequel",
        "kubectl": "kube control",
        "AWS": "A W S",
        "UI": "U I",
        "URL": "U R L",
        "npm": "N P M",
        "LiveKit": "Live Kit",
        "async": "a sink",
        "nginx": "engine x",
    }

    async def adjust_pronunciation(input_text: AsyncIterable[str]) -> AsyncIterable[str]:
        async for chunk in input_text:
            modified_chunk = chunk
            # Apply pronunciation rules (requires `import re` at the top of the file)
            for term, pronunciation in pronunciations.items():
                # Use word boundaries to avoid partial replacements
                modified_chunk = re.sub(
                    rf'\b{term}\b',
                    pronunciation,
                    modified_chunk,
                    flags=re.IGNORECASE
                )
            yield modified_chunk

    # Process the modified text through the base TTS implementation
    async for frame in super().tts_node(
        adjust_pronunciation(text),
        model_settings
    ):
        yield frame

Transcription node

The transcription node is part of the forwarding path for agent transcriptions. By default, the node simply passes the transcription to the task that forwards it to the designated output. You can customize this behavior by overriding the transcription_node method in your Agent. For example, you can store the transcription in a database as in the following example:

async def transcription_node(self, text: AsyncIterable[str]) -> AsyncIterable[str]:
    """Process the LLM output into transcriptions and store them in a database"""

    async def store_in_db(text_chunk: str):
        # Method to store the transcription in a database
        pass

    async for delta in text:
        # Store each chunk of text as it comes in
        await store_in_db(delta)
        # Forward the text chunk to the next node in the pipeline
        yield delta