In this recipe, you'll build an agent that speaks its chain-of-thought reasoning aloud without vocalizing the `<think>` and `</think>` tokens that wrap it. The steps focus on cleaning up the text just before it's sent to the TTS engine so the agent sounds natural.
## Prerequisites
To complete this guide, you need to create an agent using the Voice agent quickstart.
## Modifying LLM output before TTS

You can modify the LLM output by creating a custom `Agent` class and overriding the `llm_node` method. Here's how to implement an agent that removes `<think>` tags from the output:
```python
import logging

from dotenv import load_dotenv
from livekit.agents import JobContext, WorkerOptions, cli
from livekit.agents.voice import Agent, AgentSession
from livekit.plugins import deepgram, openai, silero

load_dotenv()

logger = logging.getLogger("replacing-llm-output")
logger.setLevel(logging.INFO)


class ChainOfThoughtAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""You are a helpful agent that thinks through problems step by step.
            When reasoning through a complex question, wrap your thinking in <think></think> tags.
            After you've thought through the problem, provide your final answer.""",
            stt=deepgram.STT(),
            llm=openai.LLM.with_groq(model="deepseek-r1-distill-llama-70b"),
            tts=openai.TTS(),
            vad=silero.VAD.load(),
        )

    async def on_enter(self):
        # Greet the user as soon as the agent joins the session.
        self.session.generate_reply()

    async def llm_node(self, chat_ctx, tools, model_settings=None):
        activity = self._Agent__get_activity_or_raise()
        assert activity.llm is not None, "llm_node called but no LLM node is available"

        async def process_stream():
            async with activity.llm.chat(
                chat_ctx=chat_ctx, tools=tools, tool_choice=None
            ) as stream:
                async for chunk in stream:
                    if chunk is None:
                        continue

                    # Pull the text out of the chunk; chunks without a delta
                    # are treated as plain strings.
                    content = (
                        getattr(chunk.delta, "content", None)
                        if hasattr(chunk, "delta")
                        else str(chunk)
                    )
                    if content is None:
                        yield chunk
                        continue

                    # Strip the opening tag and turn the closing tag into a
                    # spoken transition so the TTS output sounds natural.
                    processed_content = content.replace("<think>", "").replace(
                        "</think>", "Okay, I'm ready to respond."
                    )

                    if processed_content != content:
                        if hasattr(chunk, "delta") and hasattr(chunk.delta, "content"):
                            chunk.delta.content = processed_content
                        else:
                            chunk = processed_content

                    yield chunk

        # Return the generator so downstream nodes (TTS) consume the
        # processed stream.
        return process_stream()
```
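If you want to sanity-check the text cleaning in isolation before wiring up a full session, here's a minimal sketch (not part of the recipe itself) that applies the same two replacements to some hypothetical streamed chunks:

```python
# A standalone sketch of the replacement step above; the chunk strings are
# hypothetical examples of streamed LLM output.
def clean_chunk(content: str) -> str:
    # Same logic as llm_node: drop the opening tag, turn the closing tag
    # into a spoken transition.
    return content.replace("<think>", "").replace("</think>", "Okay, I'm ready to respond.")


chunks = ["<think>The user asked about", " prime numbers.</think>", " Yes, 7 is prime."]
print("".join(clean_chunk(c) for c in chunks))
# -> "The user asked about prime numbers.Okay, I'm ready to respond. Yes, 7 is prime."
```

Note that this is a per-chunk replace: it only catches a tag when the tag arrives whole within a single chunk.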
## Setting up the agent session
Create an entrypoint function to initialize and run the agent:
```python
async def entrypoint(ctx: JobContext):
    await ctx.connect()

    session = AgentSession()
    await session.start(
        agent=ChainOfThoughtAgent(),
        room=ctx.room,
    )


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
## How it works

- The LLM generates text with chain-of-thought reasoning wrapped in `<think>...</think>` tags.
- The overridden `llm_node` method intercepts the LLM output stream.
- For each chunk of text:
  - The method checks if there's content to process.
  - It replaces `<think>` tags with an empty string and `</think>` tags with "Okay, I'm ready to respond." (a per-chunk replace; see the sketch after this list for tags split across chunks).
  - The modified content is then passed on to the TTS engine.
- The TTS engine only speaks the processed text.
This approach gives you fine-grained control over how the agent processes and speaks LLM responses, allowing for more sophisticated conversational behaviors.