In this recipe, you build an agent that speaks its chain-of-thought reasoning aloud without vocalizing the `<think>` and `</think>` tokens themselves. The steps focus on cleaning up the text just before it's sent to the TTS engine so the agent sounds natural.
Prerequisites
To complete this guide, you need to create an agent using the Voice agent quickstart.
Modifying LLM output before TTS
The following function removes the `<think>` and `</think>` tags from the agent's output so the TTS engine doesn't speak them. Here's how you can define the callback:
```python
from typing import AsyncIterable

from livekit.agents.pipeline import VoicePipelineAgent


async def _before_tts_cb(agent: VoicePipelineAgent, text: str | AsyncIterable[str]):
    if isinstance(text, str):
        # Handle non-streaming text
        result = text.replace("<think>", "").replace("</think>", "")
        return result
    else:
        # Handle streaming text
        async def process_stream():
            async for chunk in text:
                processed = chunk.replace("<think>", "").replace(
                    "</think>", "Okay, I'm ready to respond."
                )
                yield processed

        return process_stream()
```
The callback receives two parameters:

- `agent`: The `VoicePipelineAgent` instance.
- `text`: Either a string (for non-streaming) or an `AsyncIterable` of strings (for streaming).
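To sanity-check the callback in isolation, you can call it directly with a sample string. This is a minimal sketch for local experimentation, not part of the recipe itself; the sample text is made up, and `None` stands in for the agent argument, which this callback never reads:

```python
import asyncio


async def demo() -> None:
    # Non-streaming branch: both tags are stripped; the reasoning text itself remains.
    spoken = await _before_tts_cb(None, "<think>weigh the options</think> Here's my answer.")
    print(spoken)  # "weigh the options Here's my answer."


asyncio.run(demo())
```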
Initializing the VoicePipelineAgent
Use the `before_tts_cb` parameter to pass in the callback when creating an instance of `VoicePipelineAgent`. The following code also specifies Groq's `deepseek-r1-distill-llama-70b` LLM model.
```python
agent = VoicePipelineAgent(
    vad=ctx.proc.userdata["vad"],
    stt=openai.STT(),
    llm=openai.LLM.with_groq(model="deepseek-r1-distill-llama-70b"),  # change the model here
    tts=openai.TTS(),
    before_tts_cb=_before_tts_cb,  # add the TTS callback here
    chat_ctx=initial_ctx,
)
```
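This snippet assumes the surrounding setup from the Voice agent quickstart. As a rough sketch of where `ctx.proc.userdata["vad"]` and `initial_ctx` come from (assuming the `silero` and `openai` plugins and the v0.x agents API; adjust to your version and prompt):

```python
from livekit.agents import JobContext, JobProcess, llm
from livekit.plugins import openai, silero


def prewarm(proc: JobProcess):
    # Load the Silero VAD once per process so each job can reuse it.
    proc.userdata["vad"] = silero.VAD.load()


async def entrypoint(ctx: JobContext):
    # System prompt for the agent; the wording here is just an example.
    initial_ctx = llm.ChatContext().append(
        role="system",
        text="You are a helpful voice assistant.",
    )
    await ctx.connect()
    # ... create the VoicePipelineAgent as shown above, then:
    # agent.start(ctx.room)
```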
How it works
- The LLM generates text, including tokens like `<think>...</think>`.
- Before the text goes to TTS, the `before_tts_cb` intercepts it.
- The callback strips or modifies unwanted tokens.
- The cleaned text is then spoken by the TTS engine.
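One caveat on the streaming branch: `replace` runs on each chunk independently, so a tag split across two chunks (for example `<thi` in one chunk and `nk>` in the next) passes through unmodified. The chunk-by-chunk behavior is easy to observe with a quick sketch (the sample chunks are made up):

```python
import asyncio


async def demo_stream() -> None:
    async def chunks():
        yield "<think>reasoning</think>"  # whole tags within one chunk are handled
        yield " Final answer."

    stream = await _before_tts_cb(None, chunks())
    async for piece in stream:
        print(repr(piece))
    # Prints "reasoningOkay, I'm ready to respond." then " Final answer."


asyncio.run(demo_stream())
```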