This example shows how to build a simple repeater: the agent listens for the `user_input_transcribed` event and, when the user finishes speaking, says back exactly what it heard.
## Prerequisites
- Add a `.env` in this directory with your LiveKit credentials:

  ```
  LIVEKIT_URL=your_livekit_url
  LIVEKIT_API_KEY=your_api_key
  LIVEKIT_API_SECRET=your_api_secret
  ```

- Install dependencies:

  ```bash
  pip install "livekit-agents[silero]" python-dotenv
  ```
## Load environment and define an AgentServer
Load your `.env` so the media plugins can authenticate, then create the `AgentServer`.
```python
from dotenv import load_dotenv

from livekit.agents import JobContext, JobProcess, AgentServer, cli, Agent, AgentSession, inference
from livekit.plugins import silero

load_dotenv()

server = AgentServer()
```
## Prewarm VAD for faster connections
Preload the VAD model once per process to reduce connection latency.
```python
def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()


server.setup_fnc = prewarm
```
## Define the RTC session with a transcript handler
Create the session with interruptions disabled so playback is not cut off mid-echo. Attach a handler to `user_input_transcribed`; once a transcript is marked final, echo it back with `session.say`.
```python
@server.rtc_session()
async def entrypoint(ctx: JobContext):
    ctx.log_context_fields = {"room": ctx.room.name}

    session = AgentSession(
        stt=inference.STT(model="deepgram/nova-3-general"),
        llm=inference.LLM(model="openai/gpt-5-mini"),
        tts=inference.TTS(model="cartesia/sonic-3", voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"),
        vad=ctx.proc.userdata["vad"],
        allow_interruptions=False,
    )

    @session.on("user_input_transcribed")
    def on_transcript(transcript):
        if transcript.is_final:
            session.say(transcript.transcript)

    await session.start(
        agent=Agent(instructions="You are a helpful assistant that repeats what the user says."),
        room=ctx.room,
    )

    await ctx.connect()
```
## Run the server
Start the agent server with the CLI runner.
```python
if __name__ == "__main__":
    cli.run_app(server)
```
## Run it
```bash
python repeater.py console
```
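The `console` subcommand runs the agent against your local microphone and speakers. Assuming the standard `livekit-agents` CLI runner, the same script should also accept `dev` mode to connect the agent to your LiveKit project instead:

```bash
python repeater.py dev
```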
## How it works
- The VAD is prewarmed once per process for faster connections.
- The session emits a `user_input_transcribed` event as the user speaks.
- When the transcript is final, the handler calls `session.say` with the same text.
- Because interruptions are disabled, the echoed audio plays fully.
- This pattern is a starting point for more advanced post-processing on transcripts; see the sketch after this list.
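For example, a cleanup pass could run on each final transcript before it is spoken. The sketch below is a minimal variation of the handler inside `entrypoint`; the `strip_fillers` helper and its filler-word list are hypothetical, and `session` refers to the `AgentSession` created above.

```python
import re

# Hypothetical filler-word pattern; extend to suit your use case.
FILLERS = re.compile(r"\b(um+|uh+|like)\b[,;]?\s*", re.IGNORECASE)


def strip_fillers(text: str) -> str:
    """Drop common filler words before echoing the transcript."""
    return FILLERS.sub("", text).strip()


@session.on("user_input_transcribed")
def on_transcript(transcript):
    if transcript.is_final:
        cleaned = strip_fillers(transcript.transcript)
        if cleaned:  # skip utterances that were nothing but fillers
            session.say(cleaned)
```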
## Full example
```python
from dotenv import load_dotenv

from livekit.agents import JobContext, JobProcess, AgentServer, cli, Agent, AgentSession, inference
from livekit.plugins import silero

load_dotenv()

server = AgentServer()


def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()


server.setup_fnc = prewarm


@server.rtc_session()
async def entrypoint(ctx: JobContext):
    ctx.log_context_fields = {"room": ctx.room.name}

    session = AgentSession(
        stt=inference.STT(model="deepgram/nova-3-general"),
        llm=inference.LLM(model="openai/gpt-5-mini"),
        tts=inference.TTS(model="cartesia/sonic-3", voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"),
        vad=ctx.proc.userdata["vad"],
        allow_interruptions=False,
    )

    @session.on("user_input_transcribed")
    def on_transcript(transcript):
        if transcript.is_final:
            session.say(transcript.transcript)

    await session.start(
        agent=Agent(instructions="You are a helpful assistant that repeats what the user says."),
        room=ctx.room,
    )

    await ctx.connect()


if __name__ == "__main__":
    cli.run_app(server)
```