This example builds a minimal STT-only agent that listens to the caller and appends each final transcript to a log file with timestamps. There is no LLM or TTS pipeline—just speech-to-text and a file writer.
Prerequisites
- A `.env` at the repo root with your LiveKit credentials:

  ```
  LIVEKIT_URL=your_livekit_url
  LIVEKIT_API_KEY=your_api_key
  LIVEKIT_API_SECRET=your_api_secret
  ```

- Install dependencies:

  ```bash
  pip install python-dotenv "livekit-agents[silero]"
  ```
Load configuration and create the AgentServer
Import the necessary modules and load environment variables. Create an `AgentServer` to handle incoming sessions.
```python
import datetime

from dotenv import load_dotenv
from livekit.agents import JobContext, AgentServer, cli, Agent, AgentSession, inference

load_dotenv()

server = AgentServer()
```
Create an STT-only agent session
Start an `AgentSession` with only STT configured. The `Agent` carries nothing but instructions; no TTS or LLM is needed for pure transcription.
```python
session = AgentSession(
    stt=inference.STT(model="deepgram/nova-3-general"),
)

await session.start(
    agent=Agent(
        instructions="You are a helpful assistant that transcribes user speech to text."
    ),
    room=ctx.room,
)
```
Listen for final transcripts
Subscribe to `user_input_transcribed` and append each final transcript to `user_speech_log.txt` with a timestamp.
```python
@session.on("user_input_transcribed")
def on_transcript(transcript):
    if transcript.is_final:
        timestamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        with open("user_speech_log.txt", "a") as f:
            f.write(f"[{timestamp}] {transcript.transcript}\n")
```
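The handler is a plain synchronous callback, so the file append happens inline. That is fine at this scale, but if disk writes could ever be slow you might offload them to a thread pool. A minimal sketch, assuming the callback fires on the event loop thread (the `append_line` helper is hypothetical, not part of the example):

```python
import asyncio


def append_line(line: str) -> None:
    # Plain blocking append, executed off the event loop.
    with open("user_speech_log.txt", "a") as f:
        f.write(line)


@session.on("user_input_transcribed")
def on_transcript(transcript):
    if transcript.is_final:
        timestamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        # Hand the write to the default thread pool so a slow disk
        # never stalls audio processing.
        loop = asyncio.get_running_loop()
        loop.run_in_executor(None, append_line, f"[{timestamp}] {transcript.transcript}\n")
```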
Create the RTC session entrypoint
Wire it all together in the entrypoint so the agent begins listening immediately when the session starts.
```python
@server.rtc_session()
async def entrypoint(ctx: JobContext):
    ctx.log_context_fields = {"room": ctx.room.name}

    session = AgentSession(
        stt=inference.STT(model="deepgram/nova-3-general"),
    )

    @session.on("user_input_transcribed")
    def on_transcript(transcript):
        if transcript.is_final:
            timestamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
            with open("user_speech_log.txt", "a") as f:
                f.write(f"[{timestamp}] {transcript.transcript}\n")

    await session.start(
        agent=Agent(
            instructions="You are a helpful assistant that transcribes user speech to text."
        ),
        room=ctx.room,
    )

    await ctx.connect()
```
Run it
```bash
python transcriber.py console
```
The agent starts listening right away and logs each transcription to `user_speech_log.txt`.
How it works
- Deepgram STT streams audio and emits `user_input_transcribed` events, including interim results while the caller is still speaking (see the sketch after this list).
- Each final transcript is timestamped and appended to a log file.
- Because there is no LLM or TTS, the agent never speaks; it only records.
- The rest of the session lifecycle is handled by `AgentSession`.
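Interim results are why the handler checks `is_final`: the same event fires repeatedly as the STT revises its hypothesis mid-utterance. If you want live feedback during development, a variant like this prints partials to the console while still persisting only finalized utterances; a minimal sketch using only the fields the example already relies on:

```python
@session.on("user_input_transcribed")
def on_transcript(transcript):
    if transcript.is_final:
        # Persist only finalized utterances, same format as before.
        timestamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        with open("user_speech_log.txt", "a") as f:
            f.write(f"[{timestamp}] {transcript.transcript}\n")
    else:
        # Interim hypotheses: useful for live feedback, not worth persisting.
        # (\r overwrites the previous partial on the same console line.)
        print(f"(interim) {transcript.transcript}", end="\r")
```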
Log file format
```
[2024-01-15 14:30:45] Hello, this is my first transcription
[2024-01-15 14:30:52] Testing the speech to text functionality
```
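This format is easy to post-process. A minimal sketch that reads the log back into `(timestamp, text)` pairs with only the standard library (the `parse_log` helper is hypothetical, not part of the example, and assumes transcripts never contain the `"] "` delimiter):

```python
import datetime


def parse_log(path: str = "user_speech_log.txt"):
    """Yield (datetime, text) pairs from the transcript log."""
    with open(path) as f:
        for line in f:
            # Each entry looks like: [YYYY-MM-DD HH:MM:SS] text
            stamp, text = line.rstrip("\n").split("] ", 1)
            yield (
                datetime.datetime.strptime(stamp.lstrip("["), "%Y-%m-%d %H:%M:%S"),
                text,
            )
```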
Full example
```python
import datetime

from dotenv import load_dotenv
from livekit.agents import JobContext, AgentServer, cli, Agent, AgentSession, inference

load_dotenv()

server = AgentServer()


@server.rtc_session()
async def entrypoint(ctx: JobContext):
    ctx.log_context_fields = {"room": ctx.room.name}

    session = AgentSession(
        stt=inference.STT(model="deepgram/nova-3-general"),
    )

    @session.on("user_input_transcribed")
    def on_transcript(transcript):
        if transcript.is_final:
            timestamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
            with open("user_speech_log.txt", "a") as f:
                f.write(f"[{timestamp}] {transcript.transcript}\n")

    await session.start(
        agent=Agent(
            instructions="You are a helpful assistant that transcribes user speech to text."
        ),
        room=ctx.room,
    )

    await ctx.connect()


if __name__ == "__main__":
    cli.run_app(server)
```