There are many reasons you might want to record AI agent conversations with users, such as monitoring agent performance, evaluating customer satisfaction, or meeting regulatory compliance requirements. LiveKit allows you to record video and audio of AI agent conversations, or to save conversations as text transcripts.
Video or audio recording
LiveKit's Egress feature provides flexible options for recording audio and/or video. To record a session, start a room composite recorder in your agent's entrypoint. Room recording begins when the agent enters a room with a user and captures all participants and interactions in the LiveKit room. Recording ends when all participants leave and the room is closed.
LiveKit egress requires access to a cloud storage provider to upload the recording files. The following example uses Google Cloud Storage, but you can also save files to any Amazon S3-compatible storage provider or Azure Blob Storage.
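In the example below, the upload destination is specified with a `gcp` field on the segment output. For an S3-compatible provider, you would use an `s3` field instead. The following is a minimal sketch, with placeholder credentials, region, and bucket values that you must replace with your own:

```python
from livekit import api

# Sketch: an S3 upload destination for segmented output.
# Replace the placeholder credentials, region, and bucket with your own.
s3_segment_output = api.SegmentedFileOutput(
    filename_prefix="my-output",
    playlist_name="my-playlist.m3u8",
    segment_duration=5,
    s3=api.S3Upload(
        access_key="<access-key>",
        secret="<secret-key>",
        region="<region>",
        bucket="<my-bucket>",
        # Some S3-compatible providers also require a custom endpoint:
        # endpoint="https://<custom-endpoint>",
    ),
)
```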
Example
This example uses the VoicePipelineAgent template as a starting point.
Clone the repo or run the following LiveKit CLI command:

```bash
lk app create --template=voice-pipeline-agent-python my-recording-app
```

After you run the command, follow the instructions in the command output to finish setup.
Update the `agent.py` file to import `livekit.api`:

```python
from livekit import api
```

Update the `entrypoint` function to add room recording. This example uses Google Cloud Storage. The `credentials.json` file includes authentication credentials for accessing the bucket. For additional egress examples using Amazon S3 and Azure, see the Egress examples. To learn more about `credentials.json`, see Cloud storage configurations.

Replace `<my-bucket>` and update the entrypoint function with the following:

**Note**: To record only audio, set the `audio_only` parameter to `True` and remove the `preset` parameter.

```python
async def entrypoint(ctx: JobContext):
    # Get GCP credentials from the credentials.json file.
    file_contents = ""
    with open("/path/to/credentials.json", "r") as f:
        file_contents = f.read()

    # Set up recording
    req = api.RoomCompositeEgressRequest(
        room_name="my-room",
        layout="speaker",
        preset=api.EncodingOptionsPreset.H264_720P_30,
        audio_only=False,
        segment_outputs=[
            api.SegmentedFileOutput(
                filename_prefix="my-output",
                playlist_name="my-playlist.m3u8",
                live_playlist_name="my-live-playlist.m3u8",
                segment_duration=5,
                gcp=api.GCPUpload(
                    credentials=file_contents,
                    bucket="<my-bucket>",
                ),
            )
        ],
    )

    lkapi = api.LiveKitAPI()
    res = await lkapi.egress.start_room_composite_egress(req)

    initial_ctx = llm.ChatContext().append(
        role="system",
        text=(
            "You are a voice assistant created by LiveKit. Your interface with users will be voice. "
            "You should use short and concise responses, and avoid usage of unpronounceable punctuation. "
            "You were created as a demo to showcase the capabilities of LiveKit's agents framework."
        ),
    )

    logger.info(f"connecting to room {ctx.room.name}")
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    # Wait for the first participant to connect
    participant = await ctx.wait_for_participant()
    logger.info(f"starting voice assistant for participant {participant.identity}")

    # This project is configured to use Deepgram STT, OpenAI LLM and TTS plugins
    # Other great providers exist like Cartesia and ElevenLabs
    # Learn more and pick the best one for your app:
    # https://docs.livekit.io/agents/plugins
    agent = VoicePipelineAgent(
        vad=ctx.proc.userdata["vad"],
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=openai.TTS(),
        chat_ctx=initial_ctx,
    )

    agent.start(ctx.room, participant)

    # The agent should be polite and greet the user when it joins :)
    await agent.say("Hey, how can I help you today?", allow_interruptions=True)

    await lkapi.aclose()
```

Start the agent:
```bash
python3 agent.py dev
```

Recording starts when a participant joins a room and the agent is dispatched to that room. After the participant leaves the room, the recording stops. Files are uploaded to storage as they're recorded.
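You can also stop a recording explicitly rather than waiting for the room to close. The following is a minimal sketch, assuming you kept the `EgressInfo` returned by `start_room_composite_egress` (the `res` variable in the example above), whose `egress_id` identifies the recording:

```python
from livekit import api

async def stop_recording(egress_id: str) -> None:
    # Stop a running egress by ID; the ID comes from the EgressInfo
    # returned when the egress was started (res.egress_id above).
    lkapi = api.LiveKitAPI()
    await lkapi.egress.stop_egress(api.StopEgressRequest(egress_id=egress_id))
    await lkapi.aclose()
```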
Transcriptions
This section describes how to create a text log of a conversation from the agent process (that is, server side). For transcriptions in your frontend applications, see Transcriptions.
You can save the text of a conversation with an AI voice agent by listening for agent events and logging user and agent speech to a text file. For example, log messages when user speech is committed (`user_speech_committed`) and when the agent stops speaking (`agent_stopped_speaking`), as in the sketch below.
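The following is a minimal sketch of this approach. The `register_transcript_logging` helper and the log path are hypothetical names for illustration, and the agent's text is captured from the `agent_speech_committed` event, which carries the spoken message content (`agent_stopped_speaking` fires without the text itself):

```python
from datetime import datetime

from livekit.agents import llm
from livekit.agents.pipeline import VoicePipelineAgent

def register_transcript_logging(agent: VoicePipelineAgent, log_path: str) -> None:
    """Append user and agent speech to a plain-text transcript file."""

    def log_line(speaker: str, text: str) -> None:
        # Write one timestamped line per utterance.
        with open(log_path, "a") as f:
            f.write(f"{datetime.now().isoformat()} {speaker}: {text}\n")

    @agent.on("user_speech_committed")
    def on_user_speech(msg: llm.ChatMessage):
        log_line("user", str(msg.content))

    @agent.on("agent_speech_committed")
    def on_agent_speech(msg: llm.ChatMessage):
        log_line("agent", str(msg.content))
```

In the earlier example, you would call `register_transcript_logging(agent, "transcript.txt")` in the entrypoint after creating the agent and before `agent.start(...)`.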
For a full list of events emitted by agents, see the VoicePipelineAgent and MultimodalAgent documentation. For example code in Python, see this example of a MultimodalAgent that saves a conversation to a text file.