There are many reasons you might want to record AI agent conversations with users, such as monitoring agent performance, evaluating customer satisfaction, or meeting regulatory compliance requirements. LiveKit allows you to record video and audio of AI agent conversations, or to save conversations as text transcripts.
Video or audio recording
LiveKit's Egress feature provides flexible options for recording audio and/or video. To record a session, start a room composite recorder in your agent's entrypoint. Room recording begins when the agent enters a room with a user and captures all participants and interactions in the LiveKit room. Recording ends when all participants leave and the room is closed.
LiveKit egress requires access to a cloud storage provider to upload the recording files. The following example uses Google Cloud Storage, but you can also save files to any Amazon S3-compatible storage provider or Azure Blob Storage.
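In the example below, the upload destination is specified with a `gcp` field on the segment output. For an S3-compatible provider, you would use an `s3` field instead. The following is a minimal sketch, with placeholder credentials, region, and bucket values that you must replace with your own:

```python
from livekit import api

# Sketch: an S3 upload destination for segmented output.
# Replace the placeholder credentials, region, and bucket with your own.
s3_segment_output = api.SegmentedFileOutput(
    filename_prefix="my-output",
    playlist_name="my-playlist.m3u8",
    segment_duration=5,
    s3=api.S3Upload(
        access_key="<access-key>",
        secret="<secret-key>",
        region="<region>",
        bucket="<my-bucket>",
        # Some S3-compatible providers also require a custom endpoint:
        # endpoint="https://<custom-endpoint>",
    ),
)
```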
Example
This example uses the VoicePipelineAgent template as a starting point.
Clone the repo or run the following LiveKit CLI command:

```bash
lk app create --template=voice-pipeline-agent-python my-recording-app
```

After you run the command, follow the instructions in the command output to finish setup.
Update the `agent.py` file to import `livekit.api`:

```python
from livekit import api
```

Update the `entrypoint` function to add room recording. This example uses Google Cloud Storage. The `credentials.json` file includes authentication credentials for accessing the bucket. For additional egress examples using Amazon S3 and Azure, see the Egress examples. To learn more about `credentials.json`, see Cloud storage configurations.

Replace `<my-bucket>` and update the entrypoint function with the following:

**Note**: To record only audio, set the `audio_only` parameter to `True` and remove the `preset` parameter.

```python
async def entrypoint(ctx: JobContext):
    # Get GCP credentials from the credentials.json file.
    file_contents = ""
    with open("/path/to/credentials.json", "r") as f:
        file_contents = f.read()

    # Set up recording
    req = api.RoomCompositeEgressRequest(
        room_name="my-room",
        layout="speaker",
        preset=api.EncodingOptionsPreset.H264_720P_30,
        audio_only=False,
        segment_outputs=[
            api.SegmentedFileOutput(
                filename_prefix="my-output",
                playlist_name="my-playlist.m3u8",
                live_playlist_name="my-live-playlist.m3u8",
                segment_duration=5,
                gcp=api.GCPUpload(
                    credentials=file_contents,
                    bucket="<my-bucket>",
                ),
            )
        ],
    )

    lkapi = api.LiveKitAPI()
    res = await lkapi.egress.start_room_composite_egress(req)

    initial_ctx = llm.ChatContext().append(
        role="system",
        text=(
            "You are a voice assistant created by LiveKit. Your interface with users will be voice. "
            "You should use short and concise responses, and avoid usage of unpronounceable punctuation. "
            "You were created as a demo to showcase the capabilities of LiveKit's agents framework."
        ),
    )

    logger.info(f"connecting to room {ctx.room.name}")
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    # Wait for the first participant to connect
    participant = await ctx.wait_for_participant()
    logger.info(f"starting voice assistant for participant {participant.identity}")

    # This project is configured to use Deepgram STT, OpenAI LLM and TTS plugins
    # Other great providers exist like Cartesia and ElevenLabs
    # Learn more and pick the best one for your app:
    # https://docs.livekit.io/agents/plugins
    agent = VoicePipelineAgent(
        vad=ctx.proc.userdata["vad"],
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=openai.TTS(),
        chat_ctx=initial_ctx,
    )

    agent.start(ctx.room, participant)

    # The agent should be polite and greet the user when it joins :)
    await agent.say("Hey, how can I help you today?", allow_interruptions=True)

    await lkapi.aclose()
```

Start the agent:
```bash
python3 agent.py dev
```

Recording starts when a participant joins a room and the agent is dispatched to that room. After the participant leaves the room, the recording stops. Files are uploaded to storage as they're recorded.
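You can also stop a recording explicitly rather than waiting for the room to close. The following is a minimal sketch, assuming you kept the `EgressInfo` returned by `start_room_composite_egress` (the `res` variable in the example above), whose `egress_id` identifies the recording:

```python
from livekit import api

async def stop_recording(egress_id: str) -> None:
    # Stop a running egress by ID; the ID comes from the EgressInfo
    # returned when the egress was started (res.egress_id above).
    lkapi = api.LiveKitAPI()
    await lkapi.egress.stop_egress(api.StopEgressRequest(egress_id=egress_id))
    await lkapi.aclose()
```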
Transcriptions
This section describes how to create a text log of a conversation from the agent process (that is, server side). For transcriptions in your frontend applications, see Transcriptions.
You can save the text of a conversation with an AI voice agent by listening for agent events and logging user and agent speech to a text file. For example, log messages when user speech is committed (`user_speech_committed`) and when the agent stops speaking (`agent_stopped_speaking`), as in the sketch below.
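The following is a minimal sketch of this approach. The `register_transcript_logging` helper and the log path are hypothetical names for illustration, and the agent's text is captured from the `agent_speech_committed` event, which carries the spoken message content (`agent_stopped_speaking` fires without the text itself):

```python
from datetime import datetime

from livekit.agents import llm
from livekit.agents.pipeline import VoicePipelineAgent

def register_transcript_logging(agent: VoicePipelineAgent, log_path: str) -> None:
    """Append user and agent speech to a plain-text transcript file."""

    def log_line(speaker: str, text: str) -> None:
        # Write one timestamped line per utterance.
        with open(log_path, "a") as f:
            f.write(f"{datetime.now().isoformat()} {speaker}: {text}\n")

    @agent.on("user_speech_committed")
    def on_user_speech(msg: llm.ChatMessage):
        log_line("user", str(msg.content))

    @agent.on("agent_speech_committed")
    def on_agent_speech(msg: llm.ChatMessage):
        log_line("agent", str(msg.content))
```

In the earlier example, you would call `register_transcript_logging(agent, "transcript.txt")` in the entrypoint after creating the agent and before `agent.start(...)`.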
For a full list of events emitted by agents, see the VoicePipelineAgent and MultimodalAgent documentation. For example code in Python, see this example of a MultimodalAgent that saves a conversation to a text file.