Session recording and transcripts

Overview

There are many reasons to record or persist the sessions that occur in your app, from quality monitoring to regulatory compliance. LiveKit allows you to record the video and audio from agent sessions or save the text transcripts.

Video or audio recording

Use the Egress feature to record audio and/or video. The simplest way to do this is to start a room composite recorder in your agent's entrypoint. This starts recording when the agent enters the room and automatically captures all audio and video shared in the room. Recording ends when all participants leave. Recordings are stored in the cloud storage provider of your choice.

Example

This example shows how to modify the Voice AI quickstart to record sessions. It uses Google Cloud Storage, but you can also save files to any Amazon S3-compatible storage provider or Azure Blob Storage.

For additional egress examples using Amazon S3 and Azure, see the Egress examples. To learn more about credentials.json, see Cloud storage configurations.

To modify the Voice AI quickstart to record sessions, add the following code:

from livekit import api

async def entrypoint(ctx: JobContext):
    # Add the following code to the top, before calling ctx.connect()

    # Load GCP credentials from credentials.json file.
    file_contents = ""
    with open("/path/to/credentials.json", "r") as f:
        file_contents = f.read()

    # Set up recording
    req = api.RoomCompositeEgressRequest(
        room_name="my-room",
        layout="speaker",
        audio_only=True,
        segment_outputs=[api.SegmentedFileOutput(
            filename_prefix="my-output",
            playlist_name="my-playlist.m3u8",
            live_playlist_name="my-live-playlist.m3u8",
            segment_duration=5,
            gcp=api.GCPUpload(
                credentials=file_contents,
                bucket="<your-gcp-bucket>",
            ),
        )],
    )

    res = await ctx.api.egress.start_room_composite_egress(req)

    # .. The rest of your entrypoint code follows ...

Text transcripts

Text transcripts are available in realtime via the llm_node or the transcription_node as detailed in the docs on Pipeline nodes. You can use this along with other events and callbacks to record your session and any other data you need.

Additionally, you can access the session.history property at any time to get the full conversation history so far. Using the add_shutdown_callback method, you can save the conversation history to a file after the user leaves and the room closes.

Example

This example shows how to modify the Voice AI quickstart to save the conversation history to a JSON file.

from datetime import datetime
import json

def entrypoint(ctx: JobContext):
    # Add the following code to the top, before calling ctx.connect()
    
    async def write_transcript():
        current_date = datetime.now().strftime("%Y%m%d_%H%M%S")

        # This example writes to the temporary directory, but you can save to any location
        filename = f"/tmp/transcript_{ctx.room.name}_{current_date}.json"
        
        with open(filename, 'w') as f:
            json.dump(session.history.to_dict(), f, indent=2)
            
        print(f"Transcript for {ctx.room.name} saved to {filename}")

    ctx.add_shutdown_callback(write_transcript)

    # .. The rest of your entrypoint code follows ...