
Observability data hooks

Collect session recordings, transcripts, metrics, and other data within the LiveKit Agents SDK.

Overview

The LiveKit Agents SDK includes access to extensive detail about each session, which you can collect locally and integrate with other systems. For information about data collected in LiveKit Cloud, see the Insights in LiveKit Cloud overview.

Metrics and usage data

AgentSession emits a metrics_collected event whenever new metrics are available. You can log these events directly or forward them to external services.

Subscribe to metrics events

Python:

from livekit.agents import metrics, MetricsCollectedEvent

@session.on("metrics_collected")
def _on_metrics_collected(ev: MetricsCollectedEvent):
    metrics.log_metrics(ev.metrics)

Node.js:

import { voice, metrics } from '@livekit/agents';

session.on(voice.AgentSessionEventTypes.MetricsCollected, (ev) => {
  metrics.logMetrics(ev.metrics);
});

Aggregate usage with UsageCollector

Use UsageCollector to accumulate LLM, TTS, and STT usage across a session for cost estimation or billing exports.

Python:

from livekit.agents import metrics, MetricsCollectedEvent

usage_collector = metrics.UsageCollector()

@session.on("metrics_collected")
def _on_metrics_collected(ev: MetricsCollectedEvent):
    usage_collector.collect(ev.metrics)

async def log_usage():
    summary = usage_collector.get_summary()
    logger.info(f"Usage: {summary}")

ctx.add_shutdown_callback(log_usage)

Node.js:

import { voice, metrics } from '@livekit/agents';

const usageCollector = new metrics.UsageCollector();

session.on(voice.AgentSessionEventTypes.MetricsCollected, (ev) => {
  metrics.logMetrics(ev.metrics);
  usageCollector.collect(ev.metrics);
});

const logUsage = async () => {
  const summary = usageCollector.getSummary();
  console.log(`Usage: ${JSON.stringify(summary)}`);
};

ctx.addShutdownCallback(logUsage);

Metrics reference

Each metrics event is included in LiveKit Cloud trace spans and surfaced as JSON in the dashboard. Use the tables below as a reference when you export the data to other systems.

Diagram: where each metric is measured.

Speech-to-text (STT)

STTMetrics is emitted after the STT model processes the audio input. This metrics event is only available when an STT component is configured (Realtime APIs do not emit it).

Metric | Description
audio_duration | The duration (seconds) of the audio input received by the STT model.
duration | For non-streaming STT, the amount of time (seconds) it took to create the transcript. Always 0 for streaming STT.
streamed | True if the STT is in streaming mode.

LLM

LLMMetrics is emitted after each LLM inference completes. Tool calls that run after the initial completion emit their own LLMMetrics events.

Metric | Description
duration | The amount of time (seconds) it took for the LLM to generate the entire completion.
completion_tokens | The number of tokens generated by the LLM in the completion.
prompt_tokens | The number of tokens provided in the prompt sent to the LLM.
prompt_cached_tokens | The number of cached tokens in the input prompt.
speech_id | A unique identifier representing the user's turn.
total_tokens | Total token usage for the completion.
tokens_per_second | The rate of token generation (tokens/second) by the LLM for the completion.
ttft | The amount of time (seconds) it took for the LLM to generate the first token of the completion.

Text-to-speech (TTS)

TTSMetrics is emitted after the TTS model generates speech from text input.

Metric | Description
audio_duration | The duration (seconds) of the audio output generated by the TTS model.
characters_count | The number of characters in the text input to the TTS model.
duration | The amount of time (seconds) it took for the TTS model to generate the entire audio output.
ttfb | The amount of time (seconds) it took for the TTS model to generate the first byte of its audio output.
speech_id | An identifier linking to the user's turn.
streamed | True if the TTS is in streaming mode.

End-of-utterance (EOU)

EOUMetrics is emitted when the user is determined to have finished speaking. It includes metrics related to end-of-turn detection and transcription latency.

EOU metrics are available in Realtime APIs when turn_detection is set to VAD or LiveKit's turn detector plugin. When using server-side turn detection, EOUMetrics is not emitted.

Metric | Description
end_of_utterance_delay | Time (seconds) from the end of speech (as detected by VAD) to the point when the user's turn is considered complete. Includes any transcription_delay.
transcription_delay | Time (seconds) between the end of speech and when the final transcript is available.
on_user_turn_completed_delay | Time (seconds) taken to execute the on_user_turn_completed callback.
speech_id | A unique identifier for the user's turn.

Measure conversation latency

Total conversation latency is the time it takes for the agent to respond to a user's utterance. Approximate it with the following metrics:

Python:

total_latency = eou.end_of_utterance_delay + llm.ttft + tts.ttfb

Node.js:

const totalLatency = eou.endOfUtteranceDelay + llm.ttft + tts.ttfb;
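
These three values arrive in separate metrics_collected events, so in practice you correlate them per turn. A minimal Python sketch, assuming you match events by their shared speech_id field (turn_latency and _record are placeholder names, not SDK APIs):

from livekit.agents import MetricsCollectedEvent
from livekit.agents.metrics import EOUMetrics, LLMMetrics, TTSMetrics

# Partial latency components, keyed by speech_id (one entry per user turn).
turn_latency: dict[str, dict[str, float]] = {}

def _record(speech_id: str, key: str, value: float) -> None:
    parts = turn_latency.setdefault(speech_id, {})
    parts[key] = value
    if {"eou", "ttft", "ttfb"} <= parts.keys():
        total = parts["eou"] + parts["ttft"] + parts["ttfb"]
        logger.info(f"turn {speech_id}: ~{total:.2f}s total conversation latency")

@session.on("metrics_collected")
def _on_metrics_collected(ev: MetricsCollectedEvent):
    m = ev.metrics
    if isinstance(m, EOUMetrics):
        _record(m.speech_id, "eou", m.end_of_utterance_delay)
    elif isinstance(m, LLMMetrics):
        _record(m.speech_id, "ttft", m.ttft)
    elif isinstance(m, TTSMetrics):
        _record(m.speech_id, "ttfb", m.ttfb)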

Session transcripts and reports

The session.history object contains the full conversation, and the SDK raises events like conversation_item_added and user_input_transcribed as turns progress. Use these hooks to build live dashboards or persist transcripts once a session ends. When you need a structured post-session artifact, call ctx.make_session_report() inside on_session_end to gather identifiers, history, events, and recording metadata in one JSON payload.
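
As a minimal sketch of those event hooks in Python (the event classes and their transcript, is_final, and item fields are assumed to match your SDK version; verify against your installed release):

from livekit.agents import ConversationItemAddedEvent, UserInputTranscribedEvent

@session.on("user_input_transcribed")
def _on_user_input_transcribed(ev: UserInputTranscribedEvent):
    # Fires for interim and final user transcripts as they are produced.
    print(f"user ({'final' if ev.is_final else 'interim'}): {ev.transcript}")

@session.on("conversation_item_added")
def _on_conversation_item_added(ev: ConversationItemAddedEvent):
    # Fires when a completed item (user or agent turn) is added to session.history.
    print(f"{ev.item.role}: {ev.item.text_content}")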

Save conversation history example

The following Python example augments the Voice AI quickstart to save the transcript as JSON when the session closes.

from datetime import datetime
import json

async def entrypoint(ctx: JobContext):
    async def write_transcript():
        current_date = datetime.now().strftime("%Y%m%d_%H%M%S")
        filename = f"/tmp/transcript_{ctx.room.name}_{current_date}.json"
        with open(filename, 'w') as f:
            json.dump(session.history.to_dict(), f, indent=2)
        print(f"Transcript for {ctx.room.name} saved to {filename}")

    ctx.add_shutdown_callback(write_transcript)

    # ... continue with ctx.connect(), agent setup, etc.

Capture a session report

Available in Python only.

Use the on_session_end callback to capture a structured SessionReport with identifiers, conversation history, events, recording metadata, and agent configuration.

import json
from datetime import datetime

from livekit.agents import JobContext, AgentServer

server = AgentServer()

async def on_session_end(ctx: JobContext) -> None:
    report = ctx.make_session_report()
    report_dict = report.to_dict()
    current_date = datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = f"/tmp/session_report_{ctx.room.name}_{current_date}.json"
    with open(filename, 'w') as f:
        json.dump(report_dict, f, indent=2)
    print(f"Session report for {ctx.room.name} saved to {filename}")

@server.rtc_session(on_session_end=on_session_end)
async def entrypoint(ctx: JobContext):
    await ctx.connect()
    # ...

The report includes fields such as:

  • Job, room, and participant identifiers
  • Complete conversation history with timestamps
  • All session events (transcription, speech detection, handoffs, etc.)
  • Audio recording metadata and paths (when recording is enabled)
  • Agent session options and configuration
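
If local disk is not where you want the report to live, the same on_session_end callback can ship the JSON to your own backend instead. A hypothetical sketch using aiohttp (the endpoint URL is a placeholder, not part of the SDK):

import aiohttp

from livekit.agents import JobContext

async def on_session_end(ctx: JobContext) -> None:
    report_dict = ctx.make_session_report().to_dict()
    # POST the report to your own collection endpoint.
    async with aiohttp.ClientSession() as http:
        await http.post("https://example.com/session-reports", json=report_dict)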

Record audio or video

Use LiveKit Egress to capture audio and video directly to your storage provider. The simplest pattern is to start a room composite recorder when your agent joins the room.

import os

from livekit import api

async def entrypoint(ctx: JobContext):
    req = api.RoomCompositeEgressRequest(
        room_name=ctx.room.name,
        audio_only=True,
        file_outputs=[
            api.EncodedFileOutput(
                file_type=api.EncodedFileType.OGG,
                filepath="livekit/my-room-test.ogg",
                s3=api.S3Upload(
                    bucket=os.getenv("AWS_BUCKET_NAME"),
                    region=os.getenv("AWS_REGION"),
                    access_key=os.getenv("AWS_ACCESS_KEY_ID"),
                    secret=os.getenv("AWS_SECRET_ACCESS_KEY"),
                ),
            )
        ],
    )

    lkapi = api.LiveKitAPI()
    await lkapi.egress.start_room_composite_egress(req)
    await lkapi.aclose()

    # ... continue with your agent logic

OpenTelemetry integration

Available in Python only.

Set a tracer provider to export the same spans used by LiveKit Cloud to any OpenTelemetry-compatible backend. The example below sends spans to LangFuse.

import base64
import os

from livekit.agents.telemetry import set_tracer_provider

def setup_langfuse(
    host: str | None = None, public_key: str | None = None, secret_key: str | None = None
):
    from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor

    public_key = public_key or os.getenv("LANGFUSE_PUBLIC_KEY")
    secret_key = secret_key or os.getenv("LANGFUSE_SECRET_KEY")
    host = host or os.getenv("LANGFUSE_HOST")

    if not public_key or not secret_key or not host:
        raise ValueError("LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST must be set")

    langfuse_auth = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
    os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = f"{host.rstrip('/')}/api/public/otel"
    os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {langfuse_auth}"

    trace_provider = TracerProvider()
    trace_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
    set_tracer_provider(trace_provider)

async def entrypoint(ctx: JobContext):
    setup_langfuse()
    # start your agent

For an end-to-end script, see the LangFuse trace example on GitHub.