Data hooks

Collect session recordings, transcripts, metrics, and other data within the LiveKit Agents SDK.

Overview

The LiveKit Agents SDK includes access to extensive detail about each session, which you can collect locally and integrate with other systems. For information about data collected in LiveKit Cloud, see the Insights in LiveKit Cloud overview. To choose which observability data is collected per session (audio, transcript, traces, logs), see Session recording options.

Session transcripts and reports

Session transcripts, logs, and history are available in the Agent insights tab for each session. It provides a unified timeline that combines turn-by-turn transcripts (including tool calls and handoffs), traces capturing the execution flow of each stage in the voice pipeline, runtime logs from the agent server, and audio recordings that you can play back or download directly in the browser. All of this data streams in real time during the session; transcripts and recordings are uploaded once the session completes.

If you need to collect data locally, you can use the following to build live dashboards, save conversation history, or create a detailed session report:

  • The session.history object contains the full conversation. Use this to persist a transcript after the session ends.
  • SDKs emit events as turns progress, for example, conversation_item_added and user_input_transcribed. Use these to build live dashboards.
  • A session report gathers identifiers, history, events, and recording metadata in one JSON payload. Use this to create a structured post-session artifact.

Conversation history

The session.history object contains the full conversation. While you can use it to persist a transcript after the session ends, it's an advanced use case and not recommended for most applications.

Realtime model transcript delays

When using a realtime model without a separate STT plugin, session.history transcripts might be incomplete or arrive after the agent has already responded. For details and workarounds, see Delayed transcription.

Instead, view the conversation history in the Agent insights tab for each session. It includes turn-by-turn transcripts, tool calls, handoffs, audio recordings, and more. The following screenshot shows a portion of a conversation history in Agent insights with a tool call:

Conversation history in Agent insights.

To create a live dashboard or collect conversation history as it happens, subscribe to the conversation_item_added event. For more information, see conversation_item_added.

For a Python example using session.history, see the session close callback example in the GitHub repository.

Session reports

Call ctx.make_session_report() inside the on_session_end callback to capture a structured SessionReport with identifiers, conversation history, events, recording metadata, and agent configuration.

import json
from datetime import datetime

from livekit.agents import AgentServer, JobContext

server = AgentServer()


async def on_session_end(ctx: JobContext) -> None:
    report = ctx.make_session_report()
    report_dict = report.to_dict()

    current_date = datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = f"/tmp/session_report_{ctx.room.name}_{current_date}.json"
    with open(filename, "w") as f:
        json.dump(report_dict, f, indent=2)
    print(f"Session report for {ctx.room.name} saved to {filename}")


@server.rtc_session(agent_name="my-agent", on_session_end=on_session_end)
async def entrypoint(ctx: JobContext):
    await ctx.connect()
    # ...
import { defineAgent, type JobContext } from '@livekit/agents';
import { writeFile } from 'node:fs/promises';

const onSessionEnd = async (ctx: JobContext) => {
  const report = ctx.makeSessionReport();
  const currentDate = new Date().toISOString().replace(/[:.]/g, '-').slice(0, -5);
  const filename = `/tmp/session_report_${ctx.room.name}_${currentDate}.json`;
  await writeFile(filename, JSON.stringify(report, null, 2));
  console.log(`Session report for ${ctx.room.name} saved to ${filename}`);
};

export default defineAgent({
  entry: async (ctx: JobContext) => {
    await ctx.connect();
    // ...
    ctx.addShutdownCallback(async () => {
      await onSessionEnd(ctx);
    });
  },
});

The report includes fields such as:

  • Job, room, and participant identifiers
  • Complete conversation history with timestamps
  • All session events (transcription, speech detection, handoffs, etc.)
  • Audio recording metadata and paths (when recording is enabled)
  • Agent session options and configuration
Note

The per-message llm_node_ttft and tts_node_ttfb fields in session reports are only populated by the STT-LLM-TTS pipeline. These fields are always empty when using a realtime model.

Session report lifecycle

The SDK calls your on_session_end callback after the voice pipeline closes. At this point, session.history and all metrics are finalized. After your callback returns, the SDK uploads its own telemetry and cleans up resources.

  1. The agent connects to the room and begins the voice pipeline.
  2. When the session ends (for example, the participant disconnects), the SDK fires the on_session_end callback.
  3. Inside on_session_end, call ctx.make_session_report() to collect all session data into a single SessionReport object.
  4. After on_session_end returns, the SDK flushes telemetry to LiveKit Cloud (traces, logs, recordings) and cleans up resources.
Session end timeout

session_end_timeout (default 5 minutes) bounds how long your on_session_end callback can run. If your post-session work (such as writing a report or calling an external API) might exceed this limit, increase session_end_timeout in your WorkerOptions. The separate shutdown_process_timeout (default 10 seconds) bounds the overall job process shutdown after all callbacks complete. See the JobContext reference for details.
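As a rough sketch, the two timeouts can be raised when constructing your worker. The option names below follow this section's description; treat the exact WorkerOptions signature as an assumption to verify against the JobContext reference for your SDK version.

```python
from livekit.agents import WorkerOptions, cli

# Hedged sketch: allow post-session work (report writing, external API calls)
# up to 10 minutes before the SDK abandons the on_session_end callback.
opts = WorkerOptions(
    entrypoint_fnc=entrypoint,      # your agent entrypoint function
    session_end_timeout=600.0,      # seconds; default is 5 minutes
    shutdown_process_timeout=10.0,  # seconds; default is 10 seconds
)

if __name__ == "__main__":
    cli.run_app(opts)
```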

Record audio or video

Audio recordings are automatically collected and uploaded to LiveKit Cloud for each session. These files are recorded after background voice cancellation (BVC) is applied and are available for playback and download on the Agent insights tab for the session.

If you need more fine-grained control over audio recordings and don't require BVC, or want to record both audio and video, you can use LiveKit Egress to capture audio and video directly to your storage provider. The simplest pattern is to start a room composite recorder when your agent joins the room.

import os

from livekit import api


async def entrypoint(ctx: JobContext):
    req = api.RoomCompositeEgressRequest(
        room_name=ctx.room.name,
        audio_only=True,
        file_outputs=[
            api.EncodedFileOutput(
                file_type=api.EncodedFileType.OGG,
                filepath="livekit/my-room-test.ogg",
                s3=api.S3Upload(
                    bucket=os.getenv("AWS_BUCKET_NAME"),
                    region=os.getenv("AWS_REGION"),
                    access_key=os.getenv("AWS_ACCESS_KEY_ID"),
                    secret=os.getenv("AWS_SECRET_ACCESS_KEY"),
                ),
            )
        ],
    )

    lkapi = api.LiveKitAPI()
    await lkapi.egress.start_room_composite_egress(req)
    await lkapi.aclose()

    # ... continue with your agent logic
import {
  EgressClient,
  EncodedFileOutput,
  EncodedFileType,
  EncodingOptionsPreset,
} from 'livekit-server-sdk';

const egressClient = new EgressClient(
  process.env.LIVEKIT_URL.replace('wss://', 'https://'),
  process.env.LIVEKIT_API_KEY,
  process.env.LIVEKIT_API_SECRET,
);

const output = new EncodedFileOutput({
  fileType: EncodedFileType.MP4,
  filepath: 'livekit/my-room-test.mp4',
  output: {
    case: 's3',
    value: {
      accessKey: process.env.AWS_ACCESS_KEY_ID,
      secret: process.env.AWS_SECRET_ACCESS_KEY,
      bucket: process.env.AWS_BUCKET_NAME,
      region: process.env.AWS_REGION,
      forcePathStyle: true,
    },
  },
});

export default defineAgent({
  entry: async (ctx: JobContext) => {
    await egressClient.startRoomCompositeEgress(
      ctx.room.name ?? 'open-room',
      output,
      {
        layout: 'grid',
        encodingOptions: EncodingOptionsPreset.H264_1080P_30,
        audioOnly: false,
      },
    );
    // ... continue with your agent logic
  },
});

Metrics and usage data

The Agents SDK provides two surfaces for collecting metrics and usage data from your sessions:

  • Per-turn latency: each ChatMessage in the conversation history carries a metrics field with latency measurements for that turn.
  • Session usage: the session_usage_updated event and session.usage expose cumulative per-model usage data.

Both surfaces are also included in LiveKit Cloud Agent insights.

Per-turn latency

Every ChatMessage in the conversation history includes a metrics field containing a MetricsReport with latency measurements for that turn. The available fields depend on the message role.

User messages

Python:

  • transcription_delay: Time (in seconds) to obtain the transcript after the user stopped speaking.
  • end_of_turn_delay: Time (in seconds) between the end of speech and the decision to end the user's turn.
  • on_user_turn_completed_delay: Time (in seconds) to execute the Agent.on_user_turn_completed callback.

Node.js:

  • transcriptionDelay: Time (in seconds) to obtain the transcript after the user stopped speaking.
  • endOfTurnDelay: Time (in seconds) between the end of speech and the decision to end the user's turn.
  • onUserTurnCompletedDelay: Time (in seconds) to execute the Agent.onUserTurnCompleted callback.

Assistant messages

Python:

  • llm_node_ttft: Time (in seconds) for the LLM to return the first token.
  • tts_node_ttfb: Time (in seconds) for the TTS to return the first audio chunk after receiving the first text token.
  • e2e_latency: Time (in seconds) from when the user stopped speaking to when the agent began responding.

Node.js:

  • llmNodeTtft: Time (in seconds) for the LLM to return the first token.
  • ttsNodeTtfb: Time (in seconds) for the TTS to return the first audio chunk after receiving the first text token.
  • e2eLatency: Time (in seconds) from when the user stopped speaking to when the agent began responding.
Note

llm_node_ttft and tts_node_ttfb are only populated by the STT-LLM-TTS pipeline. These fields are empty when using a realtime model.

Both roles

Python:

  • started_speaking_at: Timestamp when speaking began.
  • stopped_speaking_at: Timestamp when speaking ended.

Node.js:

  • startedSpeakingAt: Timestamp when speaking began.
  • stoppedSpeakingAt: Timestamp when speaking ended.

Per-turn metrics are available from the conversation history or from the conversation_item_added event. The following example subscribes to the event and logs end-to-end latency:

from livekit.agents import ConversationItemAddedEvent
from livekit.agents.llm import ChatMessage


@session.on("conversation_item_added")
def on_conversation_item_added(ev: ConversationItemAddedEvent):
    if not isinstance(ev.item, ChatMessage):
        return
    m = ev.item.metrics
    if ev.item.role == "assistant" and m.get("e2e_latency") is not None:
        print(f"E2E latency: {m['e2e_latency']:.3f}s")
import { voice } from '@livekit/agents';

session.on(voice.AgentSessionEventTypes.ConversationItemAdded, (ev) => {
  const m = ev.item.metrics;
  if (ev.item.role === 'assistant' && m?.e2eLatency !== undefined) {
    console.log(`E2E latency: ${m.e2eLatency.toFixed(3)}s`);
  }
});

Session usage

Subscribe to the session_usage_updated event to receive per-model usage data for cost estimation or billing exports. The event fires whenever new usage data is available during a session.

from livekit.agents import SessionUsageUpdatedEvent


@session.on("session_usage_updated")
def on_session_usage_updated(ev: SessionUsageUpdatedEvent):
    for usage in ev.usage.model_usage:
        print(f"{usage.provider}/{usage.model}: {usage}")
import { voice } from '@livekit/agents';

session.on(voice.AgentSessionEventTypes.SessionUsageUpdated, (ev) => {
  for (const usage of ev.usage.modelUsage) {
    console.log(`${usage.provider}/${usage.model}:`, usage);
  }
});

You can also access cumulative usage at any time through session.usage:

# ctx is the JobContext from your entrypoint function
async def log_usage():
    for usage in session.usage.model_usage:
        print(f"{usage.provider}/{usage.model}: {usage}")


ctx.add_shutdown_callback(log_usage)
const logUsage = async () => {
  for (const usage of session.usage.modelUsage) {
    console.log(`${usage.provider}/${usage.model}:`, usage);
  }
};

ctx.addShutdownCallback(logUsage);

Each entry in the model_usage list is a cumulative usage summary for a single model and provider combination. The entry type depends on the pipeline component (LLMModelUsage, TTSModelUsage, STTModelUsage, or InterruptionModelUsage), each with the fields listed in the following sections.
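The type-based dispatch can be sketched with isinstance checks. The stand-in dataclasses below are illustrative only (the real classes come from the SDK); their field names follow the tables in this section:

```python
from dataclasses import dataclass


# Stand-ins for the SDK's usage entry types, for illustration only.
@dataclass
class LLMModelUsage:
    provider: str
    model: str
    input_tokens: int
    output_tokens: int


@dataclass
class TTSModelUsage:
    provider: str
    model: str
    characters_count: int
    audio_duration: float


def summarize(usage) -> str:
    # Branch on the concrete usage type when exporting billing rows.
    if isinstance(usage, LLMModelUsage):
        return f"{usage.provider}/{usage.model}: {usage.input_tokens} in / {usage.output_tokens} out tokens"
    if isinstance(usage, TTSModelUsage):
        return f"{usage.provider}/{usage.model}: {usage.characters_count} chars, {usage.audio_duration:.1f}s audio"
    return f"{usage.provider}/{usage.model}: unknown usage type"


print(summarize(LLMModelUsage("openai", "gpt-4o", 1200, 300)))
# → openai/gpt-4o: 1200 in / 300 out tokens
```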

LLMModelUsage

Python:

  • provider: Provider name (for example, openai, anthropic).
  • model: Model name (for example, gpt-4o, claude-3-5-sonnet).
  • input_tokens: Total input tokens.
  • input_cached_tokens: Input tokens served from cache.
  • input_cached_audio_tokens: Input audio tokens served from cache (multimodal models).
  • input_cached_text_tokens: Input text tokens served from cache.
  • input_cached_image_tokens: Input image tokens served from cache (multimodal models).
  • input_audio_tokens: Input audio tokens (multimodal models).
  • input_text_tokens: Input text tokens.
  • input_image_tokens: Input image tokens (multimodal models).
  • output_tokens: Total output tokens.
  • output_audio_tokens: Output audio tokens (multimodal models).
  • output_text_tokens: Output text tokens.
  • session_duration: Session connection duration in seconds (for session-based billing).

Node.js:

  • provider: Provider name (for example, openai, anthropic).
  • model: Model name (for example, gpt-4o, claude-3-5-sonnet).
  • inputTokens: Total input tokens.
  • inputCachedTokens: Input tokens served from cache.
  • inputCachedAudioTokens: Input audio tokens served from cache (multimodal models).
  • inputCachedTextTokens: Input text tokens served from cache.
  • inputCachedImageTokens: Input image tokens served from cache (multimodal models).
  • inputAudioTokens: Input audio tokens (multimodal models).
  • inputTextTokens: Input text tokens.
  • inputImageTokens: Input image tokens (multimodal models).
  • outputTokens: Total output tokens.
  • outputAudioTokens: Output audio tokens (multimodal models).
  • outputTextTokens: Output text tokens.
  • sessionDurationMs: Session connection duration in milliseconds (for session-based billing).
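As a sketch of how these token fields feed cost estimation: subtract cached input tokens from the total before applying the full input rate, then multiply each bucket by your provider's rate. The rates below are placeholders, not real prices.

```python
# Placeholder per-token rates (USD); substitute your provider's actual pricing.
RATE_INPUT = 2.50 / 1_000_000
RATE_CACHED_INPUT = 1.25 / 1_000_000
RATE_OUTPUT = 10.00 / 1_000_000


def estimate_llm_cost(input_tokens: int, input_cached_tokens: int, output_tokens: int) -> float:
    """Estimate cost from an LLMModelUsage-style entry.

    Cached input tokens are typically billed at a discounted rate, so they
    are removed from the full-rate input bucket before summing.
    """
    uncached = input_tokens - input_cached_tokens
    return (
        uncached * RATE_INPUT
        + input_cached_tokens * RATE_CACHED_INPUT
        + output_tokens * RATE_OUTPUT
    )


cost = estimate_llm_cost(input_tokens=10_000, input_cached_tokens=4_000, output_tokens=2_000)
print(f"${cost:.4f}")  # → $0.0400
```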

TTSModelUsage

Python:

  • provider: Provider name (for example, elevenlabs, cartesia).
  • model: Model name (for example, eleven_turbo_v2, sonic).
  • input_tokens: Input text tokens (for token-based TTS billing).
  • output_tokens: Output audio tokens (for token-based TTS billing).
  • characters_count: Number of characters synthesized (for character-based billing).
  • audio_duration: Duration of generated audio in seconds.

Node.js:

  • provider: Provider name (for example, elevenlabs, cartesia).
  • model: Model name (for example, eleven_turbo_v2, sonic).
  • inputTokens: Input text tokens (for token-based TTS billing).
  • outputTokens: Output audio tokens (for token-based TTS billing).
  • charactersCount: Number of characters synthesized (for character-based billing).
  • audioDurationMs: Duration of generated audio in milliseconds.

STTModelUsage

Python:

  • provider: Provider name (for example, deepgram, assemblyai).
  • model: Model name (for example, nova-2, best).
  • input_tokens: Input audio tokens (for token-based STT billing).
  • output_tokens: Output text tokens (for token-based STT billing).
  • audio_duration: Duration of processed audio in seconds.

Node.js:

  • provider: Provider name (for example, deepgram, assemblyai).
  • model: Model name (for example, nova-2, best).
  • inputTokens: Input audio tokens (for token-based STT billing).
  • outputTokens: Output text tokens (for token-based STT billing).
  • audioDurationMs: Duration of processed audio in milliseconds.
Note

Python durations are in seconds; Node.js durations are in milliseconds.
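If you aggregate usage from both SDKs, normalize the units first. A trivial helper (the function name is ours, not the SDK's):

```python
def duration_seconds(value: float, from_node: bool = False) -> float:
    # Node.js duration fields (audioDurationMs, sessionDurationMs) report
    # milliseconds; the Python fields report seconds.
    return value / 1000.0 if from_node else value


print(duration_seconds(1500, from_node=True))  # → 1.5
```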

InterruptionModelUsage

Python:

  • provider: Provider name (for example, livekit).
  • model: Model name (for example, adaptive).
  • total_requests: Total requests sent to the interruption detection model.

Node.js:

  • provider: Provider name (for example, livekit).
  • model: Model name (for example, adaptive).
  • totalRequests: Total requests sent to the interruption detection model.

Subscribe to metrics events (deprecated)

Deprecated

The session-level metrics_collected event is deprecated. Use session_usage_updated for usage tracking and ChatMessage.metrics for per-turn latency. Per-plugin metrics_collected events are not deprecated.

from livekit.agents import metrics, MetricsCollectedEvent


@session.on("metrics_collected")
def _on_metrics_collected(ev: MetricsCollectedEvent):
    metrics.log_metrics(ev.metrics)
import { voice, metrics } from '@livekit/agents';

session.on(voice.AgentSessionEventTypes.MetricsCollected, (ev) => {
  metrics.logMetrics(ev.metrics);
});

Aggregate usage (deprecated)

Deprecated

UsageCollector and UsageSummary are deprecated. Use session.usage for cumulative per-model usage instead.

from livekit.agents import metrics, MetricsCollectedEvent


@session.on("metrics_collected")
def _on_metrics_collected(ev: MetricsCollectedEvent):
    metrics.log_metrics(ev.metrics)


async def log_usage():
    logger.info(f"Usage: {session.usage}")


ctx.add_shutdown_callback(log_usage)
import { voice, metrics } from '@livekit/agents';

session.on(voice.AgentSessionEventTypes.MetricsCollected, (ev) => {
  metrics.logMetrics(ev.metrics);
});

const logUsage = async () => {
  console.log(`Usage: ${JSON.stringify(session.usage)}`);
};

ctx.addShutdownCallback(logUsage);

Metrics reference

Each metrics event is included in the LiveKit Cloud trace spans and surfaced as JSON in the dashboard. These metrics are emitted by individual pipeline plugins (STT, LLM, TTS, VAD, and so on) and can be consumed through per-plugin metrics_collected listeners. Use the tables in the following sections as a reference when you export this data to other systems.

Diagram where metrics are measured.

Voice-activity-detection (VAD)

VADMetrics is emitted periodically by the VAD model as it processes audio. It provides visibility into the VAD's operational performance, including how much time it spends idle versus performing inference operations and how many inference operations it completes. This data can be useful for diagnosing latency in speech turn detection.

Python:

  • idle_time: The amount of time (seconds) the VAD spent idle, not performing inference.
  • inference_duration_total: The total amount of time (seconds) spent on VAD inference operations.
  • inference_count: The number of VAD inference operations performed.

Node.js:

  • idleTimeMs: The amount of time (milliseconds) the VAD spent idle, not performing inference.
  • inferenceDurationTotalMs: The total amount of time (milliseconds) spent on VAD inference operations.
  • inferenceCount: The number of VAD inference operations performed.

Speech-to-text (STT)

STTMetrics is emitted after the STT model processes the audio input. This metrics event is only available when an STT component is configured (Realtime APIs do not emit it).

Python:

  • audio_duration: The duration (seconds) of the audio input received by the STT model.
  • duration: For non-streaming STT, the amount of time (seconds) it took to create the transcript. Always 0 for streaming STT.
  • streamed: True if the STT is in streaming mode.

Node.js:

  • audioDurationMs: The duration (milliseconds) of the audio input received by the STT model.
  • durationMs: For non-streaming STT, the amount of time (milliseconds) it took to create the transcript. Always 0 for streaming STT.
  • streamed: true if the STT is in streaming mode.

End-of-utterance (EOU)

EOUMetrics is emitted when the user is determined to have finished speaking. It includes metrics related to end-of-turn detection and transcription latency.

EOU metrics are available in Realtime APIs when turn_detection is set to VAD or LiveKit's turn detector plugin. When using server-side turn detection, EOUMetrics is not emitted.

Python:

  • end_of_utterance_delay: Time (in seconds) from the end of speech (as detected by VAD) to the point when the user's turn is considered complete. This includes any transcription_delay.
  • transcription_delay: Time (in seconds) between the end of speech and when the final transcript is available.
  • on_user_turn_completed_delay: Time (in seconds) taken to execute the on_user_turn_completed callback.
  • speech_id: A unique identifier indicating the user's turn. Not present when end-of-utterance fires without a detected speech segment.

Node.js:

  • endOfUtteranceDelayMs: Time (in milliseconds) from the end of speech (as detected by VAD) to the point when the user's turn is considered complete. This includes any transcriptionDelayMs.
  • transcriptionDelayMs: Time (in milliseconds) between the end of speech and when the final transcript is available.
  • onUserTurnCompletedDelayMs: Time (in milliseconds) taken to invoke the Agent.onUserTurnCompleted callback.
  • lastSpeakingTimeMs: Timestamp (milliseconds) of when the user last stopped speaking.
  • speechId: A unique identifier indicating the user's turn. Not present when end-of-utterance fires without a detected speech segment.

LLM

LLMMetrics is emitted after each LLM inference completes. Tool calls that run after the initial completion emit their own LLMMetrics events.

Python:

  • duration: The amount of time (seconds) it took for the LLM to generate the entire completion.
  • completion_tokens: The number of tokens generated by the LLM in the completion.
  • prompt_tokens: The number of tokens provided in the prompt sent to the LLM.
  • prompt_cached_tokens: The number of cached tokens in the input prompt.
  • speech_id: A unique identifier representing a turn in the user input. Not present for proactive agent responses, tool-call follow-ups, or other completions not tied to a user speech turn.
  • total_tokens: Total token usage for the completion.
  • tokens_per_second: The rate of token generation (tokens/second) by the LLM to generate the completion.
  • ttft: The amount of time (seconds) it took for the LLM to generate the first token of the completion.

Node.js:

  • durationMs: The amount of time (milliseconds) it took for the LLM to generate the entire completion.
  • completionTokens: The number of tokens generated by the LLM in the completion.
  • promptTokens: The number of tokens provided in the prompt sent to the LLM.
  • promptCachedTokens: The number of cached tokens in the input prompt.
  • speechId: A unique identifier representing a turn in the user input. Not present for proactive agent responses, tool-call follow-ups, or other completions not tied to a user speech turn.
  • totalTokens: Total token usage for the completion.
  • tokensPerSecond: The rate of token generation (tokens/second) by the LLM to generate the completion.
  • ttftMs: The amount of time (milliseconds) it took for the LLM to generate the first token of the completion.

Realtime model

RealtimeModelMetrics is emitted after each response from a realtime model. It replaces LLMMetrics in agents that use a realtime model instead of an STT-LLM-TTS pipeline.

Python:

  • duration: The amount of time (seconds) it took to receive the full response from the model.
  • session_duration: The total connection time (seconds) for session-based billing.
  • ttft: Time to first audio token (seconds). Returns -1 if the model did not generate audio tokens. Unlike LLMMetrics.ttft, this value can be negative.
  • input_tokens: Total number of input tokens.
  • output_tokens: Total number of output tokens.
  • total_tokens: Total token usage for the response.
  • tokens_per_second: The rate of output token generation (tokens/second).
  • input_token_details: Breakdown of input tokens by modality: audio_tokens, text_tokens, image_tokens, cached_tokens, and cached_tokens_details (further split by modality).
  • output_token_details: Breakdown of output tokens by modality: text_tokens, audio_tokens, image_tokens.

Node.js:

  • durationMs: The amount of time (milliseconds) it took to receive the full response from the model.
  • sessionDurationMs: The total connection time (milliseconds) for session-based billing. Not present for providers that don't use session-based billing.
  • ttftMs: Time to first audio token (milliseconds). Returns -1 if the model did not generate audio tokens. Unlike LLMMetrics.ttftMs, this value can be negative.
  • inputTokens: Total number of input tokens.
  • outputTokens: Total number of output tokens.
  • totalTokens: Total token usage for the response.
  • tokensPerSecond: The rate of output token generation (tokens/second).
  • inputTokenDetails: Breakdown of input tokens by modality: audioTokens, textTokens, imageTokens, cachedTokens, and cachedTokenDetails (further split by modality).
  • outputTokenDetails: Breakdown of output tokens by modality: textTokens, audioTokens, imageTokens.

Text-to-speech (TTS)

TTSMetrics is emitted after the TTS model generates speech from text input.

Python:

  • audio_duration: The duration (seconds) of the audio output generated by the TTS model.
  • characters_count: The number of characters in the text input to the TTS model.
  • duration: The amount of time (seconds) it took for the TTS model to generate the entire audio output.
  • ttfb: The amount of time (seconds) it took for the TTS model to generate the first byte of its audio output.
  • speech_id: An identifier linking to a user's turn. Not present for speech synthesized independently of a user turn, such as a proactive greeting or say() call.
  • streamed: True if the TTS is in streaming mode.

Node.js:

  • audioDurationMs: The duration (milliseconds) of the audio output generated by the TTS model.
  • charactersCount: The number of characters in the text input to the TTS model.
  • durationMs: The amount of time (milliseconds) it took for the TTS model to generate the entire audio output.
  • ttfbMs: The amount of time (milliseconds) it took for the TTS model to generate the first byte of its audio output.
  • speechId: An identifier linking to a user's turn. Not present for speech synthesized independently of a user turn, such as a proactive greeting or say() call.
  • streamed: true if the TTS is in streaming mode.

Interruption detection

InterruptionMetrics is emitted when the adaptive interruption model processes overlapping speech. These metrics are only available when adaptive interruption handling is enabled. Use them to monitor detection latency and request volume for the model.

Python:

  • total_duration: Latest round-trip time (RTT) for the inference, in seconds.
  • prediction_duration: Latest time taken for inference on the model side, in seconds.
  • detection_delay: Latest total time from the onset of overlapping speech to the final prediction, in seconds.
  • num_interruptions: Number of interruptions detected for this event.
  • num_backchannels: Number of non-interrupting speech events (backchannels) detected for this event.
  • num_requests: Number of requests sent to the interruption detection model for this event.

Node.js:

  • totalDuration: Latest round-trip time (RTT) for the inference, in milliseconds.
  • predictionDuration: Latest time taken for inference on the model side, in milliseconds.
  • detectionDelay: Latest total time from the onset of overlapping speech to the final prediction, in milliseconds.
  • numInterruptions: Number of interruptions detected for this event.
  • numBackchannels: Number of non-interrupting speech events (backchannels) detected for this event.
  • numRequests: Number of requests sent to the interruption detection model for this event.

Measure conversation latency

Total conversation latency is the time it takes for the agent to respond to a user's utterance. The simplest way to get this is from e2e_latency in ChatMessage.metrics.

For a more granular breakdown, approximate total latency by summing individual pipeline metrics:

total_latency = eou.end_of_utterance_delay + llm.ttft + tts.ttfb
const totalLatency = eou.endOfUtteranceDelayMs + llm.ttftMs + tts.ttfbMs;

Correlate pipeline metrics by turn

If you need to track latency for individual pipeline stages (EOU, LLM, TTS) separately — for example, to build a per-stage latency dashboard — use speech_id to correlate metrics across events for the same user turn.

Note

Metrics where speech_id is None aren't tied to a user turn (for example, proactive greetings or say() calls). The examples below skip these.

from collections import defaultdict

from livekit.agents import metrics, MetricsCollectedEvent
from livekit.agents.metrics import EOUMetrics, LLMMetrics, TTSMetrics

turn_metrics: dict[str, dict[str, float]] = defaultdict(dict)


@session.on("metrics_collected")
def _on_metrics_collected(ev: MetricsCollectedEvent):
    metrics.log_metrics(ev.metrics)
    m = ev.metrics
    sid = getattr(m, "speech_id", None)
    if sid is None:
        return
    if isinstance(m, EOUMetrics):
        turn_metrics[sid]["eou_delay"] = m.end_of_utterance_delay
    elif isinstance(m, LLMMetrics):
        turn_metrics[sid]["llm_ttft"] = m.ttft
    elif isinstance(m, TTSMetrics):
        turn_metrics[sid]["tts_ttfb"] = m.ttfb


async def log_turn_latencies():
    for sid, parts in turn_metrics.items():
        total = sum(parts.values())
        logger.info(f"Turn {sid}: {parts} total={total:.3f}s")


ctx.add_shutdown_callback(log_turn_latencies)
import { voice, metrics } from '@livekit/agents';

const turnMetrics = new Map<string, Record<string, number>>();

session.on(voice.AgentSessionEventTypes.MetricsCollected, (ev) => {
  metrics.logMetrics(ev.metrics);
  const m = ev.metrics;
  const sid = 'speechId' in m ? m.speechId : undefined;
  if (!sid) return;
  if (!turnMetrics.has(sid)) turnMetrics.set(sid, {});
  const parts = turnMetrics.get(sid)!;
  if (m.type === 'eou_metrics') {
    parts.eouDelay = m.endOfUtteranceDelayMs;
  } else if (m.type === 'llm_metrics') {
    parts.llmTtft = m.ttftMs;
  } else if (m.type === 'tts_metrics') {
    parts.ttsTtfb = m.ttfbMs;
  }
});

ctx.addShutdownCallback(async () => {
  for (const [sid, parts] of turnMetrics) {
    const total = Object.values(parts).reduce((a, b) => a + b, 0);
    console.log(`Turn ${sid}:`, parts, `total=${total.toFixed(1)}ms`);
  }
});

OpenTelemetry integration

Only Available in
Python

Set a tracer provider to export the same spans used by LiveKit Cloud to any OpenTelemetry-compatible backend. The following example sends spans to LangFuse.

import base64
import os

from livekit.agents.telemetry import set_tracer_provider


def setup_langfuse(
    host: str | None = None, public_key: str | None = None, secret_key: str | None = None
):
    from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor

    public_key = public_key or os.getenv("LANGFUSE_PUBLIC_KEY")
    secret_key = secret_key or os.getenv("LANGFUSE_SECRET_KEY")
    host = host or os.getenv("LANGFUSE_HOST")

    if not public_key or not secret_key or not host:
        raise ValueError("LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST must be set")

    langfuse_auth = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
    os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = f"{host.rstrip('/')}/api/public/otel"
    os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {langfuse_auth}"

    trace_provider = TracerProvider()
    trace_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
    set_tracer_provider(trace_provider)


async def entrypoint(ctx: JobContext):
    setup_langfuse()
    # start your agent

For an end-to-end script, see the LangFuse trace example on GitHub.

Additional resources