Capturing metrics

Log performance and usage metrics on your agent for debugging and insights.

Overview

To improve observability into your agent's performance and model usage, you can log the detailed metrics that LiveKit Agents provides. These metrics report duration, latency, and usage across the stages of a session, and are available for both VoicePipelineAgent and MultimodalAgent.

Logging events

LiveKit Agents fires a metrics event whenever a new metrics object becomes available during an active session.

To capture these metrics objects, import the metrics module from LiveKit Agents and subscribe to the metrics_collected event on your agent. Log the metrics each time the event is received.

The metrics module includes a helper function, log_metrics, which formats log output according to the type of metrics received. Call it from your metrics_collected handler to use this formatting in your logs.

# The metrics module is required to capture agent metrics
from livekit.agents import metrics

# Subscribe to metrics collection events and process accordingly
@agent.on("metrics_collected")
def _on_metrics_collected(mtrcs: metrics.AgentMetrics):
    # Use this helper to format and log based on metrics type
    metrics.log_metrics(mtrcs)
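
To illustrate what "formatting based on the type of metrics" can look like, here is a minimal, self-contained sketch. It does not use LiveKit Agents and is not the library's implementation of log_metrics: the dataclasses are hypothetical stand-ins that carry only a few of the fields listed in the metrics reference, and format_metrics is an illustrative name.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for metrics objects (assumption: the real classes
# in livekit.agents.metrics differ and carry more fields).
@dataclass
class STTMetrics:
    audio_duration: float  # seconds of audio received by the STT model


@dataclass
class LLMMetrics:
    ttft: float  # seconds to first token
    input_tokens: int
    output_tokens: int


@dataclass
class TTSMetrics:
    ttfb: float  # seconds to first byte of audio
    audio_duration: float  # seconds of audio generated


def format_metrics(m) -> str:
    """Return a one-line summary whose fields depend on the metrics type."""
    if isinstance(m, STTMetrics):
        return f"STT: audio_duration={m.audio_duration:.2f}s"
    if isinstance(m, LLMMetrics):
        return (
            f"LLM: ttft={m.ttft:.2f}s, "
            f"input_tokens={m.input_tokens}, output_tokens={m.output_tokens}"
        )
    if isinstance(m, TTSMetrics):
        return f"TTS: ttfb={m.ttfb:.2f}s, audio_duration={m.audio_duration:.2f}s"
    # Fall back to the object's repr for types this sketch does not know about
    return f"Unrecognized metrics: {m!r}"
```

A handler written this way can pass the returned string to any logger, which keeps the formatting logic separate from the event subscription.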

Aggregating metrics

The metrics module also includes a helper class, UsageCollector, that aggregates usage metrics over the course of a session and generates a summary once the session is complete. Pass each metrics object to its collect method as it arrives, then call get_summary at shutdown.

import logging

# Logger name is arbitrary; use your application's logger if it has one
logger = logging.getLogger("agent-metrics")

# Use the usage collector to aggregate agent usage metrics
usage_collector = metrics.UsageCollector()

# Add metrics to usage collector as they are received
@agent.on("metrics_collected")
def _on_metrics_collected(mtrcs: metrics.AgentMetrics):
    # Pass the latest usage metrics to the usage collector for aggregation
    usage_collector.collect(mtrcs)

# Log aggregated summary of usage metrics generated by usage collector
async def log_usage():
    summary = usage_collector.get_summary()
    logger.info(f"Usage: {summary}")

# At shutdown, generate and log the summary from the usage collector
ctx.add_shutdown_callback(log_usage)
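
The collect-then-summarize pattern above can be sketched without the library. The following is a rough illustration only, not how UsageCollector is implemented: SimpleUsageCollector, Summary, and their field names are hypothetical, and the dataclasses are stand-ins that carry just the usage fields from the metrics reference.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for metrics objects (assumption: not the real
# livekit.agents.metrics classes).
@dataclass
class STTMetrics:
    audio_duration: float  # seconds of audio sent to the STT model


@dataclass
class LLMMetrics:
    input_tokens: int
    output_tokens: int


@dataclass
class TTSMetrics:
    audio_duration: float  # seconds of audio generated by the TTS model


@dataclass
class Summary:
    """Running totals for one session (hypothetical field names)."""
    llm_input_tokens: int = 0
    llm_output_tokens: int = 0
    stt_audio_seconds: float = 0.0
    tts_audio_seconds: float = 0.0


class SimpleUsageCollector:
    """Accumulates usage figures from each metrics object it is given."""

    def __init__(self) -> None:
        self._summary = Summary()

    def collect(self, m) -> None:
        # Add the usage-related fields to the running totals, by type
        if isinstance(m, LLMMetrics):
            self._summary.llm_input_tokens += m.input_tokens
            self._summary.llm_output_tokens += m.output_tokens
        elif isinstance(m, STTMetrics):
            self._summary.stt_audio_seconds += m.audio_duration
        elif isinstance(m, TTSMetrics):
            self._summary.tts_audio_seconds += m.audio_duration

    def get_summary(self) -> Summary:
        return self._summary


# Example session with made-up values
collector = SimpleUsageCollector()
for m in [
    STTMetrics(audio_duration=1.5),
    LLMMetrics(input_tokens=120, output_tokens=40),
    TTSMetrics(audio_duration=3.0),
    STTMetrics(audio_duration=2.5),
    LLMMetrics(input_tokens=200, output_tokens=60),
]:
    collector.collect(m)

summary = collector.get_summary()
```

Keeping only running totals means memory use stays constant regardless of how long the session runs.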

Metrics reference

[Figure: Diagram showing where metrics are measured inside a VoicePipelineAgent.]

Note: The following metric types are available for VoicePipelineAgent.

Speech-to-text (STT)

STT metrics events are reported by the metrics module when the STT model being used by the agent has generated output.

Metric | Description
audio_duration | The duration (seconds) of the audio input received by the STT model.
duration | The total amount of time (seconds) that the connection has been open with the STT provider.

LLM

LLM metrics events are reported by the metrics module when the LLM being used by the agent has generated a completion.

Metric | Description
ttft | Time to first token. The amount of time (seconds) that it took for the LLM to generate the first token of the completion.
input_tokens | The number of tokens provided in the prompt sent to the LLM.
output_tokens | The number of tokens generated by the LLM in the completion.
tokens_per_second | The rate (tokens/second) at which the LLM generated the completion.

Text-to-speech (TTS)

TTS metrics events are reported by the metrics module when the TTS model being used by the agent has generated output.

Metric | Description
ttfb | Time to first byte. The amount of time (seconds) that it took for the TTS model to generate the first byte of its audio output.
audio_duration | The duration (seconds) of the audio output generated by the TTS model.

End-of-utterance (EOU)

EOU metrics events are reported by the metrics module when the agent is about to play speech back to the user.

Metric | Description
end_of_utterance_delay | Total amount of time (seconds) between when VAD detected end of speech and when LLM inference was performed.
transcription_delay | The amount of time (seconds) for the STT model to generate a final transcript.

Measuring conversation latency

Total conversation latency is defined as the time it takes for the agent to respond to a user's utterance. Given the metrics above, it can be computed as follows:

total_latency = eou.end_of_utterance_delay + llm.ttft + tts.ttfb
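
As a worked example of the formula, the sketch below computes total latency from one set of EOU, LLM, and TTS metrics. The dataclasses are hypothetical stand-ins that hold only the fields the formula needs, and the sample values are made up for illustration rather than taken from a real session.

```python
from dataclasses import dataclass


# Hypothetical stand-ins holding only the fields used by the formula
@dataclass
class EOUMetrics:
    end_of_utterance_delay: float  # seconds


@dataclass
class LLMMetrics:
    ttft: float  # seconds to first token


@dataclass
class TTSMetrics:
    ttfb: float  # seconds to first byte of audio


def total_latency(eou: EOUMetrics, llm: LLMMetrics, tts: TTSMetrics) -> float:
    """Time (seconds) from the end of the user's speech to the first audio byte."""
    return eou.end_of_utterance_delay + llm.ttft + tts.ttfb


# Made-up sample values for a single conversational turn
latency = total_latency(
    EOUMetrics(end_of_utterance_delay=0.5),
    LLMMetrics(ttft=0.25),
    TTSMetrics(ttfb=0.125),
)
print(f"total latency: {latency:.3f}s")  # prints "total latency: 0.875s"
```

The three metrics arrive in separate events, so in a real handler the values for a given turn have to be gathered before they can be summed.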