Capturing metrics | LiveKit Docs

Overview

For greater observability into your agent's performance and model usage, you can enable and log the detailed metrics that LiveKit Agents provides. These metrics report duration, latency, and usage across the stages of a session, and are available for both VoicePipelineAgent and MultimodalAgent.

Logging events

LiveKit Agents fires a metrics event whenever a new metrics object becomes available during an active session.

To capture these metrics objects in your agent, import the metrics module from LiveKit Agents and subscribe to the metrics_collected event. When your agent receives this event, log the metrics it carries.

The metrics module includes a helper function, log_metrics, that formats log output according to the type of metrics received in the event. Call it from your metrics_collected handler to apply this formatting to your logs.

# The metrics module is required to capture agent metrics
from livekit.agents import metrics

# Subscribe to metrics collection events and process accordingly
@agent.on("metrics_collected")
def _on_metrics_collected(mtrcs: metrics.AgentMetrics):
    # Use this helper to format and log based on metrics type
    metrics.log_metrics(mtrcs)

Aggregating metrics

The metrics module also includes a helper class, UsageCollector, that aggregates usage metrics over the course of a session and generates a summary once the session is complete. Use the UsageCollector class as follows:

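import logging

# Logger used for the usage summary below (any logger name works)
logger = logging.getLogger("my-agent")

# Use the usage collector to aggregate agent usage metrics
usage_collector = metrics.UsageCollector()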

# Add metrics to usage collector as they are received
@agent.on("metrics_collected")
def _on_metrics_collected(mtrcs: metrics.AgentMetrics):
    # Pass the latest usage metrics to the usage collector for aggregation
    usage_collector.collect(mtrcs)

# Log aggregated summary of usage metrics generated by usage collector
async def log_usage():
    summary = usage_collector.get_summary()
    logger.info(f"Usage: {summary}")

# At shutdown, generate and log the summary from the usage collector
ctx.add_shutdown_callback(log_usage)
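
To make the idea of aggregation concrete, the following is a minimal, self-contained sketch of what accumulating usage across events amounts to. It does not use or reproduce UsageCollector: the names LLMUsage, TTSUsage, and SimpleUsageTotals are hypothetical stand-ins invented for this illustration, and the values are made up rather than measured.

```python
from dataclasses import dataclass


@dataclass
class LLMUsage:
    """Hypothetical stand-in for one LLM metrics object."""
    input_tokens: int
    output_tokens: int


@dataclass
class TTSUsage:
    """Hypothetical stand-in for one TTS metrics object."""
    audio_duration: float  # seconds


class SimpleUsageTotals:
    """Illustrative aggregator; not the UsageCollector implementation."""

    def __init__(self) -> None:
        self.llm_input_tokens = 0
        self.llm_output_tokens = 0
        self.tts_audio_duration = 0.0

    def collect(self, m: object) -> None:
        # Dispatch on the metrics type and add to the matching totals.
        if isinstance(m, LLMUsage):
            self.llm_input_tokens += m.input_tokens
            self.llm_output_tokens += m.output_tokens
        elif isinstance(m, TTSUsage):
            self.tts_audio_duration += m.audio_duration
        # Other metrics types are ignored in this sketch.


# Illustrative events, as if received one at a time during a session.
totals = SimpleUsageTotals()
for m in [LLMUsage(120, 40), TTSUsage(2.5), LLMUsage(200, 60), TTSUsage(1.5)]:
    totals.collect(m)

print(totals.llm_input_tokens, totals.llm_output_tokens, totals.tts_audio_duration)
```

In a real agent you would rely on UsageCollector itself; the sketch only shows why each metrics object is passed to the collector as it arrives instead of being stored and processed at shutdown.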

Metrics reference

Diagram of where metrics are measured inside a VoicePipelineAgent.

Note

The following metric types are available for VoicePipelineAgent.

Speech-to-text (STT)

STT metrics events are reported by the metrics module when the STT model being used by the agent has generated output.

Metric	Description
`audio_duration`	The duration (seconds) of the audio input received by the STT model.
`duration`	The total amount of time (seconds) that the connection has been open with the STT provider.

LLM

LLM metrics events are reported by the metrics module when the LLM being used by the agent has generated a completion.

Metric	Description
`ttft`	Time to first token. The amount of time (seconds) that it took for the LLM to generate the first token of the completion.
`input_tokens`	The number of tokens provided in the prompt sent to the LLM.
`output_tokens`	The number of tokens generated by the LLM in the completion.
`tokens_per_second`	The rate of token generation (tokens/second) by the LLM to generate the completion.

Text-to-speech (TTS)

TTS metrics events are reported by the metrics module when the TTS model being used by the agent has generated output.

Metric	Description
`ttfb`	Time to first byte. The amount of time (seconds) that it took for the TTS model to generate the first byte of its audio output.
`audio_duration`	The duration (seconds) of the audio output generated by the TTS model.

End-of-utterance (EOU)

EOU metrics events are reported by the metrics module when the agent is about to play speech back to the user.

Metric	Description
`end_of_utterance_delay`	Total amount of time (seconds) between when VAD detected end of speech and when LLM inference was performed.
`transcription_delay`	The amount of time (seconds) for the STT model to generate a final transcript.

Measuring conversation latency

Total conversation latency is defined as the time it takes for the agent to respond to a user's utterance. Given the metrics above, it can be computed as follows:

total_latency = eou.end_of_utterance_delay + llm.ttft + tts.ttfb
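
As a worked example of the formula above, the sketch below computes the latency for a single conversational turn. The EOUSample, LLMSample, and TTSSample classes are hypothetical stand-ins that model only the three fields the formula uses, and the numbers are illustrative, not measurements. In a real agent these values would come from the metrics objects delivered with the metrics_collected event.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the metrics objects; only the fields used
# by the latency formula are modeled here.
@dataclass
class EOUSample:
    end_of_utterance_delay: float  # seconds


@dataclass
class LLMSample:
    ttft: float  # seconds


@dataclass
class TTSSample:
    ttfb: float  # seconds


def total_latency(eou: EOUSample, llm: LLMSample, tts: TTSSample) -> float:
    """Return the conversation latency in seconds for one turn."""
    return eou.end_of_utterance_delay + llm.ttft + tts.ttfb


# Illustrative values for a single turn.
latency = total_latency(
    EOUSample(end_of_utterance_delay=0.5),
    LLMSample(ttft=0.25),
    TTSSample(ttfb=0.125),
)
print(f"Total latency: {latency:.3f}s")
```

Because the three metrics arrive in separate events, a real implementation would need to pair the EOU, LLM, and TTS metrics that belong to the same turn before summing them.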