Overview
For more observability into your agent's performance and model usage, you can enable logging of the detailed metrics that LiveKit Agents provides. These metrics report the duration, latency, and usage of each stage of a session, and are available for both VoicePipelineAgent and MultimodalAgent.
Logging events
LiveKit Agents fires an agent metrics event whenever a new metrics object becomes available during an active session.
To capture these metrics objects in your agent, import the `metrics` module from LiveKit Agents and subscribe to the `metrics_collected` event. Each time your agent receives this event, log the metrics it carries.
The `metrics` module includes a helper function, `log_metrics`, that formats logging output according to the type of metrics received in the event. Call `log_metrics` whenever you receive a new `metrics_collected` event to use this formatting in your logs:
```python
# The metrics module is required to capture agent metrics
from livekit.agents import metrics

# `agent` is your VoicePipelineAgent or MultimodalAgent instance.
# Subscribe to metrics collection events and process accordingly
@agent.on("metrics_collected")
def _on_metrics_collected(mtrcs: metrics.AgentMetrics):
    # Use this helper to format and log based on metrics type
    metrics.log_metrics(mtrcs)
```
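If you want different formatting than `log_metrics` provides, the handler can branch on the metrics type itself. The sketch below is self-contained so that it runs without LiveKit installed: the dataclasses are hypothetical stand-ins that carry only the fields listed in the Metrics reference, not the real LiveKit classes, and `format_metrics` is an illustrative helper, not part of the `metrics` module. In an agent you would check against the real metrics classes and log `format_metrics(mtrcs)` inside `_on_metrics_collected`.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical stand-ins for the real LiveKit metrics classes; each carries
# only the fields documented in the Metrics reference tables.
@dataclass
class STTMetrics:
    audio_duration: float
    duration: float


@dataclass
class LLMMetrics:
    ttft: float
    input_tokens: int
    output_tokens: int
    tokens_per_second: float


@dataclass
class TTSMetrics:
    ttfb: float
    audio_duration: float


@dataclass
class EOUMetrics:
    end_of_utterance_delay: float
    transcription_delay: float


AnyMetrics = Union[STTMetrics, LLMMetrics, TTSMetrics, EOUMetrics]


def format_metrics(m: AnyMetrics) -> str:
    """Return a one-line summary whose fields depend on the metrics type."""
    if isinstance(m, STTMetrics):
        return f"STT audio={m.audio_duration:.2f}s"
    if isinstance(m, LLMMetrics):
        return (
            f"LLM ttft={m.ttft:.2f}s tokens={m.input_tokens}->{m.output_tokens} "
            f"rate={m.tokens_per_second:.1f}/s"
        )
    if isinstance(m, TTSMetrics):
        return f"TTS ttfb={m.ttfb:.2f}s audio={m.audio_duration:.2f}s"
    if isinstance(m, EOUMetrics):
        return (
            f"EOU delay={m.end_of_utterance_delay:.2f}s "
            f"transcription={m.transcription_delay:.2f}s"
        )
    # Fall back to the type name for anything this sketch does not know about.
    return f"unhandled metrics type: {type(m).__name__}"


print(format_metrics(LLMMetrics(ttft=0.42, input_tokens=120, output_tokens=35, tokens_per_second=48.0)))
# prints "LLM ttft=0.42s tokens=120->35 rate=48.0/s"
```

Branching on the type keeps each stage's log line limited to the fields that stage actually reports.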
Aggregating metrics
The `metrics` module also includes a helper class, `UsageCollector`, that aggregates usage metrics over the course of a session and generates a summary once the session is complete. Use the `UsageCollector` class as follows:
```python
import logging

from livekit.agents import metrics

# Logger name is an arbitrary example
logger = logging.getLogger("my-agent")

# Use the usage collector to aggregate agent usage metrics
usage_collector = metrics.UsageCollector()

# Add metrics to the usage collector as they are received
@agent.on("metrics_collected")
def _on_metrics_collected(mtrcs: metrics.AgentMetrics):
    # Pass the latest usage metrics to the usage collector for aggregation
    usage_collector.collect(mtrcs)

# Log the aggregated summary of usage metrics generated by the usage collector
async def log_usage():
    summary = usage_collector.get_summary()
    logger.info(f"Usage: {summary}")

# At shutdown, generate and log the summary from the usage collector
# (`ctx` is the JobContext passed to your agent's entrypoint)
ctx.add_shutdown_callback(log_usage)
```
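Conceptually, aggregating usage amounts to adding each event's usage fields to a running total. The following sketch illustrates that idea with a hypothetical `UsageTotals` class and stand-in metric dataclasses; it is not the `UsageCollector` implementation, and the fields `UsageCollector` actually aggregates may differ.

```python
from dataclasses import dataclass


# Hypothetical stand-ins carrying only the usage-related fields
# from the Metrics reference tables.
@dataclass
class STTMetrics:
    audio_duration: float


@dataclass
class LLMMetrics:
    input_tokens: int
    output_tokens: int


@dataclass
class TTSMetrics:
    audio_duration: float


@dataclass
class UsageTotals:
    """Hypothetical running totals, updated once per metrics event."""

    stt_audio_seconds: float = 0.0
    llm_input_tokens: int = 0
    llm_output_tokens: int = 0
    tts_audio_seconds: float = 0.0

    def collect(self, m: object) -> None:
        # Add this event's usage fields to the matching running totals.
        if isinstance(m, STTMetrics):
            self.stt_audio_seconds += m.audio_duration
        elif isinstance(m, LLMMetrics):
            self.llm_input_tokens += m.input_tokens
            self.llm_output_tokens += m.output_tokens
        elif isinstance(m, TTSMetrics):
            self.tts_audio_seconds += m.audio_duration
        # Any other metrics type is ignored in this sketch.


totals = UsageTotals()
events = [LLMMetrics(100, 20), LLMMetrics(150, 30), TTSMetrics(2.5), STTMetrics(4.0)]
for event in events:
    totals.collect(event)
print(totals)
```

Keeping only running sums means memory use stays constant no matter how long the session runs, and the summary is ready as soon as the session ends.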
Metrics reference
The following metric types are available for VoicePipelineAgent.
Speech-to-text (STT)
STT metrics events are reported by the `metrics` module when the STT model used by the agent has generated output.
Metric | Description |
---|---|
audio_duration | The duration (seconds) of the audio input received by the STT model. |
duration | The total amount of time (seconds) that the connection has been open with the STT provider. |
LLM
LLM metrics events are reported by the `metrics` module when the LLM used by the agent has generated a completion.
Metric | Description |
---|---|
ttft | Time to first token. The amount of time (seconds) that it took for the LLM to generate the first token of the completion. |
input_tokens | The number of tokens provided in the prompt sent to the LLM. |
output_tokens | The number of tokens generated by the LLM in the completion. |
tokens_per_second | The rate (tokens/second) at which the LLM generated the completion. |
Text-to-speech (TTS)
TTS metrics events are reported by the `metrics` module when the TTS model used by the agent has generated output.
Metric | Description |
---|---|
ttfb | Time to first byte. The amount of time (seconds) that it took for the TTS model to generate the first byte of its audio output. |
audio_duration | The duration (seconds) of the audio output generated by the TTS model. |
End-of-utterance (EOU)
EOU metrics events are reported by the `metrics` module when the agent is about to play speech back to the user.
Metric | Description |
---|---|
end_of_utterance_delay | The total amount of time (seconds) between when VAD detected the end of speech and when LLM inference was performed. |
transcription_delay | The amount of time (seconds) for the STT model to generate a final transcript. |
Measuring conversation latency
Total conversation latency is defined as the time it takes for the agent to respond to a user's utterance. Given the metrics above, it can be computed as follows:
```python
total_latency = eou.end_of_utterance_delay + llm.ttft + tts.ttfb
```
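As a worked example, the sketch below applies this formula to individual turns and averages the result over a session. The `turn_latency` helper and the sample values are hypothetical, chosen only to show the arithmetic; in a real agent each value would come from the EOU, LLM, and TTS metrics events reported for the same turn.

```python
from statistics import mean


def turn_latency(end_of_utterance_delay: float, ttft: float, ttfb: float) -> float:
    """Total response latency (seconds) for one turn: EOU delay + LLM TTFT + TTS TTFB."""
    return end_of_utterance_delay + ttft + ttfb


# Hypothetical (end_of_utterance_delay, ttft, ttfb) values in seconds, not measurements.
turns = [(0.5, 0.25, 0.125), (0.75, 0.5, 0.25)]
latencies = [turn_latency(*t) for t in turns]

print(f"per-turn latency: {latencies}")        # prints "per-turn latency: [0.875, 1.5]"
print(f"mean latency: {mean(latencies):.4f}s")  # prints "mean latency: 1.1875s"
```

Because the three terms are sequential stages of one response, summing them gives the delay the user perceives between finishing speaking and hearing the first audio of the reply.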