Overview
To improve observability into agent performance and model usage, you can log detailed metrics provided by LiveKit Agents. These metrics offer insights into duration, latency, and usage across different stages of a session.
Logging events
Agent metrics events are fired by the `AgentSession` whenever a new metrics object becomes available during an active session. A `log_metrics` helper function is also provided to format the logging output for each metric type.
```python
from livekit.agents import metrics, MetricsCollectedEvent

...

@session.on("metrics_collected")
def _on_metrics_collected(ev: MetricsCollectedEvent):
    metrics.log_metrics(ev.metrics)
```
Aggregating metrics
The `metrics` module also includes a `UsageCollector` helper class for aggregating usage metrics across a session. It tracks metrics such as LLM, TTS, and STT API usage, which can help estimate session cost.
```python
from livekit.agents import metrics, MetricsCollectedEvent

...

usage_collector = metrics.UsageCollector()

@session.on("metrics_collected")
def _on_metrics_collected(ev: MetricsCollectedEvent):
    usage_collector.collect(ev.metrics)

async def log_usage():
    summary = usage_collector.get_summary()
    logger.info(f"Usage: {summary}")

# At shutdown, generate and log the summary from the usage collector
ctx.add_shutdown_callback(log_usage)
```
Metrics reference
Speech-to-text (STT)
`STTMetrics` is emitted after the STT model has processed the audio input. These metrics are only available when an STT component is used, which does not apply to Realtime APIs.
Metric | Description |
---|---|
audio_duration | The duration (seconds) of the audio input received by the STT model. |
duration | For non-streaming STT, the amount of time (seconds) it took to create the transcript. Always 0 for streaming STT. |
streamed | True if the STT is in streaming mode. |
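To handle STT metrics separately from other metric types, you can check the class of the event's metrics object inside the handler. The following is a minimal sketch, assuming the same `session` and `logger` objects as in the earlier examples:

```python
from livekit.agents import metrics, MetricsCollectedEvent

@session.on("metrics_collected")
def _on_stt_metrics(ev: MetricsCollectedEvent):
    # React only to STT metrics; other metric types are ignored here.
    if isinstance(ev.metrics, metrics.STTMetrics):
        m = ev.metrics
        logger.info(f"STT processed {m.audio_duration:.2f}s of audio (streamed={m.streamed})")
```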
LLM
`LLMMetrics` is emitted after each LLM inference completes. If the response includes tool calls, the event does not include the time taken to execute those calls. Each tool call response triggers a separate `LLMMetrics` event.
Metric | Description |
---|---|
duration | The amount of time (seconds) it took for the LLM to generate the entire completion. |
completion_tokens | The number of tokens generated by the LLM in the completion. |
prompt_tokens | The number of tokens provided in the prompt sent to the LLM. |
speech_id | A unique identifier representing a turn in the user input. |
total_tokens | Total token usage for the completion. |
tokens_per_second | The rate of token generation (tokens/second) by the LLM to generate the completion. |
ttft | The amount of time (seconds) that it took for the LLM to generate the first token of the completion. |
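As one example of what you can do with these values, `prompt_tokens` and `completion_tokens` can feed a rough per-inference cost estimate. The sketch below uses placeholder prices that you should replace with your provider's actual rates; for session-wide totals, the `UsageCollector` shown earlier already aggregates token usage for you.

```python
from livekit.agents import metrics, MetricsCollectedEvent

# Placeholder prices for illustration only -- substitute your provider's real rates.
PROMPT_PRICE_PER_1K = 0.0005      # hypothetical USD per 1K prompt tokens
COMPLETION_PRICE_PER_1K = 0.0015  # hypothetical USD per 1K completion tokens

@session.on("metrics_collected")
def _on_llm_metrics(ev: MetricsCollectedEvent):
    if isinstance(ev.metrics, metrics.LLMMetrics):
        m = ev.metrics
        cost = (
            m.prompt_tokens / 1000 * PROMPT_PRICE_PER_1K
            + m.completion_tokens / 1000 * COMPLETION_PRICE_PER_1K
        )
        logger.info(f"LLM: ttft={m.ttft:.2f}s, {m.total_tokens} tokens (~${cost:.6f})")
```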
Text-to-speech (TTS)
`TTSMetrics` is emitted after the TTS model has generated speech from the text input.
Metric | Description |
---|---|
audio_duration | The duration (seconds) of the audio output generated by the TTS model. |
characters_count | The number of characters in the text input to the TTS model. |
duration | The amount of time (seconds) it took for the TTS model to generate the entire audio output. |
ttfb | The amount of time (seconds) that it took for the TTS model to generate the first byte of its audio output. |
speech_id | An identifier linking to a user's turn. |
streamed | True if the TTS is in streaming mode. |
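Because `ttfb` directly affects how quickly the agent starts speaking, it is a useful value to monitor. The sketch below logs a warning when it exceeds a threshold; the 0.5-second value is an arbitrary example, not a LiveKit recommendation:

```python
from livekit.agents import metrics, MetricsCollectedEvent

TTS_TTFB_THRESHOLD = 0.5  # seconds; arbitrary example threshold

@session.on("metrics_collected")
def _on_tts_metrics(ev: MetricsCollectedEvent):
    if isinstance(ev.metrics, metrics.TTSMetrics) and ev.metrics.ttfb > TTS_TTFB_THRESHOLD:
        m = ev.metrics
        logger.warning(
            f"Slow TTS: ttfb={m.ttfb:.2f}s for {m.characters_count} characters "
            f"(speech_id={m.speech_id})"
        )
```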
End-of-utterance (EOU)
`EOUMetrics` is emitted when the user is determined to have finished speaking. It includes metrics related to end-of-turn detection and transcription latency.
This event is only available in Realtime APIs when `turn_detection` is set to either VAD or LiveKit's turn detector plugin. When using server-side turn detection, `EOUMetrics` is not emitted, as this information is not available.
Metric | Description |
---|---|
end_of_utterance_delay | Time (in seconds) from the end of speech (as detected by VAD) to the point when the user's turn is considered complete. This includes any transcription_delay. |
transcription_delay | Time (in seconds) between the end of speech and when the final transcript is available. |
on_user_turn_completed_delay | Time (in seconds) taken to execute the on_user_turn_completed callback. |
speech_id | A unique identifier indicating the user's turn. |
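A minimal sketch for logging end-of-turn delays, again assuming the `session` and `logger` objects from the earlier examples:

```python
from livekit.agents import metrics, MetricsCollectedEvent

@session.on("metrics_collected")
def _on_eou_metrics(ev: MetricsCollectedEvent):
    if isinstance(ev.metrics, metrics.EOUMetrics):
        m = ev.metrics
        logger.info(
            f"Turn {m.speech_id}: end_of_utterance_delay={m.end_of_utterance_delay:.2f}s, "
            f"transcription_delay={m.transcription_delay:.2f}s"
        )
```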
Measuring conversation latency
Total conversation latency is defined as the time from the end of the user's utterance to the start of the agent's audible response. Given the metrics above, it can be computed as follows:

```
total_latency = eou.end_of_utterance_delay + llm.ttft + tts.ttfb
```
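Because these three values arrive in separate events, they need to be correlated per turn. The sketch below groups them by `speech_id`; it assumes one STT/LLM/TTS cycle per user turn and reuses the `session` and `logger` objects from the earlier examples:

```python
from collections import defaultdict

from livekit.agents import metrics, MetricsCollectedEvent

# speech_id -> latency components collected so far for that turn
turn_latency: dict[str, dict[str, float]] = defaultdict(dict)

@session.on("metrics_collected")
def _on_metrics_collected(ev: MetricsCollectedEvent):
    m = ev.metrics
    if isinstance(m, metrics.EOUMetrics):
        turn_latency[m.speech_id]["eou_delay"] = m.end_of_utterance_delay
    elif isinstance(m, metrics.LLMMetrics):
        turn_latency[m.speech_id]["llm_ttft"] = m.ttft
    elif isinstance(m, metrics.TTSMetrics):
        turn_latency[m.speech_id]["tts_ttfb"] = m.ttfb
    else:
        return

    turn = turn_latency[m.speech_id]
    if {"eou_delay", "llm_ttft", "tts_ttfb"} <= turn.keys():
        total = turn["eou_delay"] + turn["llm_ttft"] + turn["tts_ttfb"]
        logger.info(f"Total conversation latency for turn {m.speech_id}: {total:.2f}s")
```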