Text and transcriptions

Integrate realtime text features into your agent.

Overview

LiveKit Agents supports text inputs and outputs in addition to audio, based on the text streams feature of the LiveKit SDKs. This guide explains what's possible and how to use it in your app.

Transcriptions

When an agent performs STT as part of its processing pipeline, the transcriptions are published to the frontend in realtime. Additionally, when the agent speaks, a text representation of its speech is published in sync with audio playback. Both features are enabled by default when using AgentSession.

Transcriptions use the lk.transcription text stream topic. Each stream includes a lk.transcribed_track_id attribute, and the sender identity is that of the transcribed participant.

To disable transcription output, set transcription_enabled=False in RoomOutputOptions.
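
For example, a minimal sketch reusing the MyAgent and ctx.room names from the examples below:

    from livekit.agents import RoomOutputOptions

    await session.start(
        agent=MyAgent(),
        room=ctx.room,
        room_output_options=RoomOutputOptions(transcription_enabled=False),
    )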

Synchronized transcription forwarding

When both voice and transcription are enabled, the agent's speech is synchronized with its transcriptions, displaying text word by word as it speaks. If the agent is interrupted, the transcription stops and is truncated to match the spoken output.

Disabling synchronization

To send transcriptions to the client as soon as they become available, without synchronizing to the original speech, set sync_transcription to False in RoomOutputOptions.

Python:

    await session.start(
        agent=MyAgent(),
        room=ctx.room,
        room_output_options=RoomOutputOptions(sync_transcription=False),
    )

Node.js:

    import { voice } from '@livekit/agents';

    await session.start({
      agent: new MyAgent(),
      room: ctx.room,
      outputOptions: {
        syncTranscription: false,
      },
    });

Accessing from AgentSession

You can be notified within your agent whenever text input or output is committed to the chat history by listening to the conversation_item_added event.
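
For example, a minimal sketch of such a listener; the ConversationItemAddedEvent import and the item.text_content field are based on the Python SDK and may differ by version:

    from livekit.agents import ConversationItemAddedEvent

    @session.on("conversation_item_added")
    def on_item_added(event: ConversationItemAddedEvent):
        # event.item is the chat message that was just committed to history
        print(f"{event.item.role}: {event.item.text_content}")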

TTS-aligned transcriptions

Available in Python only.

If your TTS provider supports it, you can enable TTS-aligned transcription forwarding to improve transcription synchronization to your frontend. This feature synchronizes the transcription output with the actual speech timing, enabling word-level synchronization. When using this feature, certain formatting may be lost from the original text (dependent on the TTS provider).

Currently, only Cartesia and ElevenLabs support word-level transcription timing. For other providers, the alignment is applied at the sentence level and still improves synchronization reliability for multi-sentence turns.

To enable this feature, set use_tts_aligned_transcript=True in your AgentSession configuration:

    session = AgentSession(
        # ... stt, llm, tts, vad, etc.
        use_tts_aligned_transcript=True,
    )

To access timing information in your code, implement a transcription_node method in your agent. The iterator yields TimedString objects, which include start_time and end_time for each word, in seconds relative to the start of the agent's current turn.

Experimental feature

The transcription_node and TimedString implementations are experimental and may change in a future version of the SDK.

    async def transcription_node(
        self, text: AsyncIterable[str | TimedString], model_settings: ModelSettings
    ) -> AsyncGenerator[str | TimedString, None]:
        async for chunk in text:
            if isinstance(chunk, TimedString):
                logger.info(f"TimedString: '{chunk}' ({chunk.start_time} - {chunk.end_time})")
            yield chunk

Text input

Your agent also monitors the lk.chat text stream topic for incoming text messages from its linked participant. The agent interrupts its current speech, if any, to process the message and generate a new response.

To disable text input, set text_enabled=False in RoomInputOptions.
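
For example, a minimal sketch mirroring the session setup shown earlier:

    from livekit.agents import RoomInputOptions

    await session.start(
        agent=MyAgent(),
        room=ctx.room,
        room_input_options=RoomInputOptions(text_enabled=False),
    )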

Text-only sessions

You have two options for disabling audio input and output for text-only sessions:

  • Permanently: Disable audio for the entire session to prevent any audio tracks from being published to the room.
  • Temporarily: Toggle audio input and output dynamically for hybrid sessions.

Turn off audio for the entire session with RoomInputOptions and RoomOutputOptions, or toggle it dynamically with the session.input.set_audio_enabled() and session.output.set_audio_enabled() methods, as shown in the following sections.

Disable audio for the entire session

To turn off audio input or output for the entire session, set audio_enabled=False in RoomInputOptions or RoomOutputOptions when you start the session. When audio output is disabled, the agent does not publish audio tracks to the room. Text responses are sent without the lk.transcribed_track_id attribute and without speech synchronization.

Python:

    await session.start(
        # ... agent, room
        room_input_options=RoomInputOptions(audio_enabled=False),
        room_output_options=RoomOutputOptions(audio_enabled=False),
    )

Node.js:

    await session.start({
      // ... agent, room
      inputOptions: {
        audioEnabled: false,
      },
      outputOptions: {
        audioEnabled: false,
      },
    });

Toggle audio input and output

For hybrid sessions where audio input and output might be used, such as when a user toggles an audio switch, you can allow the agent to toggle audio input and output dynamically using session.input.set_audio_enabled() and session.output.set_audio_enabled(). This still publishes the audio track to the room.

Toggle Audio

An example of dynamically toggling audio input and output.

Python:

    session = AgentSession(...)

    # start with audio disabled
    session.input.set_audio_enabled(False)
    session.output.set_audio_enabled(False)

    await session.start(...)

    # user toggles audio switch
    @room.local_participant.register_rpc_method("toggle_audio")
    async def on_toggle_audio(data: rtc.RpcInvocationData) -> None:
        session.input.set_audio_enabled(not session.input.audio_enabled)
        session.output.set_audio_enabled(not session.output.audio_enabled)

Node.js:

    import { voice } from '@livekit/agents';

    const session = new voice.AgentSession({
      // ... configuration
    });

    // start with audio disabled
    session.input.setAudioEnabled(false);
    session.output.setAudioEnabled(false);

    await session.start({
      agent,
      room: ctx.room,
    });

    // user toggles audio switch
    ctx.room.localParticipant.registerRpcMethod('toggle_audio', async (data) => {
      session.input.setAudioEnabled(!session.input.audioEnabled);
      session.output.setAudioEnabled(!session.output.audioEnabled);
    });

You can also temporarily pause audio input to prevent speech from being queued for response. This is useful when an agent needs to run non-verbal jobs and you want to stop the agent from listening to any input. Pausing input does not unpublish any audio tracks; the agent simply stops processing incoming audio.

Tip

This is different from manual turn control which is used for interfaces such as push-to-talk.

Python:

    # if currently speaking, stop first so states don't overlap
    session.interrupt()

    session.input.set_audio_enabled(False)  # stop listening
    try:
        await do_job()  # your non-verbal job
    finally:
        session.input.set_audio_enabled(True)  # start listening again

Node.js:

    try {
      // if currently speaking, stop first so states don't overlap
      session.interrupt();
      session.input.setAudioEnabled(false); // stop listening
      await doJob(); // your non-verbal job
    } finally {
      session.input.setAudioEnabled(true); // start listening again
    }

    async function doJob() {
      // placeholder for actual work
      return new Promise((resolve) => setTimeout(resolve, 7000));
    }

Frontend integration

LiveKit client SDKs have native support for text streams. For more information, see the text streams documentation.

Receiving text streams

Use the registerTextStreamHandler method to receive incoming transcriptions or text:

JavaScript:

    room.registerTextStreamHandler('lk.transcription', async (reader, participantInfo) => {
      const message = await reader.readAll();
      if (reader.info.attributes['lk.transcribed_track_id']) {
        console.log(`New transcription from ${participantInfo.identity}: ${message}`);
      } else {
        console.log(`New message from ${participantInfo.identity}: ${message}`);
      }
    });

Swift:

    try await room.registerTextStreamHandler(for: "lk.transcription") { reader, participantIdentity in
        let message = try await reader.readAll()
        if reader.info.attributes["lk.transcribed_track_id"] != nil {
            print("New transcription from \(participantIdentity): \(message)")
        } else {
            print("New message from \(participantIdentity): \(message)")
        }
    }

Sending text input

Use the sendText method to send text messages:

JavaScript:

    const text = 'Hello how are you today?';
    const info = await room.localParticipant.sendText(text, {
      topic: 'lk.chat',
    });

Swift:

    let text = "Hello how are you today?"
    let info = try await room.localParticipant.sendText(text, for: "lk.chat")

Manual text input

To insert text input and generate a response, use the generate_reply method of AgentSession: session.generate_reply(user_input="...").
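
For example (the user text here is illustrative):

    # Inject text as user input; the agent responds as if the user had spoken
    session.generate_reply(user_input="What can you help me with?")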

Transcription events

Frontend SDKs can also receive transcription events via RoomEvent.TranscriptionReceived.

Deprecated feature

Transcription events will be removed in a future version. Use text streams on the lk.transcription topic instead.

JavaScript:

    room.on(RoomEvent.TranscriptionReceived, (segments) => {
      for (const segment of segments) {
        console.log(`New transcription from ${segment.senderIdentity}: ${segment.text}`);
      }
    });

Swift:

    func room(_ room: Room, didReceiveTranscriptionSegments segments: [TranscriptionSegment]) {
        for segment in segments {
            print("New transcription from \(segment.senderIdentity): \(segment.text)")
        }
    }

Kotlin:

    room.events.collect { event ->
        if (event is RoomEvent.TranscriptionReceived) {
            event.transcriptionSegments.forEach { segment ->
                println("New transcription from ${segment.senderIdentity}: ${segment.text}")
            }
        }
    }

Flutter:

    room.createListener().on<TranscriptionEvent>((event) {
      for (final segment in event.segments) {
        print("New transcription from ${segment.senderIdentity}: ${segment.text}");
      }
    });