Overview
LiveKit Agents supports text inputs and outputs in addition to audio, based on the text streams feature of the LiveKit SDKs. This guide explains what's possible and how to use it in your app.
Transcriptions
When an agent performs STT as part of its processing pipeline, the transcriptions are published to the frontend in realtime. When the agent speaks, a text representation of its speech is also published, in sync with audio playback. Both features are enabled by default when using AgentSession.
Transcriptions use the lk.transcription text stream topic. They include a lk.transcribed_track_id attribute, and the sender identity is the transcribed participant.
To disable transcription output, set transcription_enabled=False in RoomOutputOptions.
Text input
Your agent also monitors the lk.chat text stream topic for incoming text messages from its linked participant. The agent interrupts its current speech, if any, to process the message and generate a new response.
To disable text input, set text_enabled=False in RoomInputOptions.
Text-only output
To disable audio output entirely and send text only, set audio_enabled=False in RoomOutputOptions. The agent will publish text responses to the lk.transcription text stream topic, without a lk.transcribed_track_id attribute and without speech synchronization.
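The topic and attribute rules above are enough for a receiver to tell the different kinds of text apart. The following is a minimal sketch in plain Python, not LiveKit SDK code: StreamInfo and classify_stream are hypothetical names introduced only for illustration, and the sketch assumes the receiver has access to the stream's topic, sender identity, and attributes.

```python
from dataclasses import dataclass, field

# Topic and attribute names taken from this guide.
TRANSCRIPTION_TOPIC = "lk.transcription"
CHAT_TOPIC = "lk.chat"
TRACK_ID_ATTRIBUTE = "lk.transcribed_track_id"


@dataclass
class StreamInfo:
    """Hypothetical stand-in for the metadata that arrives with a text stream."""

    topic: str
    sender_identity: str
    attributes: dict[str, str] = field(default_factory=dict)


def classify_stream(info: StreamInfo) -> str:
    """Label a text stream using only its topic and attributes."""
    if info.topic == CHAT_TOPIC:
        # Text messages sent to the agent.
        return "text_input"
    if info.topic == TRANSCRIPTION_TOPIC:
        if TRACK_ID_ATTRIBUTE in info.attributes:
            # Tied to an audio track: a transcription of that track.
            return "transcription"
        # No track attribute: text-only agent output, no speech sync.
        return "text_only_output"
    return "other"


user_speech = StreamInfo(
    topic=TRANSCRIPTION_TOPIC,
    sender_identity="user-1",  # illustrative identity
    attributes={TRACK_ID_ATTRIBUTE: "TR_example"},  # illustrative track ID
)
agent_text = StreamInfo(topic=TRANSCRIPTION_TOPIC, sender_identity="agent-1")

print(classify_stream(user_speech))
print(classify_stream(agent_text))
```

If custom topics are configured, the two topic constants would need to match those values instead of the defaults.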
Usage examples
This section contains small code samples demonstrating how to use the text features.
For more information, see the text streams documentation. For more complete examples, see the recipes collection.
Frontend integration
Use the registerTextStreamHandler method to receive incoming transcriptions or text.
Use the sendText method to send text messages.
Configuring input/output options
The AgentSession constructor accepts configuration for input and output options:
session = AgentSession(
    ...,  # STT, LLM, etc.
    room_input_options=RoomInputOptions(
        text_enabled=False,  # disable text input
    ),
    room_output_options=RoomOutputOptions(
        audio_enabled=False,  # disable audio output
    ),
)
Manual text input
To insert text input and generate a response, use the generate_reply method of AgentSession: session.generate_reply(input_text="...").
Custom topics
You may override the text_input_topic of RoomInputOptions and the transcription_output_topic of RoomOutputOptions to set a custom text stream topic for text input or output. The default values are lk.chat and lk.transcription, respectively.
Transcription events
Frontend SDKs can also receive transcription events via RoomEvent.TranscriptionReceived.
Transcription events will be removed in a future version. Use text streams on the lk.transcription topic instead.