Realtime media and data | LiveKit Documentation

Overview

Once a session is established and the agent has joined the room, your frontend and agent communicate over WebRTC. LiveKit's transport layer handles two broad categories of realtime communication:

Media: Audio and video tracks for continuous streams like microphone input and agent speech output.
Data: Text streams, byte streams, RPC, state synchronization, and data packets for everything else — transcriptions, files, method calls, and shared state.

A simple voice agent uses media tracks for audio and text streams for transcriptions. A more complex agent might add byte streams for image sharing, RPC for triggering actions, and state synchronization for custom UI state. The sections below walk through each of these and link to the full transport documentation.

Media tracks

Your agent can subscribe to the user's microphone and camera tracks, and publish its own audio and video. A simple voice agent subscribes to the user's microphone track and publishes its own audio. A more complex agent with vision capabilities can subscribe to video from the user's camera or shared screen.

Media tracks: Use the microphone, speaker, cameras, and screen share with your agent.

Text and transcriptions

Text transcriptions of agent and user speech are available as text streams. You can use these to build chat interfaces, display captions, or process conversation history.

Text streams: Send and receive realtime text and transcriptions.

Data sharing

Share images, files, or any other kind of data between your frontend and your agent using byte streams or data packets.
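As a rough mental model for sharing a file, the sketch below splits a payload into ordered chunks and reassembles it on the receiving side. It assumes that a byte stream delivers its payload as a sequence of ordered chunks. The function names and the chunk size are hypothetical and are not part of the LiveKit SDK; a real application would use the SDK's byte stream calls instead.

```python
from typing import Iterable, Iterator


def split_into_chunks(payload: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield consecutive slices of `payload`, each at most `chunk_size` bytes.

    Hypothetical helper: stands in for what a sender does before streaming.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(payload), chunk_size):
        yield payload[start:start + chunk_size]


def reassemble(chunks: Iterable[bytes]) -> bytes:
    """Join chunks in arrival order; assumes delivery is ordered and complete."""
    return b"".join(chunks)


# Illustrative payload standing in for an image or file.
image_bytes = bytes(range(256)) * 10          # 2560 bytes
sent_chunks = list(split_into_chunks(image_bytes, chunk_size=1000))
received = reassemble(sent_chunks)
```

The receiver only needs the chunks in order to rebuild the original bytes, which is why a stream suits large payloads such as images, while a single packet suits small, self-contained messages.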

Byte streams: Send and receive images, files, or any other data.
Data packets: Low-level API for sending and receiving any kind of data.

Agent state

As media and data flow between your frontend and agent, the agent moves through a lifecycle of states — connecting, listening, thinking, speaking, and eventually disconnecting. Your frontend can read these states to drive the UI, for example showing a visual indicator when the agent is thinking or enabling the microphone when the agent is ready to listen. For most UI decisions, use state getters like canListen and isFinished rather than raw state values.
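To illustrate why derived getters are easier to work with than raw state values, here is a minimal sketch of the idea. The enum, the `can_listen` and `is_finished` functions, and the exact set of states each one covers are assumptions made for illustration. They mirror the names canListen and isFinished from the text but are not the SDK's implementation, which may include additional states.

```python
from enum import Enum


class AgentState(Enum):
    # Lifecycle states named in the text; the terminal state name is assumed.
    CONNECTING = "connecting"
    LISTENING = "listening"
    THINKING = "thinking"
    SPEAKING = "speaking"
    DISCONNECTED = "disconnected"


def can_listen(state: AgentState) -> bool:
    """Assumed mapping: the agent can take user audio once it is connected."""
    return state in {AgentState.LISTENING, AgentState.THINKING, AgentState.SPEAKING}


def is_finished(state: AgentState) -> bool:
    """Assumed mapping: the session is over once the agent has disconnected."""
    return state is AgentState.DISCONNECTED


def ui_flags(state: AgentState) -> dict:
    """Translate an agent state into the UI decisions described above."""
    return {
        "mic_enabled": can_listen(state),
        "show_thinking_indicator": state is AgentState.THINKING,
        "show_end_screen": is_finished(state),
    }
```

Because the UI code depends only on `can_listen` and `is_finished`, adding a new raw state later means updating the two getters rather than every place the UI checks state.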

Agent state: Track and respond to agent state changes in your frontend.

State and control

Beyond built-in agent state, your agent and your frontend can share custom state and call methods on each other. Use state synchronization for key-value data that stays in sync across participants, and RPC for request-response interactions like triggering an agent action or fetching data on demand.
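The difference between the two patterns can be sketched with a toy in-memory model. Everything here is hypothetical: `SharedState`, `RpcEndpoint`, and the `echo_upper` method are stand-ins for illustration and do not reflect the LiveKit API. The sketch shows only the shape of each interaction: state changes are pushed to every observer, while an RPC call returns one response to one caller.

```python
from typing import Callable, Dict, List, Optional


class SharedState:
    """Toy key-value store that pushes every change to its observers."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._observers: List[Callable[[str, str], None]] = []

    def subscribe(self, callback: Callable[[str, str], None]) -> None:
        self._observers.append(callback)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value
        for callback in self._observers:
            callback(key, value)

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class RpcEndpoint:
    """Toy method registry: one request in, one response out."""

    def __init__(self) -> None:
        self._methods: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, handler: Callable[[str], str]) -> None:
        self._methods[name] = handler

    def call(self, name: str, payload: str) -> str:
        if name not in self._methods:
            raise LookupError(f"no such method: {name}")
        return self._methods[name](payload)


# State synchronization: the frontend keeps a mirror that follows the agent.
agent_state = SharedState()
frontend_view: Dict[str, str] = {}
agent_state.subscribe(frontend_view.__setitem__)
agent_state.set("step", "checkout")

# RPC: the frontend asks the agent to run a method and waits for the result.
agent_rpc = RpcEndpoint()
agent_rpc.register("echo_upper", lambda payload: payload.upper())
reply = agent_rpc.call("echo_upper", "hello")
```

State synchronization fits values that the UI should always reflect, such as the current step in a flow. RPC fits one-off requests where the caller needs a specific answer or needs to know whether the action succeeded.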

State synchronization: Share custom state between your frontend and agent.
RPC: Define and call methods on your agent or your frontend from the other side.