Overview
Once a session is established and the agent has joined the room, your frontend and agent communicate over WebRTC. LiveKit's transport layer handles two broad categories of realtime communication:
- Media: Audio and video tracks for continuous streams like microphone input and agent speech output.
- Data: Text streams, byte streams, RPC, state synchronization, and data packets for everything else — transcriptions, files, method calls, and shared state.
A simple voice agent uses media tracks for audio and text streams for transcriptions. A more complex agent might add byte streams for image sharing, RPC for triggering actions, and state synchronization for custom UI state. The sections below walk through each of these and link to the full transport documentation.
Media tracks
Your agent can subscribe to the user's microphone and camera tracks and publish its own audio and video. A voice-only agent needs just the user's microphone track and its own audio output, while an agent with vision capabilities can also subscribe to video from the user's camera or shared screen.
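As a rough illustration of the subscription choice described above, the following sketch picks which of a user's tracks an agent would want based on whether vision is enabled. The `RemoteTrackInfo` type and `selectTracks` function are hypothetical names for this example and are not part of the LiveKit SDK; only the idea of distinguishing tracks by source comes from the text.

```typescript
// Hypothetical, simplified description of a published track (not an SDK type).
type TrackSource = "microphone" | "camera" | "screen_share";

interface RemoteTrackInfo {
  sid: string;
  source: TrackSource;
}

// Return the tracks a voice agent would subscribe to.
// Assumption: voice-only agents need the microphone; vision agents also
// want camera and screen share video.
function selectTracks(
  tracks: RemoteTrackInfo[],
  visionEnabled: boolean,
): RemoteTrackInfo[] {
  const wanted: TrackSource[] = visionEnabled
    ? ["microphone", "camera", "screen_share"]
    : ["microphone"];
  return tracks.filter((track) => wanted.indexOf(track.source) !== -1);
}
```

In a real application the SDK manages subscriptions for you; a filter like this is only useful when you want to reason explicitly about which sources an agent consumes.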
Text and transcriptions
Transcriptions of agent and user speech are delivered as text streams. You can use these to build chat interfaces, display captions, or process conversation history.
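To make the captions use case concrete, here is a minimal sketch of how a frontend might fold incoming transcription updates into a caption list. It assumes each update carries a segment identifier and that later updates for the same segment replace earlier interim text. The `TranscriptUpdate` shape and both function names are hypothetical, not SDK types, and the real stream format may differ.

```typescript
// Hypothetical shape for one transcription update (not an SDK type).
interface TranscriptUpdate {
  segmentId: string;
  speaker: "user" | "agent";
  text: string;
  final: boolean;
}

// Fold an update into the caption list without mutating the input:
// replace the matching segment if it exists, otherwise append.
function applyUpdate(
  captions: TranscriptUpdate[],
  update: TranscriptUpdate,
): TranscriptUpdate[] {
  const index = captions.findIndex((c) => c.segmentId === update.segmentId);
  if (index === -1) {
    return [...captions, update];
  }
  const next = captions.slice();
  next[index] = update;
  return next;
}

// Render captions as one "speaker: text" line per segment.
function renderCaptions(captions: TranscriptUpdate[]): string {
  return captions.map((c) => `${c.speaker}: ${c.text}`).join("\n");
}
```

Returning a new array on each update keeps the function easy to use with UI frameworks that compare state by reference.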
Data sharing
Share images, files, or any other kind of data between your frontend and your agent using byte streams or data packets.
- Byte streams: Send and receive images, files, or any other data.
- Data packets: Low-level API for sending and receiving any kind of data.
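The sketch below illustrates the general idea behind streaming a file or image: the payload is split into ordered chunks on one side and reassembled on the other. It assumes the receiver sees chunks in order and is not a description of LiveKit's wire format; `toChunks` and `fromChunks` are hypothetical helpers, and the SDK's byte stream API handles this for you.

```typescript
// Split a payload into chunks of at most chunkSize bytes.
function toChunks(data: Uint8Array, chunkSize: number): Uint8Array[] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    chunks.push(data.slice(offset, offset + chunkSize));
  }
  return chunks;
}

// Reassemble ordered chunks into a single contiguous buffer.
function fromChunks(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}
```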
Agent state
As media and data flow between your frontend and agent, the agent moves through a lifecycle of states — connecting, listening, thinking, speaking, and eventually disconnecting. Your frontend can read these states to drive the UI, for example showing a visual indicator when the agent is thinking or enabling the microphone when the agent is ready to listen. For most UI decisions, use state getters like canListen and isFinished rather than raw state values.
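A minimal sketch of why derived getters are convenient for UI code follows. The state names are taken from the paragraph above, with `disconnected` standing in for the end of the lifecycle. The exact state set and the precise meaning of `canListen` and `isFinished` in the SDK may differ, so the mapping in `deriveFlags` is an assumption made for illustration.

```typescript
// Assumed lifecycle states, based on the description above.
type AgentState =
  | "connecting"
  | "listening"
  | "thinking"
  | "speaking"
  | "disconnected";

interface AgentFlags {
  canListen: boolean;
  isFinished: boolean;
}

// Assumption: the agent can receive user audio once it is past
// "connecting", and the session is finished once it has disconnected.
function deriveFlags(state: AgentState): AgentFlags {
  return {
    canListen:
      state === "listening" || state === "thinking" || state === "speaking",
    isFinished: state === "disconnected",
  };
}

// UI decisions read the flags instead of enumerating raw states.
function shouldEnableMic(state: AgentState): boolean {
  const flags = deriveFlags(state);
  return flags.canListen && !flags.isFinished;
}

function showThinkingIndicator(state: AgentState): boolean {
  return state === "thinking";
}
```

If a new state is added later, only the flag derivation needs to change, and the UI helpers that depend on the flags stay the same.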
State and control
Beyond built-in agent state, your agent and your frontend can share custom state and call methods on each other. Use state synchronization for key-value data that stays in sync across participants, and RPC for request-response interactions like triggering an agent action or fetching data on demand.
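To illustrate the key-value model behind state synchronization, the sketch below merges an incoming update into a local copy of shared state and reports which keys changed, so the UI can re-render only what is affected. `SharedState` and `applyPatch` are hypothetical names, and string-only values are an assumption made for simplicity; the SDK's own synchronization mechanism is not shown here.

```typescript
// Hypothetical shared state: flat string keys and string values.
type SharedState = Record<string, string>;

interface PatchResult {
  next: SharedState;
  changed: string[];
}

// Merge a patch into the current state without mutating it, and
// list the keys whose values actually changed.
function applyPatch(current: SharedState, patch: SharedState): PatchResult {
  const next: SharedState = { ...current };
  const changed: string[] = [];
  for (const key of Object.keys(patch)) {
    if (next[key] !== patch[key]) {
      next[key] = patch[key];
      changed.push(key);
    }
  }
  return { next, changed };
}
```

For example, applying `{ step: "2" }` to `{ theme: "dark", step: "1" }` yields a new state with `step` updated and reports `step` as the only changed key.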