Overview
LiveKit Agents supports images and video as both input and output modalities. On the input side, you can add images to your agent's chat context, receive images from the frontend, sample video frames, or enable live video input with a supported realtime model. On the output side, you can send images to the frontend using byte streams or add a virtual avatar for lifelike video output.
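As an illustration of the first input path, here is a minimal sketch of preparing an image for a chat context. It assumes the image arrives as raw bytes (for example, from a byte stream) and that the consumer accepts a base64 data URL. The helper name `image_bytes_to_data_url` is hypothetical and not part of LiveKit Agents; it uses only the Python standard library.

```python
import base64


def image_bytes_to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder payload: the 8-byte PNG signature stands in for a real image file.
png_signature = b"\x89PNG\r\n\x1a\n"
data_url = image_bytes_to_data_url(png_signature)
print(data_url)
```

The resulting string can then be passed wherever an image URL is expected. See the Images page for the image content type the framework actually uses.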
In this section
This page provides an overview of image and video capabilities. The following pages in this section cover each topic in detail:
| Topic | Description |
|---|---|
| Images | Add images to your agent's context, receive images from the frontend, and send images back to users. |
| Video | Sample video frames, enable live video input, and add virtual avatars for video output. |
Additional resources
Voice AI quickstart
Use the quickstart as a starting point for adding vision code.
Byte streams
Send and receive images and files with byte streams.
Virtual avatar models
Detailed setup guides for each avatar provider.
Frontend avatars
Build frontends that render avatar video.
Gemini Vision Assistant
Camera and microphone
Publish camera and microphone tracks from your frontend.