Vision

Enhance your agent with visual understanding from images and live video.

Overview

If your LLM has vision support, you can add images to its chat context to leverage its full capabilities. LiveKit includes tools for adding images loaded from disk, fetched from the network, or uploaded directly from your frontend app. Additionally, you can use live video, either by sampling frames in an STT-LLM-TTS pipeline or with true video input to a realtime model such as Gemini Live.

This guide includes an overview of the vision features and code samples for each use case.

Images

The agent's chat context supports images as well as text. You can add as many images as you want to the chat context, but keep in mind that larger context windows lead to slower response times.

To add an image to the chat context, create an ImageContent object and include it in a chat message. The image content can be a base64 data URL, an external URL, or a frame from a video track.

Load into initial context

The following example shows an agent initialized with an image at startup. It uses an external URL, but you can instead load a local file as a base64 data URL (a sketch follows the provider note below):

from livekit.agents import Agent, AgentSession, ChatContext, JobContext
from livekit.agents.llm import ImageContent

async def entrypoint(ctx: JobContext):
    # ctx.connect, etc.

    session = AgentSession(
        # ... stt, tts, llm, etc.
    )

    initial_ctx = ChatContext()
    initial_ctx.add_message(
        role="user",
        content=[
            "Here is a picture of me",
            ImageContent(image="https://example.com/image.jpg"),
        ],
    )

    await session.start(
        room=ctx.room,
        agent=Agent(chat_ctx=initial_ctx),
        # ... room_input_options, etc.
    )
LLM provider support for external URLs

Not every LLM provider supports external image URLs. Consult your provider's documentation for details.
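To load a local file instead, read the bytes yourself and pass them as a base64 data URL. A minimal sketch, assuming a hypothetical photo.jpg next to your agent code:

import base64
from pathlib import Path

from livekit.agents.llm import ChatContext, ImageContent

encoded = base64.b64encode(Path("photo.jpg").read_bytes()).decode("utf-8")

initial_ctx = ChatContext()
initial_ctx.add_message(
    role="user",
    content=[
        "Here is a picture of me",
        # Adjust the MIME type to match your file format
        ImageContent(image=f"data:image/jpeg;base64,{encoded}"),
    ],
)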

Upload from frontend

To upload an image from your frontend app, use the sendFile method of the LiveKit SDK. Add a byte stream handler to your agent to receive the image data and add it to the chat context. Here is a simple agent capable of receiving images from the user on the byte stream topic "images":

import asyncio
import base64

from livekit.agents import Agent, get_job_context
from livekit.agents.llm import ImageContent

class Assistant(Agent):
    def __init__(self) -> None:
        self._tasks = []  # Prevent garbage collection of running tasks
        super().__init__(instructions="You are a helpful voice AI assistant.")

    async def on_enter(self):
        def _image_received_handler(reader, participant_identity):
            task = asyncio.create_task(
                self._image_received(reader, participant_identity)
            )
            self._tasks.append(task)
            task.add_done_callback(lambda t: self._tasks.remove(t))

        # Register the handler when the agent joins
        get_job_context().room.register_byte_stream_handler("images", _image_received_handler)

    async def _image_received(self, reader, participant_identity):
        image_bytes = bytes()
        async for chunk in reader:
            image_bytes += chunk

        chat_ctx = self.chat_ctx.copy()
        # Encode the image to base64 and add it to the chat context
        chat_ctx.add_message(
            role="user",
            content=[
                ImageContent(
                    image=f"data:image/png;base64,{base64.b64encode(image_bytes).decode('utf-8')}"
                )
            ],
        )
        await self.update_chat_ctx(chat_ctx)
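On the client side, sendFile handles the upload (for example, in the LiveKit JavaScript SDK). As a rough Python sketch of the same upload path, assuming the Python SDK exposes a send_file byte-stream helper on the local participant and that your client already has a connected room:

from livekit import rtc

async def send_image(room: rtc.Room) -> None:
    # Assumption: send_file streams a local file over the byte stream API;
    # verify the helper and its signature against your SDK version.
    await room.local_participant.send_file(
        file_path="photo.png",  # hypothetical local file
        topic="images",         # must match the agent's registered topic
    )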

Sample video frames

LLMs can process video as a series of still images, but many are not trained for this use case and may struggle to interpret motion and other changes across a video feed. Realtime models such as Gemini Live are trained on video, and you can enable live video input for automatic support.

If you are using an STT-LLM-TTS pipeline, you can still work with video by sampling the video track at suitable times. For instance, in the following example the agent includes the latest video frame with each user conversation turn. This gives the model additional context without overwhelming it with data or expecting it to interpret many sequential frames at once:

import asyncio

from livekit import rtc
from livekit.agents import Agent, get_job_context
from livekit.agents.llm import ChatContext, ChatMessage, ImageContent

class Assistant(Agent):
    def __init__(self) -> None:
        self._latest_frame = None
        self._video_stream = None
        self._tasks = []
        super().__init__(instructions="You are a helpful voice AI assistant.")

    async def on_enter(self):
        room = get_job_context().room

        # Find the first video track (if any) from the remote participant
        remote_participant = list(room.remote_participants.values())[0]
        video_tracks = [
            publication.track
            for publication in list(remote_participant.track_publications.values())
            if publication.track is not None
            and publication.track.kind == rtc.TrackKind.KIND_VIDEO
        ]
        if video_tracks:
            self._create_video_stream(video_tracks[0])

        # Watch for new video tracks not yet published
        @room.on("track_subscribed")
        def on_track_subscribed(
            track: rtc.Track,
            publication: rtc.RemoteTrackPublication,
            participant: rtc.RemoteParticipant,
        ):
            if track.kind == rtc.TrackKind.KIND_VIDEO:
                self._create_video_stream(track)

    async def on_user_turn_completed(self, turn_ctx: ChatContext, new_message: ChatMessage) -> None:
        # Add the latest video frame, if any, to the new message
        if self._latest_frame:
            new_message.content.append(ImageContent(image=self._latest_frame))
            self._latest_frame = None

    # Helper method to buffer the latest video frame from the user's track
    def _create_video_stream(self, track: rtc.Track):
        # Close any existing stream (we only want one at a time)
        if self._video_stream is not None:
            self._video_stream.close()

        # Create a new stream to receive frames
        self._video_stream = rtc.VideoStream(track)

        async def read_stream():
            async for event in self._video_stream:
                # Store the latest frame for use later
                self._latest_frame = event.frame

        # Store the async task to prevent garbage collection
        task = asyncio.create_task(read_stream())
        task.add_done_callback(lambda t: self._tasks.remove(t))
        self._tasks.append(task)

Video frame encoding

By default, ImageContent encodes video frames as JPEG at their original size. To adjust the size of the encoded frames, set the inference_width and inference_height parameters; each frame is resized to fit within the provided dimensions while maintaining its original aspect ratio. For more control, use the encode function from the livekit.agents.utils.images module and pass the result as a data URL:

import base64

from livekit.agents.llm import ImageContent
from livekit.agents.utils.images import encode, EncodeOptions, ResizeOptions

image_bytes = encode(
    event.frame,
    EncodeOptions(
        format="PNG",
        resize_options=ResizeOptions(
            width=512,
            height=512,
            strategy="scale_aspect_fit",
        ),
    ),
)
image_content = ImageContent(
    image=f"data:image/png;base64,{base64.b64encode(image_bytes).decode('utf-8')}"
)
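If you only need resizing, you can skip manual encoding and let ImageContent resize the frame for you. A minimal sketch using the inference_width and inference_height parameters described above, reusing a video frame (event.frame) from the earlier examples:

from livekit.agents.llm import ImageContent

# Resize the frame to fit within 512x512 while preserving aspect ratio
image_content = ImageContent(
    image=event.frame,
    inference_width=512,
    inference_height=512,
)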

Inference detail

If your LLM provider supports it, you can set the inference_detail parameter to "high" or "low" to trade off inference quality against token usage. The default is "auto", which uses the provider's default.
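For example (a minimal sketch, reusing a video frame from the examples above):

from livekit.agents.llm import ImageContent

image_content = ImageContent(
    image=event.frame,
    inference_detail="high",  # or "low"; defaults to "auto"
)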

Live video

Limited support

Live video input requires a realtime model with video support. Currently, only Gemini Live has this feature.

Set the video_enabled parameter to True in RoomInputOptions to enable live video input. Your agent automatically receives frames from the user's camera or screen sharing tracks, if available. Only the single most recently published video track is used.

By default, the agent samples one frame per second while the user speaks, and one frame every three seconds otherwise. Each frame is resized to fit within 1024x1024 and encoded as JPEG. To override the sampling rate, set the video_sampler parameter on AgentSession to a custom instance.
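For example, a rough sketch of overriding the sampling rates (this assumes a VoiceActivityVideoSampler class in livekit.agents.voice.room_io with speaking_fps and silent_fps parameters; check your SDK version for the exact name and location):

from livekit.agents import AgentSession
# Assumption: the default voice-activity-based sampler is exposed here and
# accepts speaking_fps / silent_fps; verify against your SDK version.
from livekit.agents.voice.room_io import VoiceActivityVideoSampler

session = AgentSession(
    # Sample faster while the user is speaking, slower otherwise
    video_sampler=VoiceActivityVideoSampler(speaking_fps=2.0, silent_fps=0.5),
    # ... stt, tts, llm, etc.
)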

Video input is passive and has no effect on turn detection. To leverage live video input in a non-conversational context, use manual turn control and trigger LLM responses or tool calls on a timer or other schedule.
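For example, a minimal sketch of a periodic, non-conversational loop (this assumes turn_detection="manual" is available on AgentSession in your SDK version, and reuses the VideoAssistant agent from the example below):

import asyncio

from livekit.agents import AgentSession, JobContext, RoomInputOptions

async def entrypoint(ctx: JobContext):
    await ctx.connect()

    # Assumption: turn_detection="manual" disables automatic turn handling
    session = AgentSession(turn_detection="manual")
    await session.start(
        agent=VideoAssistant(),  # defined in the example below
        room=ctx.room,
        room_input_options=RoomInputOptions(video_enabled=True),
    )

    # Trigger a response on a fixed schedule instead of on user turns
    while True:
        await asyncio.sleep(30)
        session.generate_reply(
            instructions="Briefly describe what you currently see."
        )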

The following example shows how to add Gemini Live vision to your voice AI quickstart agent:

from livekit.agents import Agent, AgentSession, JobContext, RoomInputOptions
from livekit.plugins import google

class VideoAssistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="You are a helpful voice assistant with live video input from your user.",
            llm=google.beta.realtime.RealtimeModel(
                voice="Puck",
                temperature=0.8,
            ),
        )

async def entrypoint(ctx: JobContext):
    await ctx.connect()

    session = AgentSession()
    await session.start(
        agent=VideoAssistant(),
        room=ctx.room,
        room_input_options=RoomInputOptions(
            video_enabled=True,
            # ... noise_cancellation, etc.
        ),
    )

Additional resources

The following documentation and examples can help you get started with vision in LiveKit Agents.