Building agent frontends

Detailed guides to building great frontends for voice and video AI.

Overview

This section covers the core concepts for building a production-ready agent frontend. Your frontend starts a session, authenticates, and then communicates with the agent through realtime media and data while tracking agent state to drive the UI.

In this section

| Topic | What it covers | Role in the flow |
| --- | --- | --- |
| Session management | Creating, starting, and ending agent sessions with the Session API. | Entry point. Orchestrates token fetching, room connection, and agent dispatch. |
| Authentication | Generating and managing JWT tokens via TokenSource types (sandbox, endpoint, custom, literal). | Provides the credentials that sessions use to connect to a room and dispatch an agent. |
| Agent state | Reading agent lifecycle states (connecting, listening, thinking, speaking) and state getters. | Drives UI updates so your frontend reflects what the agent is doing at any moment. |
| Realtime media and data | Audio, video, text streams, byte streams, state synchronization, and RPC. | The communication layer between your frontend and the agent during a session. |
| Virtual avatars | Rendering a visual avatar driven by agent audio output. | Optional. Adds a visual presence to voice agents using a standard video track. |
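
To make the "agent state drives the UI" idea concrete, here is a minimal sketch in plain TypeScript. It covers only the four lifecycle states named in this section (connecting, listening, thinking, speaking). The `AgentState` type, the `STATUS_LABELS` table, and the `statusLabel` function are hypothetical names chosen for illustration, not SDK APIs; how you actually read the current state depends on the SDK and is covered in the Agent state guide.

```typescript
// Hypothetical type: the four lifecycle states named in this section.
// A real SDK may expose more states or different names.
type AgentState = "connecting" | "listening" | "thinking" | "speaking";

// Hypothetical lookup table mapping each state to a status line for the UI.
// Record<AgentState, string> makes the compiler flag any state left unmapped.
const STATUS_LABELS: Record<AgentState, string> = {
  connecting: "Connecting to agent",
  listening: "Agent is listening",
  thinking: "Agent is thinking",
  speaking: "Agent is speaking",
};

// Returns the text a status indicator could show for the given state.
function statusLabel(state: AgentState): string {
  return STATUS_LABELS[state];
}
```

In a real frontend you would call something like `statusLabel` whenever the state changes and render the result. Keeping the mapping in one table means the UI wording lives in a single place.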