Overview
This section covers the core concepts for building a production-ready agent frontend. Your frontend starts a session, authenticates, and then communicates with the agent through realtime media and data while tracking agent state to drive the UI.
In this section
| Topic | What it covers | Role in the flow |
|---|---|---|
| Session management | Creating, starting, and ending agent sessions with the Session API. | Entry point. Orchestrates token fetching, room connection, and agent dispatch. |
| Authentication | Generating and managing JWTs via TokenSource types (sandbox, endpoint, custom, literal). | Provides the credentials that sessions use to connect to a room and dispatch an agent. |
| Agent state | Reading agent lifecycle states (connecting, listening, thinking, speaking) and using state getters. | Drives UI updates so your frontend reflects what the agent is doing at any moment. |
| Realtime media and data | Audio, video, text streams, byte streams, state synchronization, and RPC. | The communication layer between your frontend and the agent during a session. |
| Virtual avatars | Rendering a visual avatar driven by agent audio output. | Optional. Adds a visual presence to voice agents using a standard video track. |
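
To make the "agent state drives the UI" idea concrete, here is a minimal sketch of mapping the four lifecycle states named above to UI indicators. The `AgentState` type, the `AgentIndicator` shape, and the `indicatorFor` function are hypothetical names chosen for illustration; they are not the SDK's API, and a real frontend would read the state from the session rather than pass it in by hand.

```typescript
// Hypothetical types for illustration only; not part of any SDK.
// The four states mirror the lifecycle states listed in the table above.
type AgentState = "connecting" | "listening" | "thinking" | "speaking";

interface AgentIndicator {
  label: string;   // text shown next to the agent in the UI
  isBusy: boolean; // true when the UI should show a spinner
}

// Map an agent state to what the UI should display.
// The switch is exhaustive, so adding a new state to AgentState
// produces a compile-time error here until it is handled.
function indicatorFor(state: AgentState): AgentIndicator {
  switch (state) {
    case "connecting":
      return { label: "Connecting...", isBusy: true };
    case "listening":
      return { label: "Listening", isBusy: false };
    case "thinking":
      return { label: "Thinking...", isBusy: true };
    case "speaking":
      return { label: "Speaking", isBusy: false };
  }
}
```

Keeping this mapping in one pure function means UI components only need to re-render when the state value changes, and the mapping itself can be tested without a live session.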