OpenAI Realtime API Quickstart

Build an AI-powered voice assistant that engages in realtime conversations using the OpenAI Realtime API and LiveKit.

The MultimodalAgent class in the LiveKit Agents framework uses the OpenAI Realtime API for speech-to-speech interactions between AI voice assistants and end users.
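As a rough sketch of how this fits together, the `agent.py` generated by the template looks something like the following. This is a minimal illustration assuming the `livekit-agents` Python SDK with its OpenAI plugin installed; the instructions text and `voice` value are placeholder assumptions, not part of this quickstart:

```python
from livekit import agents
from livekit.agents import multimodal
from livekit.plugins import openai


async def entrypoint(ctx: agents.JobContext):
    # Connect to the LiveKit room this job was dispatched for
    await ctx.connect()

    # Wait for an end user to join before starting the session
    participant = await ctx.wait_for_participant()

    # MultimodalAgent wraps the OpenAI Realtime API for
    # speech-to-speech conversations
    agent = multimodal.MultimodalAgent(
        model=openai.realtime.RealtimeModel(
            instructions="You are a helpful voice assistant.",  # placeholder
            voice="alloy",  # assumed voice name
        )
    )
    agent.start(ctx.room, participant)


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```

Running this file with the `dev` subcommand (as shown in the steps below) registers the worker with your LiveKit project and dispatches it into rooms as users connect.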

If you're not using the OpenAI Realtime API, see the Voice agent with STT, LLM, TTS quickstart.

Prerequisites

  • A LiveKit Cloud account.
  • An OpenAI API key.
  • Python 3 (or Node.js, if you prefer the Node template).

Steps

The following steps walk you through creating a LiveKit account and using a LiveKit Sandbox to create an agent built on the MultimodalAgent class. By the end of this quickstart, you'll have an agent and a frontend client you can use to talk to it.

Create a LiveKit account

Create an account or sign in to your LiveKit Cloud account.

Create a LiveKit Sandbox

A sandbox allows you to quickly create and deploy an agent locally, and test the agent using a frontend web client. To create a sandbox, follow these steps:

  1. Sign in to LiveKit Sandbox.

  2. Select Create app for the Voice assistant template.

  3. Follow the Finish setting up your sandbox app instructions provided after you create your sandbox. The instructions include installing the LiveKit CLI and sandbox app.

    After you run the lk app create command, enter the following at the appropriate prompts:

    • For the Select Template prompt, select multimodal-agent-python or multimodal-agent-node.
    • Enter your OpenAI API key at the prompt.
  4. Install dependencies and start your agent:

    cd <your_sandbox_id>
    python3 -m venv venv                         # create a virtual environment
    source venv/bin/activate                     # activate it
    python3 -m pip install -r requirements.txt   # install agent dependencies
    python3 agent.py dev                         # start the agent in development mode
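The commands above assume the Python template. The agent reads its LiveKit and OpenAI credentials from environment variables, which `lk app create` typically writes to a `.env.local` file in the project directory. The exact contents depend on your project, but the file looks roughly like this (all values below are placeholders):

```shell
# .env.local — created by lk app create; values shown are placeholders
LIVEKIT_URL=wss://<your-project>.livekit.cloud
LIVEKIT_API_KEY=<your-livekit-api-key>
LIVEKIT_API_SECRET=<your-livekit-api-secret>
OPENAI_API_KEY=<your-openai-api-key>
```

If the agent fails to start with an authentication error, check that this file exists and that its values match your LiveKit Cloud project and OpenAI account.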

Launch your sandbox and talk to your agent

  1. Sign in to LiveKit Sandbox.
  2. In the Your Sandbox apps section, select Launch for your <your_sandbox_id> sandbox.
  3. Select Connect and start a conversation with your agent.

Next steps