OpenAI Realtime API Quickstart

Build an AI-powered voice assistant that engages in realtime conversations using the OpenAI Realtime API and LiveKit.

The MultimodalAgent class in the LiveKit Agents Framework uses the OpenAI Realtime API for speech-to-speech interactions between AI voice assistants and end users. It is implemented in both our Python and Node.js Agents Framework libraries.
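
At a high level, the agent is a worker process that joins a LiveKit room and wraps an OpenAI Realtime model in a MultimodalAgent. The sketch below is a condensed, illustrative version of the Python pattern; exact module paths and parameters may differ between versions of the livekit-agents package:

    from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli
    from livekit.agents.multimodal import MultimodalAgent
    from livekit.plugins import openai


    async def entrypoint(ctx: JobContext):
        # Join the LiveKit room and subscribe to the user's audio.
        await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

        # The Realtime API handles speech in and speech out directly,
        # so no separate STT or TTS stages are needed.
        model = openai.realtime.RealtimeModel()
        agent = MultimodalAgent(model=model)
        agent.start(ctx.room)


    if __name__ == "__main__":
        cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))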

note

If you're not using the OpenAI Realtime API, see the Voice agent with STT, LLM, TTS quickstart.

Prerequisites

To complete this quickstart, you need an OpenAI API key.

Steps

The following steps take you through creating a LiveKit account and using the LiveKit CLI to bootstrap an agent and a frontend from minimal templates. At the end of the quickstart, you'll have an agent and a frontend you can use to talk to it.

Set up a LiveKit account and install the CLI

  1. Create an account or sign in to your LiveKit Cloud account.

  2. Install the LiveKit CLI and authenticate using lk cloud auth.

Bootstrap an agent from template

  1. Clone a starter template for your preferred language using the CLI:

    lk app create --template multimodal-agent-python

  2. Enter your OpenAI API Key when prompted.

  3. Install dependencies and start your agent:

    cd <agent_dir>
    python3 -m venv venv
    source venv/bin/activate
    python3 -m pip install -r requirements.txt
    python3 agent.py dev

    You can edit the agent.py file to customize the system prompt and other aspects of your agent.
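
For example, the system prompt and voice are typically set where the Realtime model is constructed in agent.py. The snippet below is illustrative only; parameter names and defaults depend on your version of the livekit-plugins-openai package:

    from livekit.plugins import openai

    model = openai.realtime.RealtimeModel(
        # The instructions string acts as the system prompt for the voice agent.
        instructions="You are a friendly assistant. Keep your answers brief.",
        voice="alloy",                 # one of the OpenAI Realtime voices
        temperature=0.8,               # sampling temperature for responses
        modalities=["text", "audio"],  # return both transcripts and audio
    )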

Bootstrap a frontend from template

  1. Clone the Voice Assistant Frontend Next.js app starter template using the CLI:

    lk app create --template voice-assistant-frontend

  2. Install dependencies and start your frontend application:

    cd <frontend_dir>
    pnpm install
    pnpm dev

Launch your app and talk to your agent

  1. Visit your locally running application (by default, http://localhost:3000).
  2. Select Connect and start a conversation with your agent.

Next steps