The MultimodalAgent class in the LiveKit Agents framework uses the OpenAI Realtime API for speech-to-speech interactions between AI voice assistants and end users.
If you're not using the OpenAI Realtime API, see the Voice agent with STT, LLM, TTS quickstart.
Prerequisites
You need an OpenAI API key, which you enter when you create the sandbox app.
Steps
The following steps take you through the process of creating a LiveKit account and using the LiveKit Sandbox to create an agent using the MultimodalAgent class. At the end of the quickstart, you'll have an agent and a frontend client you can use to talk to your agent.
Create a LiveKit account
Create an account or sign in to your LiveKit Cloud account.
Create a LiveKit Sandbox
A sandbox allows you to quickly create and deploy an agent locally, and test the agent using a frontend web client. To create a sandbox, follow these steps:
Sign in to LiveKit Sandbox.
Select Create app for the Voice assistant template.
Follow the Finish setting up your sandbox app instructions provided after you create your sandbox. The instructions include installing the LiveKit CLI and sandbox app.
After you run the lk app create command, enter the following at the appropriate prompts:
- For the Select Template prompt, select multimodal-agent-python or multimodal-agent-node.
- Enter your OpenAI API key at the prompt.
Install dependencies and start your agent:
cd <your_sandbox_id>
python3 -m venv venv
source venv/bin/activate
python3 -m pip install -r requirements.txt
python3 agent.py dev
Launch your sandbox and talk to your agent.
- Sign in to LiveKit Sandbox
- In the Your Sandbox apps section, select Launch for the <your_sandbox_id> sandbox.
- Select Connect and start a conversation with your agent.
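If the agent exits at startup or never joins the room, a missing credential is one thing worth ruling out. The sketch below is a minimal pre-flight check you could run before starting the agent. It assumes the agent reads its LiveKit credentials from LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET, and the OpenAI key from OPENAI_API_KEY. These names are assumptions that this quickstart does not state, so compare them with the environment file generated for your sandbox. The missing_vars helper is hypothetical and not part of LiveKit.

```python
import os

# Assumed variable names; verify them against your sandbox's environment file.
REQUIRED_VARS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "OPENAI_API_KEY",
)


def missing_vars(env=None, required=REQUIRED_VARS):
    """Return the names in `required` that are unset or blank in `env`.

    `env` defaults to the current process environment.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All expected environment variables are set.")
```

Because the check reads the process environment, values that exist only in a dotenv file are reported as missing unless you export them in your shell first.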
Next steps
Learn more in the OpenAI Realtime API integration guide.
Customize your frontend client by using the frontend sandbox as a starting point. To clone the frontend sandbox locally, run the following command:
lk app create --template voice-assistant-frontend
Create an agent that accepts incoming calls using SIP.
Create an agent that makes outbound calls using SIP.