The `MultimodalAgent` class in the LiveKit Agents Framework uses the OpenAI Realtime API for speech-to-speech interactions between AI voice assistants and end users. It is implemented in both our Python and Node.js Agents Framework libraries.
If you're not using the OpenAI Realtime API, see the Voice agent with STT, LLM, TTS quickstart.
Prerequisites
To complete this quickstart, you'll need:
- A LiveKit Cloud account
- An OpenAI API key
Steps
The following steps walk you through creating a LiveKit account and using the LiveKit CLI to bootstrap an agent and a frontend from minimal templates. By the end of the quickstart, you'll have a running agent and a frontend you can use to talk to it.
Set up a LiveKit account and install the CLI
Create an account or sign in to your LiveKit Cloud account.
Install the LiveKit CLI and authenticate using `lk cloud auth`.
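If you don't already have the CLI, it's typically installed with Homebrew on macOS or the install script on Linux (check the LiveKit CLI repository for current instructions for your platform):

```shell
# macOS (Homebrew)
brew install livekit-cli

# Linux (install script)
curl -sSL https://get.livekit.io/cli | bash

# link the CLI to your LiveKit Cloud project
lk cloud auth
```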
Bootstrap an agent from template
Clone a starter template for your preferred language using the CLI:
```shell
lk app create --template multimodal-agent-python
```

Enter your OpenAI API key when prompted.
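The CLI stores your credentials in an environment file in the new project directory (commonly `.env.local`; the exact filename depends on the template). Its contents look roughly like this:

```shell
# Illustrative contents of the generated env file; actual values come from your
# LiveKit Cloud project and the OpenAI key you entered above.
LIVEKIT_URL=wss://<your-project>.livekit.cloud
LIVEKIT_API_KEY=<your-livekit-api-key>
LIVEKIT_API_SECRET=<your-livekit-api-secret>
OPENAI_API_KEY=<your-openai-api-key>
```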
Install dependencies and start your agent:
```shell
cd <agent_dir>
python3 -m venv venv
source venv/bin/activate
python3 -m pip install -r requirements.txt
python3 agent.py dev
```

You can edit the `agent.py` file to customize the system prompt and other aspects of your agent.
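For reference, a minimal agent built on the Python library looks roughly like the sketch below; the `agent.py` generated by the template may differ in structure and defaults.

```python
# A trimmed sketch of a MultimodalAgent worker (illustrative; the template's
# generated agent.py is the source of truth).
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import openai


async def entrypoint(ctx: JobContext):
    # Connect to the room and wait for a user to join.
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    participant = await ctx.wait_for_participant()

    # The Realtime API handles the full speech-to-speech loop in one model,
    # so there is no separate STT/LLM/TTS pipeline to configure.
    model = openai.realtime.RealtimeModel(
        instructions="You are a friendly voice assistant. Keep answers brief.",
        voice="alloy",
        temperature=0.8,
    )

    agent = MultimodalAgent(model=model)
    agent.start(ctx.room, participant)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```

Changing the `instructions` string is the quickest way to adjust your agent's system prompt.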
Bootstrap a frontend from template
Clone the Voice Assistant Frontend Next.js app starter template using the CLI:
```shell
lk app create --template voice-assistant-frontend
```

Install dependencies and start your frontend application:

```shell
cd <frontend_dir>
pnpm install
pnpm dev
```
Launch your app and talk to your agent
- Visit your locally running application (http://localhost:3000 by default).
- Select Connect and start a conversation with your agent.
Next steps
- Learn more in the OpenAI Realtime API integration guide.
- Let your friends and colleagues talk to your agent by connecting it to a LiveKit Sandbox.
- Create an agent that accepts incoming calls using SIP.
- Create an agent that makes outbound calls using SIP.