
Cerebras ecosystem support
Cerebras provides high-throughput, low-latency AI inference for open models like Llama and DeepSeek. LiveKit Agents has full support for Cerebras inference via the OpenAI plugin, as Cerebras is an OpenAI-compatible LLM provider.
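"OpenAI-compatible" means any OpenAI-style client can talk to the Cerebras endpoint. As a stdlib-only sketch of what that looks like on the wire (the base URL is taken from the Cerebras docs; verify it before use), this builds, but does not send, a chat completion request:

```python
import json
import os
import urllib.request

# Cerebras exposes an OpenAI-compatible chat completions endpoint.
# Base URL per the Cerebras docs; confirm against their current documentation.
BASE_URL = "https://api.cerebras.ai/v1"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ.get('CEREBRAS_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("llama-3.3-70b", "Hello!")
```

In practice the OpenAI plugin handles all of this for you; the sketch just shows why no Cerebras-specific plugin is needed.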
Getting started
Use the Voice AI quickstart to build a voice AI app with Cerebras. Select an STT-LLM-TTS pipeline model type and add the following components to build on Cerebras.
Voice AI quickstart
Build your first voice AI app with Cerebras.
Install the OpenAI plugin:
```shell
pip install "livekit-agents[openai]~=1.0rc"
```
Add your Cerebras API key to your `.env` file:
```shell
CEREBRAS_API_KEY=<your-cerebras-api-key>
```
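The plugin reads `CEREBRAS_API_KEY` from the process environment. A `.env` file is just `KEY=VALUE` lines loaded into that environment; real projects typically use a library such as python-dotenv, but a minimal stdlib sketch of the idea looks like this:

```python
import os

def load_env(path: str = ".env") -> None:
    """Minimal .env loader: KEY=VALUE lines; blank lines and '#' comments skipped.

    Illustrative only -- a library like python-dotenv handles quoting,
    multiline values, and other edge cases this sketch ignores.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # don't override variables already set in the environment
            os.environ.setdefault(key.strip(), value.strip())
```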
Use the Cerebras LLM to initialize your `AgentSession`:

```python
from livekit.plugins import openai

# ...

# in your entrypoint function
session = AgentSession(
    llm=openai.LLM.with_cerebras(
        model="llama-3.3-70b",
    ),
)
```
For a full list of supported models, including DeepSeek, see the Cerebras docs.
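Since the endpoint is OpenAI-compatible, the available models can also be enumerated programmatically via the standard `GET /models` route. A stdlib-only sketch (building the request without sending it; the base URL is an assumption to check against the Cerebras docs):

```python
import os
import urllib.request

def build_models_request() -> urllib.request.Request:
    """Build (but do not send) a GET /v1/models request, the conventional
    OpenAI-compatible way to list the models a provider serves."""
    return urllib.request.Request(
        "https://api.cerebras.ai/v1/models",
        headers={"Authorization": f"Bearer {os.environ.get('CEREBRAS_API_KEY', '')}"},
    )

req = build_models_request()
```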
LiveKit Agents overview
LiveKit Agents is an open-source framework for building realtime AI apps using WebRTC transport to end-user devices and WebSockets or HTTPS for backend services.
- Agent workflows: Build complex voice AI apps with discrete stages and handoffs.
- Telephony: Inbound and outbound calling using SIP trunks.
- Frontend SDKs: Full-featured SDKs and UI components for JavaScript, Swift, Kotlin, Flutter, React Native, and Unity.
- Python and Node.js: Build voice AI apps in Python or Node.js.
- Dispatch and load balancing: Built-in support for request distribution and load balancing.
- LiveKit Cloud: Fully-managed LiveKit server with global scale and low latency (you can also self-host).
WebRTC provides significant advantages over alternatives such as WebSockets for building realtime applications.
- Optimized for media: Purpose-built for audio and video with advanced codecs and compression algorithms.
- Network resilient: Performs reliably even in challenging network conditions due to UDP, adaptive bitrate, and more.
- Broad compatibility: Natively supported in all modern browsers.
LiveKit handles all of the complexity of running production-grade WebRTC infrastructure while extending support to mobile apps, backends, and telephony.
Further reading
More information about integrating Llama is available in the following article:
Llama Models
LiveKit docs on Llama models and available parameters.