Cerebras and LiveKit

Build voice AI on the world's fastest inference.

Try Cerebras AI

Experience Cerebras's fast inference in a LiveKit-powered voice AI playground.

Cerebras ecosystem support

Cerebras provides high-throughput, low-latency AI inference for open models like Llama and DeepSeek. LiveKit Agents has full support for Cerebras inference via the OpenAI plugin, as Cerebras is an OpenAI-compatible LLM provider.
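
Because the integration relies on that OpenAI compatibility, you can also reach Cerebras with the plain OpenAI Python client for a quick sanity check outside of an agent. The sketch below is an assumption-laden example: the base URL and model name should be verified against the current Cerebras documentation.

# Minimal sketch (not required when using the LiveKit plugin): point the standard
# OpenAI client at Cerebras's OpenAI-compatible endpoint. The base_url and model
# name are assumptions; confirm them in the Cerebras docs.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",  # assumed Cerebras endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],
)

completion = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(completion.choices[0].message.content)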

Getting started

Use the Voice AI quickstart to build a voice AI app with Cerebras. Select an STT-LLM-TTS pipeline model type and add the following components to build on Cerebras.

Voice AI quickstart

Build your first voice AI app with Cerebras.

Install the OpenAI plugin:

pip install "livekit-agents[openai]~=1.0rc"

Add your Cerebras API key to your .env file:

CEREBRAS_API_KEY=<your-cerebras-api-key>
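
The plugin reads this key from the process environment, so make sure the .env file is actually loaded before the worker starts. One common pattern, sketched below under the assumption that the python-dotenv package is installed, is to call load_dotenv() at startup:

# Minimal sketch: load CEREBRAS_API_KEY (and any other keys) from .env at startup.
# Assumes python-dotenv is installed (pip install python-dotenv); any other way of
# exporting the variable into the environment works just as well.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

if not os.getenv("CEREBRAS_API_KEY"):
    raise RuntimeError("CEREBRAS_API_KEY is not set")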

Use the Cerebras LLM to initialize your AgentSession:

from livekit.plugins import openai

# ...
# in your entrypoint function
session = AgentSession(
    llm=openai.LLM.with_cerebras(
        model="llama-3.3-70b",
    ),
)

For a full list of supported models, including DeepSeek, see the Cerebras docs.
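
For context, the snippet above typically sits inside a complete worker entrypoint. The sketch below shows one possible arrangement: the STT, TTS, and VAD providers (Deepgram, Cartesia, Silero) are illustrative assumptions that require their own plugin installs and API keys, and the instructions string is a placeholder.

# Sketch of a complete agent entrypoint with Cerebras as the LLM.
# The Deepgram, Cartesia, and Silero plugins are illustrative choices, not
# requirements; swap in whichever providers you prefer.
from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import cartesia, deepgram, openai, silero


async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()

    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM.with_cerebras(model="llama-3.3-70b"),
        tts=cartesia.TTS(),
        vad=silero.VAD.load(),
    )

    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a helpful voice assistant."),
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))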

LiveKit Agents overview

LiveKit Agents is an open-source framework for building realtime AI apps using WebRTC transport to end-user devices and WebSockets or HTTPS for backend services.

  • Agent workflows: Build complex voice AI apps with discrete stages and handoffs.
  • Telephony: Inbound and outbound calling using SIP trunks.
  • Frontend SDKs: Full-featured SDKs and UI components for JavaScript, Swift, Kotlin, Flutter, React Native, and Unity.
  • Python and Node.js: Build voice AI apps in Python or Node.js.
  • Dispatch and load balancing: Built-in support for request distribution and load balancing.
  • LiveKit Cloud: Fully-managed LiveKit server with global scale and low latency (you can also self-host).

What is WebRTC?

WebRTC provides significant advantages over alternatives such as WebSockets for building realtime applications.

  • Optimized for media: Purpose-built for audio and video with advanced codecs and compression algorithms.
  • Network resilient: Performs reliably even in challenging network conditions due to UDP, adaptive bitrate, and more.
  • Broad compatibility: Natively supported in all modern browsers.

LiveKit handles all of the complexity of running production-grade WebRTC infrastructure while extending support to mobile apps, backends, and telephony.

Further reading

More information about integrating Llama is available in the following article:

Llama Models

LiveKit docs on Llama models and available parameters.
