Cerebras and LiveKit

Build voice AI on the world's fastest inference.

Try Cerebras AI

Experience Cerebras's fast inference in a LiveKit-powered voice AI playground.

Cerebras ecosystem support

Cerebras provides high-throughput, low-latency AI inference for open models like Llama and DeepSeek. LiveKit Agents has full support for Cerebras inference via the OpenAI plugin, as Cerebras is an OpenAI-compatible LLM provider.
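
Because the integration relies on that OpenAI compatibility, you can also reach Cerebras with the plain OpenAI Python client for a quick sanity check outside of an agent. The sketch below is an assumption-laden example: the base URL and model name should be verified against the current Cerebras documentation.

# Minimal sketch (not required when using the LiveKit plugin): point the standard
# OpenAI client at Cerebras's OpenAI-compatible endpoint. The base_url and model
# name are assumptions; confirm them in the Cerebras docs.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",  # assumed Cerebras endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],
)

completion = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(completion.choices[0].message.content)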

Getting started

Use the Voice AI quickstart to build a voice AI app with Cerebras. Select an STT-LLM-TTS pipeline model type and add the following components to build on Cerebras.

Voice AI quickstart

Build your first voice AI app with Cerebras.

Install the OpenAI plugin:

pip install "livekit-agents[openai]~=1.0rc"

Add your Cerebras API key to your .env file:

CEREBRAS_API_KEY=<your-cerebras-api-key>
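
The plugin reads this key from the process environment, so make sure the .env file is actually loaded before the worker starts. One common pattern, sketched below under the assumption that the python-dotenv package is installed, is to call load_dotenv() at startup:

# Minimal sketch: load CEREBRAS_API_KEY (and any other keys) from .env at startup.
# Assumes python-dotenv is installed (pip install python-dotenv); any other way of
# exporting the variable into the environment works just as well.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

if not os.getenv("CEREBRAS_API_KEY"):
    raise RuntimeError("CEREBRAS_API_KEY is not set")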

Use the Cerebras LLM to initialize your AgentSession:

from livekit.plugins import openai

# ...
# in your entrypoint function
session = AgentSession(
    llm=openai.LLM.with_cerebras(
        model="llama-3.3-70b",
    ),
)

For a full list of supported models, including DeepSeek, see the Cerebras docs.
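
For context, the snippet above typically sits inside a complete worker entrypoint. The sketch below shows one possible arrangement: the STT, TTS, and VAD providers (Deepgram, Cartesia, Silero) are illustrative assumptions that require their own plugin installs and API keys, and the instructions string is a placeholder.

# Sketch of a complete agent entrypoint with Cerebras as the LLM.
# The Deepgram, Cartesia, and Silero plugins are illustrative choices, not
# requirements; swap in whichever providers you prefer.
from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import cartesia, deepgram, openai, silero


async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()

    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM.with_cerebras(model="llama-3.3-70b"),
        tts=cartesia.TTS(),
        vad=silero.VAD.load(),
    )

    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a helpful voice assistant."),
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))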

LiveKit Agents overview

LiveKit Agents is an open-source framework for building realtime AI apps using WebRTC transport to end-user devices and WebSockets or HTTPS for backend services.

  • Agent workflows: Build complex voice AI apps with discrete stages and handoffs.
  • Telephony: Inbound and outbound calling using SIP trunks.
  • Frontend SDKs: Full-featured SDKs and UI components for JavaScript, Swift, Kotlin, Flutter, React Native, and Unity.
  • Python and Node.js: Build voice AI apps in Python or Node.js.
  • Dispatch and load balancing: Built-in support for request distribution and load balancing.
  • LiveKit Cloud: Fully-managed LiveKit server with global scale and low latency (you can also self-host).

What is WebRTC?

WebRTC provides significant advantages over alternatives such as WebSockets for building realtime applications.

  • Optimized for media: Purpose-built for audio and video with advanced codecs and compression algorithms.
  • Network resilient: Performs reliably even in challenging network conditions due to UDP, adaptive bitrate, and more.
  • Broad compatibility: Natively supported in all modern browsers.

LiveKit handles all of the complexity of running production-grade WebRTC infrastructure while extending support to mobile apps, backends, and telephony.

Further reading

More information about integrating Llama is available in the following article:

Llama Models

LiveKit docs on Llama models and available parameters.
