LiveKit docs › Models › LLM › Additional models › Cerebras

---

# Cerebras LLM plugin guide

> How to use the Cerebras inference with LiveKit Agents.

Available in:
- [x] Node.js
- [x] Python

## Overview

This plugin allows you to use [Cerebras](https://www.cerebras.net/) as an LLM provider for your voice agents. Both the Python and Node.js plugins include built-in [payload optimization](https://inference-docs.cerebras.ai/capabilities/payload-optimization) via gzip compression and msgpack encoding for reduced TTFT on large prompts.

> 💡 **LiveKit Inference**
> 
> Some Cerebras models are also available in LiveKit Inference, with billing and integration handled automatically. See [the docs](https://docs.livekit.io/agents/models/llm.md) for more information.

## Usage

Install the plugin:

**Python**:

```shell
uv add "livekit-agents[cerebras]~=1.5"

```

---

**Node.js**:

```shell
pnpm add @livekit/agents-plugin-cerebras@1.x

```

Set the following environment variable in your `.env` file:

```shell
CEREBRAS_API_KEY=<your-cerebras-api-key>

```

Create a Cerebras LLM:

**Python**:

```python
from livekit.plugins import cerebras

session = AgentSession(
    llm=cerebras.LLM(
        model="llama3.1-8b",
    ),
    # ... tts, stt, vad, turn_handling, etc.
)

```

---

**Node.js**:

```typescript
import { LLM } from '@livekit/agents-plugin-cerebras';

const session = new voice.AgentSession({
    llm: new LLM({
        model: "llama3.1-8b",
    }),
    // ... tts, stt, vad, turnHandling, etc.
});

```

## Parameters

This section describes some of the available parameters. See the plugin reference links in the [Additional resources](#additional-resources) section for a complete list of all available parameters.

- **`model`** _(str | CerebrasChatModels)_ (optional) - Default: `llama3.1-8b`: Model to use for inference. To learn more, see [supported models](https://inference-docs.cerebras.ai/api-reference/chat-completions#param-model).

- **`gzip_compression`** _(bool)_ (optional) - Default: `true`: When enabled, request payloads are gzip-compressed before sending, which can reduce TTFT for requests with large prompts. To learn more, see [payload optimization](https://inference-docs.cerebras.ai/capabilities/payload-optimization).

- **`msgpack_encoding`** _(bool)_ (optional) - Default: `true`: When enabled, request payloads are encoded with [msgpack](https://msgpack.org/) binary format instead of JSON for additional payload size reduction.

- **`temperature`** _(float)_ (optional) - Default: `1.0`: Sampling temperature that controls the randomness of the model's output. Higher values make the output more random, while lower values make it more focused and deterministic. Range of valid values can vary by model.

Valid values are between `0` and `1.5`. To learn more, see the [Cerebras documentation](https://inference-docs.cerebras.ai/api-reference/chat-completions#param-temperature).

- **`parallel_tool_calls`** _(bool)_ (optional): Controls whether the model can make multiple tool calls in parallel. When enabled, the model can make multiple tool calls simultaneously, which can improve performance for complex tasks.

- **`tool_choice`** _(ToolChoice | Literal['auto', 'required', 'none'])_ (optional) - Default: `auto`: Controls how the model uses tools. String options are as follows:

- `'auto'`: Let the model decide.
- `'required'`: Force tool usage.
- `'none'`: Disable tool usage.

## Additional resources

The following links provide more information about the Cerebras LLM integration.

- **[Cerebras docs](https://inference-docs.cerebras.ai/)**: Cerebras inference docs.

- **[Voice AI quickstart](https://docs.livekit.io/agents/start/voice-ai.md)**: Get started with LiveKit Agents and Cerebras.

---

This document was rendered at 2026-06-07T11:35:50.201Z.
For the latest version of this document, see [https://docs.livekit.io/agents/models/llm/cerebras.md](https://docs.livekit.io/agents/models/llm/cerebras.md).

To explore all LiveKit documentation, see [llms.txt](https://docs.livekit.io/llms.txt).