# LiveKit Inference LLM parameters

> Full reference for model parameters supported by LiveKit Inference LLMs.

## Overview

[LiveKit Inference](https://docs.livekit.io/agents/models/llm.md) LLMs let you customize the behavior of the language model when generating responses by passing additional parameters. You can specify these model parameters when creating an instance of the `LLM` class in the `inference` module using `extra_kwargs` in Python or `modelOptions` in Node.js.

## Model parameters

The following is a complete list of supported Chat Completion options. Not every model supports every parameter; unsupported parameters are silently ignored. For model-specific details, see the documentation for the [model](https://docs.livekit.io/agents/models/llm.md#inference) you're using.

> ℹ️ **Reasoning model compatibility**
> 
> Parameters not supported by reasoning models are automatically stripped at request time.

- **`temperature`** _(float)_ (optional) - Default: `1`: Sampling temperature that controls the randomness of the model's output. Higher values make the output more random, while lower values make it more focused and deterministic. Range of valid values can vary by model.

  You can set this or `top_p`, but not both. Not supported by reasoning models.

- **`top_p`** _(float)_ (optional) - Default: `1`: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with `top_p` probability mass. So `0.1` means only the tokens comprising the top 10% probability mass are considered.

  You can set this or `temperature`, but not both. Not supported by reasoning models.

- **`max_tokens`** _(int)_ (optional): The maximum number of tokens that can be generated in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length.

  Not supported by newer models; use `max_completion_tokens` instead.

- **`max_completion_tokens`** _(int)_ (optional): An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens. Preferred over `max_tokens` for newer models.

- **`reasoning_effort`** _("low" | "medium" | "high")_ (optional): Controls how much reasoning effort the model spends. Only supported by reasoning models.

- **`frequency_penalty`** _(float)_ (optional) - Default: `0`: Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. Not supported by reasoning models.

- **`presence_penalty`** _(float)_ (optional) - Default: `0`: Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. Not supported by reasoning models.

- **`seed`** _(int)_ (optional): If specified, the system makes a best effort to sample deterministically. Repeated requests with the same `seed` and parameters should return the same result.

- **`stop`** _(str | list[str])_ (optional): A sequence, or list of sequences, that causes the API to stop generating further tokens. For example, `stop=["\n"]` stops generation when the model outputs a newline character.

- **`n`** _(int)_ (optional): Number of completions to generate for each prompt. Not supported by reasoning models.

- **`logprobs`** _(bool)_ (optional): If `true`, returns the log probabilities of each output token returned in the `content` of `message`. Not supported by reasoning models.

- **`top_logprobs`** _(int)_ (optional): An integer specifying the number of most likely tokens to return at each token position, each with an associated log probability. Valid range varies by provider.

  Requires `logprobs: true`. Not supported by reasoning models.

- **`logit_bias`** _(dict[str, int])_ (optional): Modify the likelihood of specified tokens appearing in the completion. Not supported by reasoning models.

- **`parallel_tool_calls`** _(bool)_ (optional): Whether the model can make multiple tool calls in a single response.

- **`tool_choice`** _(ToolChoice | Literal['auto', 'required', 'none'])_ (optional) - Default: `"auto"`: Controls how the model uses tools. String options are as follows:

  - `'auto'`: Let the model decide.
  - `'required'`: Force tool usage.
  - `'none'`: Disable tool usage.

- **`user`** _(str)_ (optional) - **DEPRECATED**: Unique identifier for the end user, used for abuse monitoring. Use [safety_identifier](#safety_identifier) and [prompt_cache_key](#prompt_cache_key) instead.

- **`service_tier`** _("auto" | "default" | "flex" | "scale" | "priority")_ (optional): Specifies the latency tier for processing the request.

- **`metadata`** _(Metadata)_ (optional): Developer-defined tags and values for filtering completions in the dashboard.

- **`store`** _(bool)_ (optional): Whether to store the output for model distillation or evals.

- **`prediction`** _(ChatCompletionPredictionContentParam)_ (optional): Configuration for predicted output to reduce latency for known response patterns.

- **`modalities`** _(list[Literal["text", "audio"]])_ (optional): Output types the model can generate.

- **`web_search_options`** _(WebSearchOptions)_ (optional): Configuration for web search for relevant results to use in a response.

- **`verbosity`** _("low" | "medium" | "high")_ (optional): Constrains the verbosity of the model's response. Lower values result in more concise responses, while higher values result in more verbose responses.

- **`prompt_cache_key`** _(str)_ (optional): Key for caching responses for similar requests. See [prompt caching](https://developers.openai.com/docs/guides/prompt-caching).

- **`safety_identifier`** _(str)_ (optional): String that uniquely identifies each user. Hash the username or email address to avoid sending any identifying information. For users who aren't logged in, you can send a session ID instead. Supersedes the `user` parameter.
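
The `safety_identifier` entry recommends hashing the username or email address. The following is a minimal sketch of one way to do that with Python's standard `hashlib` module. The helper name `make_safety_identifier`, the normalization step, and the optional `salt` argument are assumptions made for illustration; they are not part of LiveKit Inference.

```python
import hashlib


def make_safety_identifier(user_key: str, salt: str = "") -> str:
    """Return a SHA-256 hex digest of a normalized user key.

    Hypothetical helper: normalizing (trim + lowercase) keeps the identifier
    stable when the same email is typed with different casing or whitespace.
    """
    normalized = user_key.strip().lower()
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()


# The resulting dict is the kind of value you would pass as `extra_kwargs`.
extra_kwargs = {
    "safety_identifier": make_safety_identifier("Ada@Example.com "),
}
```

An application-specific `salt` makes it harder to recover an email address by hashing a list of known addresses, at the cost of having to keep the salt consistent across requests.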

## Usage

The following example sets the `temperature` and `max_completion_tokens` parameters when creating an `LLM` instance:

**Python**:

```python
from livekit.agents import AgentSession, inference

session = AgentSession(
    llm=inference.LLM(
        model="openai/gpt-5.3-chat-latest",
        extra_kwargs={
            "temperature": 0.7,
            "max_completion_tokens": 1000,
        }
    ),
    # ... tts, stt, vad, turnHandling, etc.
)

```

---

**Node.js**:

```typescript
import { AgentSession, inference } from '@livekit/agents';

const session = new AgentSession({
    llm: new inference.LLM({
        model: "openai/gpt-5.3-chat-latest",
        provider: "openai",
        modelOptions: {
            temperature: 0.7,
            max_completion_tokens: 1000
        }
    }),
    // ... tts, stt, vad, turnHandling, etc.
});

```
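
Because unsupported parameters are silently ignored, a mistake in the options object can go unnoticed. The following sketch shows an optional client-side check for a few of the constraints stated in the parameter list: `temperature` and `top_p` are mutually exclusive, `top_logprobs` requires `logprobs`, and `reasoning_effort` accepts only three values. The function `check_llm_options` is a hypothetical helper, not a LiveKit API, and it doesn't cover model-specific limits.

```python
from typing import Any


def check_llm_options(options: dict[str, Any]) -> list[str]:
    """Return a list of problems found in an options dict (empty if none)."""
    problems: list[str] = []

    # The parameter list says to set temperature or top_p, but not both.
    if "temperature" in options and "top_p" in options:
        problems.append("set either temperature or top_p, not both")

    # top_logprobs is documented as requiring logprobs to be true.
    if "top_logprobs" in options and options.get("logprobs") is not True:
        problems.append("top_logprobs requires logprobs to be true")

    # reasoning_effort is documented as "low", "medium", or "high".
    effort = options.get("reasoning_effort")
    if effort is not None and effort not in ("low", "medium", "high"):
        problems.append(f"unknown reasoning_effort: {effort!r}")

    return problems


options = {"temperature": 0.7, "max_completion_tokens": 1000}
problems = check_llm_options(options)
if problems:
    raise ValueError("; ".join(problems))
# `options` can now be passed as `extra_kwargs` to `inference.LLM`.
```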

## Updating options at runtime

Swap models or change persistent options on a running `inference.LLM` for patterns like dynamic model routing, A/B testing, or cost-tier switching. Call `update_options()` in Python or `updateOptions()` in Node.js. Changes take effect on the next call to `chat()`.

**Python**:

```python
session.llm.update_options(
    model="openai/gpt-5.3-chat-latest",
    extra_kwargs={"temperature": 0.7},
)

```

---

**Node.js**:

```typescript
session.llm.updateOptions({
  model: "openai/gpt-5.3-chat-latest",
  modelOptions: { temperature: 0.7 },
});

```

Both arguments are optional. `model` accepts any inference model string, in the same format as the constructor. The persistent options argument (`extra_kwargs` in Python, `modelOptions` in Node.js) accepts any of the [model parameters](#model-parameters) documented in the preceding section.
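
As an illustration of cost-tier switching, the following sketch maps a tier name to the arguments for `update_options()`. The tier names, the `options_for_tier` helper, and the `"economy"` model string are placeholders invented for this example; substitute model strings that your project actually uses.

```python
from typing import Any

# Hypothetical tier table. Only the "premium" entry reuses a model string
# from this page; the "economy" entry is a placeholder, not a real model.
MODEL_BY_TIER = {
    "premium": "openai/gpt-5.3-chat-latest",
    "economy": "example-provider/example-small-model",
}


def options_for_tier(tier: str) -> dict[str, Any]:
    """Build keyword arguments for update_options() for a given tier."""
    if tier not in MODEL_BY_TIER:
        raise ValueError(f"unknown tier: {tier!r}")
    return {
        "model": MODEL_BY_TIER[tier],
        "extra_kwargs": {"temperature": 0.7},
    }


kwargs = options_for_tier("economy")
# With a running session, apply the change before the next chat() call:
# session.llm.update_options(**kwargs)
```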

> 🔥 **Options are replaced, not merged**
> 
> The persistent options argument replaces the existing options object on the `LLM` instance. Any keys you don't include are dropped. To clear all previously set options, pass `{}`.
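
To change one key without dropping the others, merge the change into the full set of options before calling `update_options()`. The sketch below assumes your application keeps its own copy of the options it last sent, because this page doesn't document a way to read them back from the `LLM` instance. The helper `merged_options` is hypothetical.

```python
from typing import Any


def merged_options(current: dict[str, Any], changes: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with `changes` applied on top of `current`."""
    return {**current, **changes}


# Application-side copy of the options passed when the LLM was created.
current_options = {"temperature": 0.7, "max_completion_tokens": 1000}

# Lower the temperature while keeping max_completion_tokens.
current_options = merged_options(current_options, {"temperature": 0.2})

# With a running session, send the complete merged dict:
# session.llm.update_options(extra_kwargs=current_options)
```

Passing only `{"temperature": 0.2}` instead would drop `max_completion_tokens`, since the options object is replaced as a whole.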

---

This document was rendered at 2026-06-07T11:35:00.476Z.
For the latest version of this document, see [https://docs.livekit.io/reference/agents/inference-llm-parameters.md](https://docs.livekit.io/reference/agents/inference-llm-parameters.md).

To explore all LiveKit documentation, see [llms.txt](https://docs.livekit.io/llms.txt).