
Kimi LLM

How to use Kimi models with LiveKit Agents.


Overview

Kimi models are available in LiveKit Agents through LiveKit Inference. Pricing is available on the pricing page.

LiveKit Inference

Use LiveKit Inference to access Kimi models without a separate Moonshot API key.

| Model name | Model ID | Providers |
| --- | --- | --- |
| Kimi K2.5 | `moonshotai/kimi-k2.5` | baseten |
| Kimi K2 Instruct (retired) | `moonshotai/kimi-k2-instruct` | baseten |
Retired models
Retired models are no longer accessible. If you're using a retired model, switch to a currently available model.

Usage

To use Kimi models, create an LLM from the inference module. You can use this LLM in the Voice AI quickstart:

```python
from livekit.agents import AgentSession, inference

session = AgentSession(
    llm=inference.LLM(
        model="moonshotai/kimi-k2.5",
        provider="baseten",
        extra_kwargs={
            "max_completion_tokens": 1000,
        },
    ),
    # ... tts, stt, vad, turn_handling, etc.
)
```
```typescript
import { AgentSession, inference } from '@livekit/agents';

const session = new AgentSession({
  llm: new inference.LLM({
    model: "moonshotai/kimi-k2.5",
    provider: "baseten",
    modelOptions: {
      max_completion_tokens: 1000,
    },
  }),
  // ... tts, stt, vad, turnHandling, etc.
});
```

Parameters

The following are parameters for configuring Kimi models with LiveKit Inference. For model behavior parameters like temperature and max_completion_tokens, see model parameters.

model (string, required)

The model ID from the models list.

provider (string, optional)

Set a specific provider to use for the LLM. Refer to the models list for available providers. If not set, LiveKit Inference uses the best available provider, and bills accordingly.

extra_kwargs (dict, optional)

Additional parameters to pass to the Kimi Chat Completions API, such as max_tokens or temperature. See model parameters for supported fields.

In Node.js this parameter is called modelOptions.

Model parameters

Pass the following parameters inside extra_kwargs (Python) or modelOptions (Node.js). For more details about each parameter in the list, see Inference parameters.

| Parameter | Type | Default | Notes |
| --- | --- | --- | --- |
| `temperature` | float | Varies by model | Valid range: 0 to 1. Fixed at 0.6 for kimi-k2 and cannot be modified for kimi-k2.5. |
| `top_p` | float | Varies by model | Valid range: 0 to 1. Fixed at 0.95 for kimi-k2.5 and cannot be modified. |
| `max_completion_tokens` | int | 1024 | Maximum number of tokens to generate. Must not exceed the remaining token budget after inputs. |
| `frequency_penalty` | float | 0 | Reduces the model's likelihood to repeat the same line verbatim. Valid range: -2.0 to 2.0. Fixed for kimi-k2.5. |
| `presence_penalty` | float | 0 | Increases the model's likelihood to talk about new topics. Valid range: -2.0 to 2.0. Fixed for kimi-k2.5. |
| `stop` | str \| list[str] | — | Sequences that stop generation. Up to 5 sequences, each up to 32 bytes. |
| `n` | int | 1 | Number of completions to generate. Valid range: 1 to 5. Fixed at 1 for kimi-k2.5. |
| `prompt_cache_key` | str | — | Key for caching responses for similar requests. |
| `safety_identifier` | str | — | Hashed user identifier for policy violation monitoring. |
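To make the constraints above concrete, the following is a minimal sketch of a validation helper. It is hypothetical and not part of the LiveKit SDK; it only encodes the kimi-k2.5 limits from the table (fixed parameters that cannot be set, the `stop` sequence limits, and a basic `max_completion_tokens` check):

```python
# Hypothetical helper: checks an extra_kwargs dict against the
# kimi-k2.5 constraints listed in the table above. Not part of the
# LiveKit SDK; shown only to make the limits concrete.

def validate_kimi_k25_kwargs(kwargs: dict) -> list[str]:
    """Return a list of constraint violations (empty list means valid)."""
    errors = []

    # These parameters are fixed for kimi-k2.5 and cannot be modified.
    for fixed in ("temperature", "top_p", "frequency_penalty",
                  "presence_penalty", "n"):
        if fixed in kwargs:
            errors.append(f"{fixed} is fixed for kimi-k2.5 and cannot be set")

    # stop: up to 5 sequences, each up to 32 bytes.
    stop = kwargs.get("stop")
    if stop is not None:
        sequences = [stop] if isinstance(stop, str) else stop
        if len(sequences) > 5:
            errors.append("stop: at most 5 sequences allowed")
        for seq in sequences:
            if len(seq.encode("utf-8")) > 32:
                errors.append(f"stop sequence {seq!r} exceeds 32 bytes")

    # max_completion_tokens must be a positive integer.
    mct = kwargs.get("max_completion_tokens")
    if mct is not None and (not isinstance(mct, int) or mct < 1):
        errors.append("max_completion_tokens must be a positive integer")

    return errors

print(validate_kimi_k25_kwargs({"max_completion_tokens": 1000, "stop": ["<END>"]}))
print(validate_kimi_k25_kwargs({"temperature": 0.9, "n": 2}))
```

Checking kwargs locally like this is optional; the API itself rejects out-of-range values at request time.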

String descriptors

As a shortcut, you can also pass a model ID directly to the llm argument in your AgentSession:

```python
from livekit.agents import AgentSession

session = AgentSession(
    llm="moonshotai/kimi-k2.5",
    # ... tts, stt, vad, turn_handling, etc.
)
```
```typescript
import { AgentSession } from '@livekit/agents';

const session = new AgentSession({
  llm: "moonshotai/kimi-k2.5",
  // ... tts, stt, vad, turnHandling, etc.
});
```

Additional resources

The following links provide more information about Kimi in LiveKit Inference.