
Kimi LLM

How to use Kimi models with LiveKit Agents.


Overview

Kimi models are available in LiveKit Agents through LiveKit Inference. Pricing is available on the pricing page.

LiveKit Inference

Use LiveKit Inference to access Kimi models without a separate Moonshot API key.

| Model name | Model ID | Providers |
| --- | --- | --- |
| Kimi K2.5 | `moonshotai/kimi-k2.5` | baseten |
| Kimi K2 Instruct (retired) | `moonshotai/kimi-k2-instruct` | baseten |
Retired models
Retired models are no longer accessible. If you're using a retired model, switch to a currently available model.

Usage

To use Kimi models, create an LLM from the inference module. You can use this LLM in the Voice AI quickstart:

```python
from livekit.agents import AgentSession, inference

session = AgentSession(
    llm=inference.LLM(
        model="moonshotai/kimi-k2.5",
        provider="baseten",
        extra_kwargs={
            "max_completion_tokens": 1000,
        },
    ),
    # ... tts, stt, vad, turn_handling, etc.
)
```
```typescript
import { AgentSession, inference } from '@livekit/agents';

const session = new AgentSession({
  llm: new inference.LLM({
    model: "moonshotai/kimi-k2.5",
    provider: "baseten",
    modelOptions: {
      max_completion_tokens: 1000,
    },
  }),
  // ... tts, stt, vad, turnHandling, etc.
});
```

Parameters

The following are parameters for configuring Kimi models with LiveKit Inference. For model behavior parameters like temperature and max_completion_tokens, see model parameters.

model (string, required)

The model ID from the models list.

provider (string, optional)

Set a specific provider to use for the LLM. Refer to the models list for available providers. If not set, LiveKit Inference uses the best available provider, and bills accordingly.

extra_kwargs (dict, optional)

Additional parameters to pass to the Kimi Chat Completions API, such as max_tokens or temperature. See model parameters for supported fields.

In Node.js this parameter is called modelOptions.

Model parameters

Pass the following parameters inside extra_kwargs (Python) or modelOptions (Node.js). For more details about each parameter in the list, see Inference parameters.

| Parameter | Type | Default | Notes |
| --- | --- | --- | --- |
| `temperature` | float | Varies by model | Valid range: 0 to 1. Fixed at 0.6 for kimi-k2 and cannot be modified for kimi-k2.5. |
| `top_p` | float | Varies by model | Valid range: 0 to 1. Fixed at 0.95 for kimi-k2.5 and cannot be modified. |
| `max_completion_tokens` | int | 1024 | Maximum number of tokens to generate. Must not exceed the remaining token budget after inputs. |
| `frequency_penalty` | float | 0 | Reduces the model's likelihood to repeat the same line verbatim. Valid range: -2.0 to 2.0. Fixed for kimi-k2.5. |
| `presence_penalty` | float | 0 | Increases the model's likelihood to talk about new topics. Valid range: -2.0 to 2.0. Fixed for kimi-k2.5. |
| `stop` | str \| list[str] | — | Sequences that stop generation. Up to 5 sequences, each up to 32 bytes. |
| `n` | int | 1 | Number of completions to generate. Valid range: 1 to 5. Fixed at 1 for kimi-k2.5. |
| `prompt_cache_key` | str | — | Key for caching responses for similar requests. |
| `safety_identifier` | str | — | Hashed user identifier for policy violation monitoring. |
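To make the constraints above concrete, the following is a minimal sketch of a validation helper. It is hypothetical and not part of the LiveKit SDK; it only encodes the kimi-k2.5 limits from the table (fixed parameters that cannot be set, the `stop` sequence limits, and a basic `max_completion_tokens` check):

```python
# Hypothetical helper: checks an extra_kwargs dict against the
# kimi-k2.5 constraints listed in the table above. Not part of the
# LiveKit SDK; shown only to make the limits concrete.

def validate_kimi_k25_kwargs(kwargs: dict) -> list[str]:
    """Return a list of constraint violations (empty list means valid)."""
    errors = []

    # These parameters are fixed for kimi-k2.5 and cannot be modified.
    for fixed in ("temperature", "top_p", "frequency_penalty",
                  "presence_penalty", "n"):
        if fixed in kwargs:
            errors.append(f"{fixed} is fixed for kimi-k2.5 and cannot be set")

    # stop: up to 5 sequences, each up to 32 bytes.
    stop = kwargs.get("stop")
    if stop is not None:
        sequences = [stop] if isinstance(stop, str) else stop
        if len(sequences) > 5:
            errors.append("stop: at most 5 sequences allowed")
        for seq in sequences:
            if len(seq.encode("utf-8")) > 32:
                errors.append(f"stop sequence {seq!r} exceeds 32 bytes")

    # max_completion_tokens must be a positive integer.
    mct = kwargs.get("max_completion_tokens")
    if mct is not None and (not isinstance(mct, int) or mct < 1):
        errors.append("max_completion_tokens must be a positive integer")

    return errors

print(validate_kimi_k25_kwargs({"max_completion_tokens": 1000, "stop": ["<END>"]}))
print(validate_kimi_k25_kwargs({"temperature": 0.9, "n": 2}))
```

Checking kwargs locally like this is optional; the API itself rejects out-of-range values at request time.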

String descriptors

As a shortcut, you can also pass a model ID directly to the llm argument in your AgentSession:

```python
from livekit.agents import AgentSession

session = AgentSession(
    llm="moonshotai/kimi-k2.5",
    # ... tts, stt, vad, turn_handling, etc.
)
```
```typescript
import { AgentSession } from '@livekit/agents';

const session = new AgentSession({
  llm: "moonshotai/kimi-k2.5",
  // ... tts, stt, vad, turnHandling, etc.
});
```

Additional resources

The following links provide more information about Kimi in LiveKit Inference.