
# Large language models (LLM) overview

> Conversational intelligence for your voice agents.

## Overview

An LLM powers the core reasoning, responses, and orchestration of your voice agent. You can choose from a variety of models to balance performance, accuracy, and cost. In a voice agent, the LLM receives a transcript of the user's speech from an [STT](https://docs.livekit.io/agents/models/stt.md) model and produces a text response, which a [TTS](https://docs.livekit.io/agents/models/tts.md) model turns into speech.

You can choose a model served through LiveKit Inference, which is included with LiveKit Cloud. With LiveKit Inference, your agent runs on LiveKit's infrastructure to minimize latency. No separate provider API key is required, and usage and rate limits are managed through LiveKit Cloud. Alternatively, use a plugin if you prefer to manage billing and rate limits yourself, or if you need a provider that isn't currently available through LiveKit Inference.

### LiveKit Inference

The following models are available in [LiveKit Inference](https://docs.livekit.io/agents/models.md#inference). Refer to the guide for each model for more details on additional configuration options.

| Model family | Model name | Provided by |
| ------------- | ---------- | ----------- |
| DeepSeek | DeepSeek-V3 | Baseten |
|   | DeepSeek-V3.1 | Baseten |
|   | DeepSeek-V3.2 | DeepSeek |
| Gemini | Gemini 2.0 Flash | Google |
|   | Gemini 2.0 Flash-Lite | Google |
|   | Gemini 2.5 Flash | Google |
|   | Gemini 2.5 Flash-Lite | Google |
|   | Gemini 2.5 Pro | Google |
|   | Gemini 3 Flash | Google |
|   | Gemini 3 Pro | Google |
|   | Gemini 3.1 Flash Lite | Google |
|   | Gemini 3.1 Pro | Google |
|   | Gemini 3.5 Flash | Google |
| Kimi | Kimi K2 Instruct | Baseten |
|   | Kimi K2.5 | Baseten |
| OpenAI | GPT-4.1 | Azure, OpenAI |
|   | GPT-4.1 mini | Azure, OpenAI |
|   | GPT-4.1 nano | Azure, OpenAI |
|   | GPT-4o | Azure, OpenAI |
|   | GPT-4o mini | Azure, OpenAI |
|   | GPT-5 | Azure, OpenAI |
|   | GPT-5 mini | Azure, OpenAI |
|   | GPT-5 nano | Azure, OpenAI |
|   | GPT-5.1 | Azure, OpenAI |
|   | GPT-5.1 Chat | Azure, OpenAI |
|   | GPT-5.2 | Azure, OpenAI |
|   | GPT-5.2 Chat | Azure, OpenAI |
|   | GPT-5.3 Chat | Azure, OpenAI |
|   | GPT-5.4 | Azure, OpenAI |
|   | GPT-5.4 mini | OpenAI |
|   | GPT-5.4 nano | OpenAI |
|   | GPT-5.5 | Azure, OpenAI |
|   | ChatGPT Latest | Azure, OpenAI |
|   | GPT OSS 120B | Baseten, Cerebras, Groq |
| xAI | Grok 4.1 Fast | xAI |
|   | Grok 4.1 Fast Reasoning | xAI |
|   | Grok 4.20 | xAI |
|   | Grok 4.20 Reasoning | xAI |
|   | Grok 4.20 Multi-Agent | xAI |

### Plugins

The LiveKit Agents framework also includes a variety of open source [plugins](https://docs.livekit.io/agents/models.md#plugins) for a wide range of LLM providers. Plugins are especially useful if you need custom or fine-tuned models. With a plugin, you authenticate with the provider yourself, usually via an API key, and you are responsible for setting up your own account and managing your own billing and credentials. The following table lists the available plugins and whether each is available for Python and Node.js.

| Provider | Python | Node.js |
| -------- | ------ | ------- |
| [Amazon Bedrock](https://docs.livekit.io/agents/models/llm/plugins/aws.md) | ✓ | — |
| [Anthropic](https://docs.livekit.io/agents/models/llm/plugins/anthropic.md) | ✓ | — |
| [Baseten](https://docs.livekit.io/agents/models/llm/plugins/baseten.md) | ✓ | — |
| [Google Gemini](https://docs.livekit.io/agents/models/llm/plugins/gemini.md) | ✓ | ✓ |
| [Groq](https://docs.livekit.io/agents/models/llm/plugins/groq.md) | ✓ | ✓ |
| [LangChain](https://docs.livekit.io/agents/models/llm/plugins/langchain.md) | ✓ | — |
| [Mistral AI](https://docs.livekit.io/agents/models/llm/plugins/mistralai.md) | ✓ | ✓ |
| [Sarvam](https://docs.livekit.io/agents/models/llm/plugins/sarvam.md) | ✓ | — |
| [OpenAI](https://docs.livekit.io/agents/models/llm/plugins/openai.md) | ✓ | ✓ |
| [Azure OpenAI](https://docs.livekit.io/agents/models/llm/plugins/azure-openai.md) | ✓ | ✓ |
| [Cerebras](https://docs.livekit.io/agents/models/llm/plugins/cerebras.md) | ✓ | ✓ |
| [DeepSeek](https://docs.livekit.io/agents/models/llm/plugins/deepseek.md) | ✓ | ✓ |
| [Fireworks](https://docs.livekit.io/agents/models/llm/plugins/fireworks.md) | ✓ | ✓ |
| [Letta](https://docs.livekit.io/agents/models/llm/plugins/letta.md) | ✓ | — |
| [Ollama](https://docs.livekit.io/agents/models/llm/plugins/ollama.md) | ✓ | ✓ |
| [OpenRouter](https://docs.livekit.io/agents/models/llm/plugins/openrouter.md) | ✓ | — |
| [OVHCloud](https://docs.livekit.io/agents/models/llm/plugins/ovhcloud.md) | ✓ | ✓ |
| [Perplexity](https://docs.livekit.io/agents/models/llm/plugins/perplexity.md) | ✓ | ✓ |
| [Telnyx](https://docs.livekit.io/agents/models/llm/plugins/telnyx.md) | ✓ | ✓ |
| [Together AI](https://docs.livekit.io/agents/models/llm/plugins/together.md) | ✓ | ✓ |
| [xAI](https://docs.livekit.io/agents/models/llm/plugins/xai.md) | ✓ | ✓ |

Have another provider in mind? LiveKit is open source and welcomes [new plugin contributions](https://docs.livekit.io/agents/models.md#contribute).

## Usage

To set up an LLM in an `AgentSession`, pass an `inference.LLM` instance, created with the model ID, to the `llm` argument. LiveKit Inference manages the connection to the model automatically. Consult the [models list](#inference) for available models.

**Python**:

```python
from livekit.agents import AgentSession, inference

session = AgentSession(
    llm=inference.LLM(model="openai/gpt-5.3-chat-latest"),
)

```

---

**Node.js**:

```typescript
import { AgentSession, inference } from '@livekit/agents';

const session = new AgentSession({
    llm: new inference.LLM({ model: "openai/gpt-5.3-chat-latest" }),
});

```

### Model parameters

The `LLM` class from the `inference` module lets you configure additional model parameters via `extra_kwargs` (Python) or `modelOptions` (Node.js). For the full parameter reference, see [Inference LLM parameters](https://docs.livekit.io/reference/agents/inference-llm-parameters.md) or your [model's](#inference) documentation. These options can also be [updated at runtime](https://docs.livekit.io/reference/agents/inference-llm-parameters.md#updating-options-at-runtime) without rebuilding the LLM.

## Advanced features

The following sections cover more advanced topics common to all LLM providers. For more detailed reference on individual provider configuration, consult the model reference or plugin documentation for that provider.

### Custom LLM

To create an entirely custom LLM, implement the [LLM node](https://docs.livekit.io/agents/build/nodes.md#llm_node) in your agent.

### Standalone usage

You can use an `LLM` instance as a standalone component through its streaming interface. Its `chat()` method takes a [`ChatContext`](https://docs.livekit.io/agents/logic/chat-context.md) object containing the conversation history and returns a stream of `ChatChunk` objects. This interface is the same across all LLM providers, regardless of their underlying API design:

```python
import asyncio

from livekit.agents import ChatContext
from livekit.plugins import openai


async def main() -> None:
    # Use the Responses API (recommended for direct OpenAI usage).
    # Assumes OPENAI_API_KEY is set in the environment.
    llm = openai.responses.LLM(model="gpt-4o-mini")

    chat_ctx = ChatContext()
    chat_ctx.add_message(role="user", content="Hello, this is a test message!")

    # chat() returns a stream of ChatChunk objects
    async with llm.chat(chat_ctx=chat_ctx) as stream:
        async for chunk in stream:
            print("Received chunk:", chunk.delta)


if __name__ == "__main__":
    asyncio.run(main())
```
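Because the result is a stream, you can act on text as soon as it arrives instead of waiting for the full reply. The following sketch relies only on the `chat()` streaming interface and measures time to first token, a useful latency signal for voice agents, while also assembling the full text. It assumes each `ChatChunk` exposes its text as `chunk.delta.content`, and that `delta` or `content` may be empty for chunks that carry only usage or tool-call data. `stream_with_ttft` is a hypothetical helper name.

```python
from __future__ import annotations

import time
from typing import Any


async def stream_with_ttft(llm: Any, chat_ctx: Any) -> tuple[str, float | None]:
    """Stream a reply, returning (full_text, seconds_to_first_text_token)."""
    start = time.perf_counter()
    first_token_s: float | None = None
    parts: list[str] = []
    async with llm.chat(chat_ctx=chat_ctx) as stream:
        async for chunk in stream:
            # Skip chunks with no text, such as usage-only or tool-call-only chunks
            if chunk.delta is None or not chunk.delta.content:
                continue
            if first_token_s is None:
                first_token_s = time.perf_counter() - start
            parts.append(chunk.delta.content)
    # first_token_s stays None if the reply contained no text at all
    return "".join(parts), first_token_s
```

Call it from async code, for example `text, ttft = await stream_with_ttft(llm, chat_ctx)`.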

#### Collecting the response

Available in:
- [ ] Node.js
- [x] Python

In Python, `LLMStream` provides a `collect()` convenience method that awaits the full response and returns a `CollectedResponse` with `text`, `tool_calls`, and `usage` fields. It's useful for background tasks, pre-processing, or any workflow that needs LLM output outside of the voice pipeline, such as [summarizing conversation history before a handoff](https://docs.livekit.io/agents/logic/agents-handoffs.md#summarizing-context).

The following example shows how to collect a full response and execute any tool calls it contains:

```python
import asyncio

from dotenv import load_dotenv
from livekit.agents import function_tool, inference, llm 

load_dotenv(".env.local")

my_llm = inference.LLM(model="openai/gpt-5.3-chat-latest")


@function_tool
async def get_weather(location: str) -> str:
    """Get the current weather for a location.

    Args:
        location: The city name to get weather for.
    """ 
    print(f"  [tool called] get_weather(location={location!r})")
    return f"The weather in {location} is sunny and 72°F."


async def main() -> None:
    tools = [get_weather]
    tool_ctx = llm.ToolContext(tools)

    chat_ctx = llm.ChatContext()
    chat_ctx.add_message(role="system", content="You are a helpful assistant.")
    chat_ctx.add_message(role="user", content="What's the weather in San Francisco?")

    # First LLM call — expect a tool call
    response = await my_llm.chat(chat_ctx=chat_ctx, tools=tools).collect()
    print(f"text: {response.text!r}")
    print(f"tool_calls: {response.tool_calls}")

    # Execute each tool call and add results to context
    for tc in response.tool_calls:
        result = await llm.execute_function_call(tc, tool_ctx)
        chat_ctx.insert(result.fnc_call)
        if result.fnc_call_out:
            chat_ctx.insert(result.fnc_call_out)

    # Second LLM call — expect a final text response
    final = await my_llm.chat(chat_ctx=chat_ctx, tools=tools).collect()
    print(f"final: {final.text!r}")


if __name__ == "__main__":
    asyncio.run(main())

```

### Vision

LiveKit Agents supports image input from URL or from [realtime video frames](https://docs.livekit.io/transport/media.md). Consult your model provider for details on compatible image types, external URL support, and other constraints. For more information, see [Images](https://docs.livekit.io/agents/multimodality/vision/images.md).

## Additional resources

The following resources cover related topics that may be useful for your application.

- **[Workflows](https://docs.livekit.io/agents/logic/workflows.md)**: How to model repeatable, accurate tasks with multiple agents.

- **[Tool definition and usage](https://docs.livekit.io/agents/build/tools.md)**: Let your agents call external tools and more.

- **[Inference pricing](https://livekit.com/pricing/inference)**: The latest pricing information for all models in LiveKit Inference.

- **[Realtime models](https://docs.livekit.io/agents/models/realtime.md)**: Realtime models like the OpenAI Realtime API, Gemini Live, and Amazon Nova Sonic.

---

This document was rendered at 2026-06-07T11:33:36.844Z.
For the latest version of this document, see [https://docs.livekit.io/agents/models/llm.md](https://docs.livekit.io/agents/models/llm.md).

To explore all LiveKit documentation, see [llms.txt](https://docs.livekit.io/llms.txt).