Large language models (LLMs) overview

Conversational intelligence for your voice agents.

Overview

An LLM powers your voice agent's core reasoning, response generation, and orchestration. You can choose from a variety of models to balance performance, accuracy, and cost. In a voice agent, the LLM receives a transcript of the user's speech from an STT model and produces a text response, which a TTS model then turns into speech.

You can choose a model served through LiveKit Inference, which is included in LiveKit Cloud, or you can use a plugin to connect directly to a wider range of model providers with your own account.

LiveKit Inference

The following models are available in LiveKit Inference. Refer to the guide for each model for more details on additional configuration options.

| Model family | Model name | Model ID |
| --- | --- | --- |
| OpenAI | GPT-4o | openai/gpt-4o |
| OpenAI | GPT-4o mini | openai/gpt-4o-mini |
| OpenAI | GPT-4.1 | openai/gpt-4.1 |
| OpenAI | GPT-4.1 mini | openai/gpt-4.1-mini |
| OpenAI | GPT-4.1 nano | openai/gpt-4.1-nano |
| OpenAI | GPT-5 | openai/gpt-5 |
| OpenAI | GPT-5 mini | openai/gpt-5-mini |
| OpenAI | GPT-5 nano | openai/gpt-5-nano |
| OpenAI | GPT-5.1 | openai/gpt-5.1 |
| OpenAI | GPT-5.1 Chat Latest | openai/gpt-5.1-chat-latest |
| OpenAI | GPT-5.2 | openai/gpt-5.2 |
| OpenAI | GPT-5.2 Chat Latest | openai/gpt-5.2-chat-latest |
| OpenAI | GPT-5.3 | openai/gpt-5.3 |
| OpenAI | GPT-5.3 Chat Latest | openai/gpt-5.3-chat-latest |
| OpenAI | GPT OSS 120B | openai/gpt-oss-120b |
| Google | Gemini 3 Pro | google/gemini-3-pro |
| Google | Gemini 3 Flash | google/gemini-3-flash |
| Google | Gemini 2.5 Pro | google/gemini-2.5-pro |
| Google | Gemini 2.5 Flash | google/gemini-2.5-flash |
| Google | Gemini 2.5 Flash Lite | google/gemini-2.5-flash-lite |
| Kimi | Kimi K2 Instruct | moonshotai/kimi-k2-instruct |
| DeepSeek | DeepSeek V3 | deepseek-ai/deepseek-v3 |
| DeepSeek | DeepSeek V3.2 | deepseek-ai/deepseek-v3.2 |

Plugins

The LiveKit Agents framework also includes a variety of open source plugins for a wide range of LLM providers. Plugins are especially useful if you need custom or fine-tuned models. Each plugin requires you to authenticate with the provider directly, usually via an API key; you are responsible for setting up your own account and managing your own billing and credentials. The plugins are listed below, along with their availability for Python and Node.js.

Have another provider in mind? LiveKit is open source and welcomes new plugin contributions.

Usage

To set up an LLM in an AgentSession, provide the model ID to the llm argument. LiveKit Inference manages the connection to the model automatically. Consult the models list for available models.

Python:

```python
from livekit.agents import AgentSession

session = AgentSession(
    llm="openai/gpt-4.1-mini",
)
```

Node.js:

```typescript
import { AgentSession } from '@livekit/agents';

const session = new AgentSession({
  llm: 'openai/gpt-4.1-mini',
});
```

Additional parameters

More configuration options, such as reasoning effort, are available for each model. To set additional parameters, use the LLM class from the inference module. Consult each model reference for examples and available parameters.

Advanced features

The following sections cover more advanced topics common to all LLM providers. For more detailed reference on individual provider configuration, consult the model reference or plugin documentation for that provider.

Custom LLM

To create an entirely custom LLM, implement the LLM node in your agent.

Standalone usage

You can use an LLM instance as a standalone component with its streaming interface. It expects a ChatContext object, which contains the conversation history. The return value is a stream of ChatChunk objects. This interface is the same across all LLM providers, regardless of their underlying API design:

```python
from livekit.agents import ChatContext
from livekit.plugins import openai

# Use Responses API (recommended for direct OpenAI usage)
llm = openai.responses.LLM(model="gpt-4o-mini")

chat_ctx = ChatContext()
chat_ctx.add_message(role="user", content="Hello, this is a test message!")

async with llm.chat(chat_ctx=chat_ctx) as stream:
    async for chunk in stream:
        print("Received chunk:", chunk.delta)
```

Collecting the response

Available in Python only.

In Python, LLMStream provides a collect() convenience method that awaits the full response and makes it easy to use LLMs outside of the AgentSession context. You can collect a full response and execute tool calls. This can be useful for background tasks, pre-processing, or any workflow where you need LLM capabilities without the full voice agent pipeline.

The return value for collect() is a CollectedResponse object with text, tool_calls, and usage fields.

The following example shows how to collect a full response and execute any tool calls it contains:

```python
import asyncio

from dotenv import load_dotenv
from livekit.agents import function_tool, inference, llm

load_dotenv(".env.local")

my_llm = inference.LLM(model="openai/gpt-4.1-mini")


@function_tool
async def get_weather(location: str) -> str:
    """Get the current weather for a location.

    Args:
        location: The city name to get weather for.
    """
    print(f" [tool called] get_weather(location={location!r})")
    return f"The weather in {location} is sunny and 72°F."


async def main() -> None:
    tools = [get_weather]
    tool_ctx = llm.ToolContext(tools)

    chat_ctx = llm.ChatContext()
    chat_ctx.add_message(role="system", content="You are a helpful assistant.")
    chat_ctx.add_message(role="user", content="What's the weather in San Francisco?")

    # First LLM call: expect a tool call
    response = await my_llm.chat(chat_ctx=chat_ctx, tools=tools).collect()
    print(f"text: {response.text!r}")
    print(f"tool_calls: {response.tool_calls}")

    # Execute each tool call and add the results to the context
    for tc in response.tool_calls:
        result = await llm.execute_function_call(tc, tool_ctx)
        chat_ctx.insert(result.fnc_call)
        if result.fnc_call_out:
            chat_ctx.insert(result.fnc_call_out)

    # Second LLM call: expect a final text response
    final = await my_llm.chat(chat_ctx=chat_ctx, tools=tools).collect()
    print(f"final: {final.text!r}")


if __name__ == "__main__":
    asyncio.run(main())
```

Vision

LiveKit Agents supports image input from URL or from realtime video frames. Consult your model provider for details on compatible image types, external URL support, and other constraints. For more information, see Vision.

Additional resources

The following resources cover related topics that may be useful for your application.