Large language model (LLM) integrations

Guides for adding LLM integrations to your agents.

Overview

Large language models (LLMs) are a type of AI model that generates text output from text input. In voice AI apps, the LLM sits between speech-to-text (STT) and text-to-speech (TTS): it receives the transcribed user speech, executes any tool calls, and generates the agent's text response.
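
For example, a typical voice pipeline wires STT, LLM, and TTS plugins into one session. A minimal sketch; the provider choices below are illustrative, not required:

```python
from livekit.agents import AgentSession
from livekit.plugins import cartesia, deepgram, openai

# STT -> LLM -> TTS: the LLM consumes transcribed speech and produces
# the text that is synthesized back to the user.
session = AgentSession(
    stt=deepgram.STT(),
    llm=openai.LLM(model="gpt-4o-mini"),
    tts=cartesia.TTS(),
)
```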

The agents framework includes plugins for popular LLM providers out of the box. You can also implement the LLM node yourself to customize behavior or integrate an alternative provider.
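
As a hedged sketch of that customization point, an Agent subclass can override its llm_node to intercept or modify LLM calls before delegating to the default implementation. This assumes the v1.x node API; check the pipeline nodes reference for the exact signature in your framework version:

```python
from livekit.agents import Agent, ModelSettings
from livekit.agents.llm import ChatContext, FunctionTool

class MyAgent(Agent):
    async def llm_node(
        self,
        chat_ctx: ChatContext,
        tools: list[FunctionTool],
        model_settings: ModelSettings,
    ):
        # Preprocess the chat context here if needed, then delegate to the
        # default implementation (or call a custom provider instead).
        async for chunk in Agent.default.llm_node(self, chat_ctx, tools, model_settings):
            yield chunk
```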

LiveKit is open source and welcomes new plugin contributions.

Realtime models

Realtime models like the OpenAI Realtime API and Google Multimodal Live are capable of consuming and producing speech directly. LiveKit Agents supports them as an alternative to using an LLM plugin, without the need for STT and TTS. To learn more, see Realtime models.
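
As a brief sketch, a realtime model plugs into an AgentSession in place of the STT, LLM, and TTS trio (model and voice options vary by provider and plugin version):

```python
from livekit.agents import AgentSession
from livekit.plugins import openai

# A realtime model handles speech in and speech out directly,
# so no separate STT or TTS plugin is needed.
session = AgentSession(
    llm=openai.realtime.RealtimeModel(voice="coral"),
)
```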

How to use

The following sections describe high-level usage only. For detailed instructions on installing and using plugins, see the plugins overview.

Usage in AgentSession

Construct an AgentSession or Agent with an LLM instance created by your desired plugin:

```python
from livekit.agents import AgentSession
from livekit.plugins import openai

session = AgentSession(
    llm=openai.LLM(model="gpt-4o-mini"),
)
```
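
Within the session, the framework manages the chat context and calls the LLM on each conversation turn. A minimal sketch of starting the session inside a job entrypoint (STT, TTS, and turn detection omitted for brevity):

```python
from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import openai

async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()
    session = AgentSession(llm=openai.LLM(model="gpt-4o-mini"))
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a helpful voice assistant."),
    )
```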

Standalone usage

You can also use an LLM instance in a standalone fashion with its simple streaming interface. It expects a ChatContext object, which contains the conversation history. The return value is a stream of ChatChunks. This interface is the same across all LLM providers, regardless of their underlying API design:

```python
from livekit.agents import ChatContext
from livekit.plugins import openai

llm = openai.LLM(model="gpt-4o-mini")

# Build the conversation history to send to the model
chat_ctx = ChatContext()
chat_ctx.add_message(role="user", content="Hello, this is a test message!")

# llm.chat() returns a stream of ChatChunk objects
async with llm.chat(chat_ctx=chat_ctx) as stream:
    async for chunk in stream:
        print("Received chunk:", chunk.delta)
```

Tool usage

Most LLM providers support tool use (sometimes called "function calling"). LiveKit Agents has full support for tools within an AgentSession. For more information, see the tool definition and use documentation.
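
As a brief sketch, a tool is a decorated method on your Agent; its docstring and type hints become the tool schema the LLM sees. The weather lookup below is a hypothetical example:

```python
from livekit.agents import Agent, RunContext, function_tool

class WeatherAgent(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You can answer questions about the weather.")

    @function_tool
    async def lookup_weather(self, context: RunContext, location: str) -> str:
        """Look up the current weather for a location."""
        # Hypothetical stub; a real tool would call a weather API here.
        return f"The weather in {location} is sunny."
```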

Vision usage

Some LLM providers offer models with vision capabilities. LiveKit Agents supports vision input from URLs or from realtime video frames. Consult your model provider for details on compatible image types, external URL support, and other constraints.

```python
from livekit.agents.llm import ImageContent

chat_ctx.add_message(
    role="user",
    content=["Describe this image", ImageContent(image="https://picsum.photos/200/300")],
)
```
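
For realtime video, ImageContent also accepts a video frame. A hedged sketch, assuming a frame captured elsewhere as an rtc.VideoFrame:

```python
from livekit import rtc
from livekit.agents import ChatContext
from livekit.agents.llm import ImageContent

def add_frame(chat_ctx: ChatContext, frame: rtc.VideoFrame) -> None:
    # Attach the latest video frame alongside the text prompt
    chat_ctx.add_message(
        role="user",
        content=["What do you see in this frame?", ImageContent(image=frame)],
    )
```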

Available providers

The table below lists the available LLM providers for LiveKit Agents, covering both dedicated plugins and OpenAI-compatible providers.

OpenAI API compatibility

Many providers have standardized on the OpenAI "chat completions" API, even for other model families such as Llama and DeepSeek. The LiveKit Agents OpenAI plugin is compatible with many of these providers; they appear in the following table with openai in the Plugin column, and a usage sketch follows the table.

| Provider | Plugin | Notes |
| --- | --- | --- |
| Amazon Bedrock | `aws` | Wide range of models from Llama, DeepSeek, Mistral, and more. |
| Anthropic | `anthropic` | Claude family of models. |
| Google Gemini | `google` | |
| Groq | `groq` | Models from Llama, DeepSeek, and more. |
| OpenAI | `openai` | |
| Azure OpenAI | `openai` | |
| Cerebras | `openai` | Models from Llama and DeepSeek. |
| DeepSeek | `openai` | |
| Fireworks | `openai` | Wide range of models from Llama, DeepSeek, Mistral, and more. |
| Perplexity | `openai` | |
| Telnyx | `openai` | Models from Llama, DeepSeek, OpenAI, Mistral, and more. |
| xAI | `openai` | Grok family of models. |
| Ollama | `openai` | Self-hosted models from Llama, DeepSeek, and more. |
| Together AI | `openai` | Models from Llama, DeepSeek, Mistral, and more. |
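
As a hedged sketch, OpenAI-compatible providers are typically reached through with_* constructors on the OpenAI plugin (exact constructor names and model IDs vary by plugin version; check the plugin reference):

```python
from livekit.plugins import openai

# Groq-hosted Llama via the OpenAI-compatible API
groq_llm = openai.LLM.with_groq(model="llama-3.3-70b-versatile")

# Self-hosted model through Ollama's OpenAI-compatible endpoint
ollama_llm = openai.LLM.with_ollama(
    model="llama3.1",
    base_url="http://localhost:11434/v1",
)
```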

Further reading