Overview
Large language models (LLMs) are a type of AI model that can generate text output from text input. In voice AI apps, they fit between speech-to-text (STT) and text-to-speech (TTS) and are responsible for tool calls and generating the agent's text response.
The agents framework includes plugins for popular LLM providers out of the box. You can also implement the LLM node to provide custom behavior or an alternative provider.
LiveKit is open source and welcomes new plugin contributions.
Realtime models like the OpenAI Realtime API and Google Multimodal Live are capable of consuming and producing speech directly. LiveKit Agents supports them as an alternative to using an LLM plugin, without the need for STT and TTS. To learn more, see Realtime models.
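For example, a session built on a realtime model needs no separate STT or TTS components. The following is a minimal sketch, assuming the OpenAI plugin's `realtime.RealtimeModel` class and a configured OpenAI API key:

```python
from livekit.agents import AgentSession
from livekit.plugins import openai

# A realtime model consumes and produces speech directly,
# so no STT or TTS plugins are configured on the session.
session = AgentSession(
    llm=openai.realtime.RealtimeModel(voice="coral"),
)
```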
How to use
The following sections describe high-level usage only.
For more detailed information about installing and using plugins, see the plugins overview.
Usage in AgentSession
Construct an `AgentSession` or `Agent` with an `LLM` instance created by your desired plugin:
```python
from livekit.agents import AgentSession
from livekit.plugins import openai

session = AgentSession(
    llm=openai.LLM(model="gpt-4o-mini"),
)
```
Standalone usage
You can also use an `LLM` instance in a standalone fashion with its simple streaming interface. It expects a `ChatContext` object, which contains the conversation history, and returns a stream of `ChatChunk`s. This interface is the same across all LLM providers, regardless of their underlying API design:
```python
from livekit.agents import ChatContext
from livekit.plugins import openai

llm = openai.LLM(model="gpt-4o-mini")

chat_ctx = ChatContext()
chat_ctx.add_message(role="user", content="Hello, this is a test message!")

async with llm.chat(chat_ctx=chat_ctx) as stream:
    async for chunk in stream:
        print("Received chunk:", chunk.delta)
```
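To assemble the complete response, accumulate the deltas as they arrive. A minimal sketch, assuming each chunk's `delta` exposes an optional `content` string:

```python
# Accumulate streamed deltas into the complete response text.
full_response = ""
async with llm.chat(chat_ctx=chat_ctx) as stream:
    async for chunk in stream:
        if chunk.delta and chunk.delta.content:
            full_response += chunk.delta.content

print("Full response:", full_response)
```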
Tool usage
Most LLM providers support tools (sometimes called "functions"). LiveKit Agents has full support for them within an `AgentSession`. For more information, see the tools documentation.
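As an illustration, a tool can be defined on an `Agent` subclass with the `function_tool` decorator; the LLM decides when to invoke it based on the function's name, signature, and docstring. A minimal sketch (the weather lookup is a hypothetical example):

```python
from livekit.agents import Agent, RunContext, function_tool

class WeatherAgent(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful weather assistant.")

    @function_tool
    async def lookup_weather(self, context: RunContext, location: str) -> str:
        """Look up the current weather for a given location.

        Args:
            location: The location to look up weather information for.
        """
        # Hypothetical stub; a real tool would call a weather API here.
        return "Sunny with a temperature of 70 degrees."
```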
Vision usage
Some LLM providers support vision within their models. LiveKit Agents supports vision input from a URL or from realtime video frames. Consult your model provider for details on compatible image types, external URL support, and other constraints.
```python
from livekit.agents.llm import ImageContent

chat_ctx.add_message(
    role="user",
    content=[
        "Describe this image",
        ImageContent(image="https://picsum.photos/200/300"),
    ],
)
```
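For realtime video input, a frame from a subscribed video track can be attached in the same way. A minimal sketch, assuming you already hold an `rtc.VideoStream` for a participant's camera track:

```python
from livekit import rtc
from livekit.agents.llm import ImageContent

async def describe_frame(chat_ctx, video_stream: rtc.VideoStream) -> None:
    # Capture a single frame from the stream and attach it to the chat context.
    async for event in video_stream:
        chat_ctx.add_message(
            role="user",
            content=["Describe this frame", ImageContent(image=event.frame)],
        )
        break
```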
Available providers
The following table lists the available LLM providers for LiveKit Agents.
Many providers have standardized on the OpenAI "chat completions" API format, even for serving other model families such as Llama and DeepSeek. The LiveKit Agents OpenAI plugin includes compatibility with many of these providers, as listed in the following table.
| Provider | Plugin | Notes |
|---|---|---|
| Amazon Bedrock | aws | Wide range of models from Llama, DeepSeek, Mistral, and more. |
| Anthropic | anthropic | Claude family of models. |
| Google Gemini | google | |
| Groq | groq | Models from Llama, DeepSeek, and more. |
| OpenAI | openai | |
| Azure OpenAI | openai | |
| Cerebras | openai | Models from Llama and DeepSeek. |
| DeepSeek | openai | |
| Fireworks | openai | Wide range of models from Llama, DeepSeek, Mistral, and more. |
| Perplexity | openai | |
| Telnyx | openai | Models from Llama, DeepSeek, OpenAI, Mistral, and more. |
| xAI | openai | Grok family of models. |
| Ollama | openai | Self-hosted models from Llama, DeepSeek, and more. |
| Together AI | openai | Models from Llama, DeepSeek, Mistral, and more. |
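As an example of targeting an OpenAI-compatible provider, the OpenAI plugin exposes `with_*` helper constructors for many of the providers above. A minimal sketch, assuming the `with_deepseek` helper and a `DEEPSEEK_API_KEY` environment variable:

```python
from livekit.agents import AgentSession
from livekit.plugins import openai

# The OpenAI plugin can target OpenAI-compatible providers
# such as DeepSeek through dedicated helper constructors.
session = AgentSession(
    llm=openai.LLM.with_deepseek(model="deepseek-chat"),
)
```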