Overview
Large language models (LLMs) are a type of AI model that can generate text output from text input. In voice AI apps, they fit between speech-to-text (STT) and text-to-speech (TTS) and are responsible for tool calls and generating the agent's text response.
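For example, a voice pipeline wires an LLM plugin together with STT and TTS plugins. A minimal sketch, assuming the Deepgram, OpenAI, and Cartesia plugins are installed and their API keys are configured:

```python
from livekit.agents import AgentSession
from livekit.plugins import cartesia, deepgram, openai

# The LLM sits between STT and TTS in the voice pipeline:
# user speech -> STT -> LLM (text + tool calls) -> TTS -> agent speech.
session = AgentSession(
    stt=deepgram.STT(),
    llm=openai.LLM(model="gpt-4o-mini"),
    tts=cartesia.TTS(),
)
```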
Available providers
The agents framework includes plugins for the following LLM providers out of the box. Choose a provider from the list below for a step-by-step guide. You can also implement the LLM node yourself to provide custom behavior or an alternative provider; a sketch follows the table below. All providers support high-performance, low-latency streaming and tool calls. Support for other features is noted in the following table.
| Provider | Notes | Vision | Structured Output | Custom Models | Available in |
| --- | --- | --- | --- | --- | --- |
|  | Wide range of models from Llama, DeepSeek, Mistral, and more. | ✓ | — | ✓ | Python |
| Anthropic | Claude family of models. | ✓ | — | — | Python |
|  |  | ✓ | — | — | Python |
|  |  | ✓ | ✓ | — | Python, Node.js |
|  | Models from Llama, DeepSeek, and more. | ✓ | — | — | Python |
| LangChain | Use a LangGraph workflow for your agent LLM. | ✓ | ✓ | ✓ | Python |
| Mistral AI | Mistral family of models (for use with La Plateforme). | ✓ | ✓ | — | Python |
|  |  | ✓ | ✓ | — | Python, Node.js |
|  |  | ✓ | ✓ | — | Python, Node.js |
|  | Models from Llama and DeepSeek. | ✓ | ✓ | — | Python, Node.js |
|  |  | ✓ | — | — | Python, Node.js |
|  | Wide range of models from Llama, DeepSeek, Mistral, and more. | ✓ | — | ✓ | Python |
| Letta | Stateful API with memory features. | ✓ | — | — | Python |
| Ollama | Self-hosted models from Llama, DeepSeek, and more. | ✓ | — | ✓ | Python |
|  |  | ✓ | ✓ | — | Python, Node.js |
|  | Models from Llama, DeepSeek, OpenAI, Mistral, and more. | ✓ | ✓ | — | Python, Node.js |
|  | Models from Llama, DeepSeek, Mistral, and more. | ✓ | ✓ | ✓ | Python, Node.js |
| xAI | Grok family of models. | ✓ | ✓ | — | Python, Node.js |
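To supply custom behavior or an alternative provider, you can override the agent's LLM node. A minimal sketch, assuming the Python framework's `llm_node` override API; the pre-processing step is a hypothetical placeholder:

```python
from livekit.agents import Agent, ModelSettings, llm


class CustomLLMAgent(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful assistant.")

    async def llm_node(
        self,
        chat_ctx: llm.ChatContext,
        tools: list[llm.FunctionTool],
        model_settings: ModelSettings,
    ):
        # Pre-process the chat context here, or replace the call below
        # with a request to an alternative provider's API.
        async for chunk in Agent.default.llm_node(self, chat_ctx, tools, model_settings):
            yield chunk
```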
Have another provider in mind? LiveKit is open source and welcomes new plugin contributions.
Realtime models like the OpenAI Realtime API, Gemini Live, and Amazon Nova Sonic are capable of consuming and producing speech directly. LiveKit Agents supports them as an alternative to using an LLM plugin, without the need for STT and TTS. To learn more, see Realtime models.
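A minimal sketch of swapping in a realtime model, assuming the OpenAI plugin's Realtime API support; the voice name is illustrative:

```python
from livekit.agents import AgentSession
from livekit.plugins import openai

# A realtime model consumes and produces speech directly,
# replacing the separate STT + LLM + TTS pipeline.
session = AgentSession(
    llm=openai.realtime.RealtimeModel(voice="coral"),
)
```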
How to use
The following sections describe high-level usage only. For more detailed information about installing and using plugins, see the plugins overview.
Usage in AgentSession
Construct an `AgentSession` or `Agent` with an `LLM` instance created by your desired plugin:
```python
from livekit.agents import AgentSession
from livekit.plugins import openai

session = AgentSession(
    llm=openai.LLM(model="gpt-4o-mini"),
)
```
Standalone usage
You can also use an `LLM` instance in a standalone fashion with its simple streaming interface. It expects a `ChatContext` object, which contains the conversation history. The return value is a stream of `ChatChunk`s. This interface is the same across all LLM providers, regardless of their underlying API design:
```python
import asyncio

from livekit.agents import ChatContext
from livekit.plugins import openai


async def main():
    llm = openai.LLM(model="gpt-4o-mini")

    # Build a conversation history with a single user message.
    chat_ctx = ChatContext()
    chat_ctx.add_message(role="user", content="Hello, this is a test message!")

    # Stream ChatChunks as the model generates its response.
    async with llm.chat(chat_ctx=chat_ctx) as stream:
        async for chunk in stream:
            print("Received chunk:", chunk.delta)


asyncio.run(main())
```
Tool usage
All LLM providers support tools (sometimes called "functions"). LiveKit Agents has full support for them within an `AgentSession`. For more information, see Tool definition and use.
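As an illustration, tools are typically defined as decorated methods on an agent. A minimal sketch, assuming the `function_tool` decorator; the `lookup_weather` tool and its canned response are hypothetical:

```python
from livekit.agents import Agent, RunContext, function_tool


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful weather assistant.")

    @function_tool
    async def lookup_weather(self, context: RunContext, location: str) -> str:
        """Look up current weather information for a given location."""
        # A real tool would call a weather API here; this is a stub.
        return "sunny with a temperature of 70 degrees"
```

The LLM sees the tool's name, signature, and docstring, and can call it during a session.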
Vision usage
All LLM providers support vision in most of their models. LiveKit Agents supports vision input from a URL or from realtime video frames. Consult your model provider for details on compatible image types, external URL support, and other constraints. For more information, see Vision.
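For example, an image can be attached to the conversation history alongside text. A minimal sketch, assuming the `ImageContent` type and a model with external URL support; the URL is a placeholder:

```python
from livekit.agents import ChatContext
from livekit.agents.llm import ImageContent

# A message's content can mix text and images.
chat_ctx = ChatContext()
chat_ctx.add_message(
    role="user",
    content=[
        "What do you see in this image?",
        ImageContent(image="https://example.com/photo.jpg"),  # placeholder URL
    ],
)
```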