Overview
Large language models (LLMs) are a type of AI model that can generate text output from text input. In voice AI apps, they fit between speech-to-text (STT) and text-to-speech (TTS) and are responsible for tool calls and generating the agent's text response.
The agents framework includes plugins for popular LLM providers out of the box. You can also implement the LLM node to provide custom behavior or an alternative provider.
LiveKit is open source and welcomes new plugin contributions.
Realtime models like the OpenAI Realtime API and Google Multimodal Live are capable of consuming and producing speech directly. LiveKit Agents supports them as an alternative to using an LLM plugin, without the need for STT and TTS. To learn more, see Realtime models.
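For example, a session built on a realtime model needs no separate STT or TTS components. The following is a minimal sketch, assuming the OpenAI plugin's `realtime.RealtimeModel` class and a configured OpenAI API key:

```python
from livekit.agents import AgentSession
from livekit.plugins import openai

# A realtime model consumes and produces speech directly,
# so no STT or TTS plugins are configured on the session.
session = AgentSession(
    llm=openai.realtime.RealtimeModel(voice="coral"),
)
```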
How to use
The following sections describe high-level usage only.
For more detailed information about installing and using plugins, see the plugins overview.
Usage in AgentSession
Construct an `AgentSession` or `Agent` with an `LLM` instance created by your desired plugin:
```python
from livekit.agents import AgentSession
from livekit.plugins import openai

session = AgentSession(
    llm=openai.LLM(model="gpt-4o-mini"),
)
```
Standalone usage
You can also use an `LLM` instance in a standalone fashion with its simple streaming interface. It expects a `ChatContext` object, which contains the conversation history, and returns a stream of `ChatChunk`s. This interface is the same across all LLM providers, regardless of their underlying API design:
```python
from livekit.agents import ChatContext
from livekit.plugins import openai

llm = openai.LLM(model="gpt-4o-mini")

chat_ctx = ChatContext()
chat_ctx.add_message(role="user", content="Hello, this is a test message!")

async with llm.chat(chat_ctx=chat_ctx) as stream:
    async for chunk in stream:
        print("Received chunk:", chunk.delta)
```
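To assemble the complete response, accumulate the deltas as they arrive. A minimal sketch, assuming each chunk's `delta` exposes an optional `content` string:

```python
# Accumulate streamed deltas into the complete response text.
full_response = ""
async with llm.chat(chat_ctx=chat_ctx) as stream:
    async for chunk in stream:
        if chunk.delta and chunk.delta.content:
            full_response += chunk.delta.content

print("Full response:", full_response)
```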
Tool usage
Most LLM providers support tools (sometimes called "functions"). LiveKit Agents has full support for them within an `AgentSession`. For more information, see the tools documentation.
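As an illustration, a tool can be defined on an `Agent` subclass with the `function_tool` decorator; the LLM decides when to invoke it based on the function's name, signature, and docstring. A minimal sketch (the weather lookup is a hypothetical example):

```python
from livekit.agents import Agent, RunContext, function_tool

class WeatherAgent(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful weather assistant.")

    @function_tool
    async def lookup_weather(self, context: RunContext, location: str) -> str:
        """Look up the current weather for a given location.

        Args:
            location: The location to look up weather information for.
        """
        # Hypothetical stub; a real tool would call a weather API here.
        return "Sunny with a temperature of 70 degrees."
```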
Vision usage
Some LLM providers support vision within their models. LiveKit Agents supports vision input from a URL or from realtime video frames. Consult your model provider for details on compatible image types, external URL support, and other constraints.
```python
from livekit.agents.llm import ImageContent

chat_ctx.add_message(
    role="user",
    content=[
        "Describe this image",
        ImageContent(image="https://picsum.photos/200/300"),
    ],
)
```
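For realtime video input, a frame from a subscribed video track can be attached in the same way. A minimal sketch, assuming you already hold an `rtc.VideoStream` for a participant's camera track:

```python
from livekit import rtc
from livekit.agents.llm import ImageContent

async def describe_frame(chat_ctx, video_stream: rtc.VideoStream) -> None:
    # Capture a single frame from the stream and attach it to the chat context.
    async for event in video_stream:
        chat_ctx.add_message(
            role="user",
            content=["Describe this frame", ImageContent(image=event.frame)],
        )
        break
```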
Available providers
The following table lists the available LLM providers for LiveKit Agents.
Many providers have standardized on the OpenAI "chat completions" API format, even for serving other model families such as Llama and DeepSeek. The LiveKit Agents OpenAI plugin includes compatibility with many of these providers, as listed in the following table.
| Provider | Plugin | Notes |
|---|---|---|
| Amazon Bedrock | aws | Wide range of models from Llama, DeepSeek, Mistral, and more. |
| Anthropic | anthropic | Claude family of models. |
| Google Gemini | google | |
| Groq | groq | Models from Llama, DeepSeek, and more. |
| OpenAI | openai | |
| Azure OpenAI | openai | |
| Cerebras | openai | Models from Llama and DeepSeek. |
| DeepSeek | openai | |
| Fireworks | openai | Wide range of models from Llama, DeepSeek, Mistral, and more. |
| Perplexity | openai | |
| Telnyx | openai | Models from Llama, DeepSeek, OpenAI, Mistral, and more. |
| xAI | openai | Grok family of models. |
| Ollama | openai | Self-hosted models from Llama, DeepSeek, and more. |
| Together AI | openai | Models from Llama, DeepSeek, Mistral, and more. |
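As an example of targeting an OpenAI-compatible provider, the OpenAI plugin exposes `with_*` helper constructors for many of the providers above. A minimal sketch, assuming the `with_deepseek` helper and a `DEEPSEEK_API_KEY` environment variable:

```python
from livekit.agents import AgentSession
from livekit.plugins import openai

# The OpenAI plugin can target OpenAI-compatible providers
# such as DeepSeek through dedicated helper constructors.
session = AgentSession(
    llm=openai.LLM.with_deepseek(model="deepseek-chat"),
)
```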