LiveKit Inference

Access the best AI models for voice agents, included in LiveKit Cloud.

Overview

[Diagram: LiveKit Inference serving an STT-LLM-TTS pipeline for a voice agent.]

LiveKit Inference provides access to many of the best models and providers for voice agents, including models from OpenAI, Google, AssemblyAI, Deepgram, Cartesia, ElevenLabs, and more. LiveKit Inference is included in LiveKit Cloud, and does not require any additional plugins. See the guides for LLM, STT, and TTS for supported models and configuration options.

To learn more about LiveKit Inference, see the blog post Introducing LiveKit Inference: A unified model interface for voice AI.

For LiveKit Inference models, use the inference module classes in your AgentSession:

Python

```python
from livekit.agents import AgentSession, inference

session = AgentSession(
    stt=inference.STT(
        model="deepgram/flux-general",
        language="en",
    ),
    llm=inference.LLM(
        model="openai/gpt-4.1-mini",
    ),
    tts=inference.TTS(
        model="cartesia/sonic-3",
        voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
    ),
)
```

Node.js

```typescript
import { AgentSession, inference } from '@livekit/agents';

const session = new AgentSession({
  stt: new inference.STT({
    model: 'deepgram/flux-general',
    language: 'en',
  }),
  llm: new inference.LLM({
    model: 'openai/gpt-4.1-mini',
  }),
  tts: new inference.TTS({
    model: 'cartesia/sonic-3',
    voice: '9626c31c-bec5-4cca-baa8-f8ba9e84c8bc',
  }),
});
```

String descriptors

As a shortcut, you can pass a model descriptor string directly instead of constructing the inference classes. A descriptor is the model ID (provider/model), optionally followed by a colon and a language code (for STT) or a voice ID (for TTS). This is a convenient way to get started quickly.

Python

```python
from livekit.agents import AgentSession

session = AgentSession(
    stt="deepgram/nova-3:en",
    llm="openai/gpt-4.1-mini",
    tts="cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
)
```

Node.js

```typescript
import { AgentSession } from '@livekit/agents';

const session = new AgentSession({
  stt: 'deepgram/nova-3:en',
  llm: 'openai/gpt-4.1-mini',
  tts: 'cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc',
});
```

For detailed parameter references and model-specific options, see the individual model guides for LLM, STT, and TTS.

Models

The following tables list all models currently available through LiveKit Inference.

Pricing

See the latest pricing for all LiveKit Inference models.

Large language models (LLM)

| Model name | Model ID |
| --- | --- |
| GPT-4o | openai/gpt-4o |
| GPT-4o mini | openai/gpt-4o-mini |
| GPT-4.1 | openai/gpt-4.1 |
| GPT-4.1 mini | openai/gpt-4.1-mini |
| GPT-4.1 nano | openai/gpt-4.1-nano |
| GPT-5 | openai/gpt-5 |
| GPT-5 mini | openai/gpt-5-mini |
| GPT-5 nano | openai/gpt-5-nano |
| GPT-5.1 | openai/gpt-5.1 |
| GPT-5.1 Chat Latest | openai/gpt-5.1-chat-latest |
| GPT-5.2 | openai/gpt-5.2 |
| GPT-5.2 Chat Latest | openai/gpt-5.2-chat-latest |
| GPT OSS 120B | openai/gpt-oss-120b |
| Gemini 3 Pro | google/gemini-3-pro |
| Gemini 3 Flash | google/gemini-3-flash |
| Gemini 2.5 Pro | google/gemini-2.5-pro |
| Gemini 2.5 Flash | google/gemini-2.5-flash |
| Gemini 2.5 Flash Lite | google/gemini-2.5-flash-lite |
| Kimi K2 Instruct | moonshotai/kimi-k2-instruct |
| DeepSeek V3 | deepseek-ai/deepseek-v3 |
| DeepSeek V3.2 | deepseek-ai/deepseek-v3.2 |

Speech-to-text (STT)

| Provider | Model name | Model ID | Languages |
| --- | --- | --- | --- |
| AssemblyAI | Universal-3 Pro Streaming | assemblyai/u3-rt-pro | 6 languages |
| AssemblyAI | Universal-Streaming | assemblyai/universal-streaming | English only |
| AssemblyAI | Universal-Streaming-Multilingual | assemblyai/universal-streaming-multilingual | 6 languages |
| Cartesia | Ink Whisper | cartesia/ink-whisper | 100 languages |
| Deepgram | Flux | deepgram/flux-general | English only |
| Deepgram | Nova-3 | deepgram/nova-3 | Multilingual, 9 languages |
| Deepgram | Nova-3 Medical | deepgram/nova-3-medical | English only |
| Deepgram | Nova-2 | deepgram/nova-2 | Multilingual, 33 languages |
| Deepgram | Nova-2 Medical | deepgram/nova-2-medical | English only |
| Deepgram | Nova-2 Conversational AI | deepgram/nova-2-conversationalai | English only |
| Deepgram | Nova-2 Phonecall | deepgram/nova-2-phonecall | English only |
| ElevenLabs | Scribe V2 Realtime | elevenlabs/scribe_v2_realtime | 41 languages |

Text-to-speech (TTS)

| Provider | Model ID | Languages (ISO 639 codes) |
| --- | --- | --- |
| Cartesia | cartesia/sonic-3 | en, de, es, fr, ja, pt, zh, hi, ko, it, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa |
| Cartesia | cartesia/sonic-2 | en, fr, de, es, pt, zh, ja, hi, it, ko, nl, pl, ru, sv, tr |
| Cartesia | cartesia/sonic-turbo | en, fr, de, es, pt, zh, ja, hi, it, ko, nl, pl, ru, sv, tr |
| Cartesia | cartesia/sonic | en, fr, de, es, pt, zh, ja, hi, it, ko, nl, pl, ru, sv, tr |
| Deepgram | deepgram/aura-2 | en, es |
| ElevenLabs | elevenlabs/eleven_flash_v2 | en |
| ElevenLabs | elevenlabs/eleven_flash_v2_5 | en, ja, zh, de, hi, fr, ko, pt, it, es, id, nl, tr, fil, pl, sv, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, ru, hu, no, vi |
| ElevenLabs | elevenlabs/eleven_turbo_v2 | en |
| ElevenLabs | elevenlabs/eleven_turbo_v2_5 | en, ja, zh, de, hi, fr, ko, pt, it, es, id, nl, tr, fil, pl, sv, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, ru, hu, no, vi |
| ElevenLabs | elevenlabs/eleven_multilingual_v2 | en, ja, zh, de, hi, fr, ko, pt, it, es, id, nl, tr, fil, pl, sv, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, ru |
| Inworld | inworld/inworld-tts-1.5-max | en, es, fr, ko, nl, zh, de, it, ja, pl, pt, ru, hi |
| Inworld | inworld/inworld-tts-1.5-mini | en, es, fr, ko, nl, zh, de, it, ja, pl, pt, ru, hi |
| Inworld | inworld/inworld-tts-1-max | en, es, fr, ko, nl, zh, de, it, ja, pl, pt, ru |
| Inworld | inworld/inworld-tts-1 | en, es, fr, ko, nl, zh, de, it, ja, pl, pt, ru |
| Rime | rime/arcana | en, es, fr, de, ar, he, hi, ja, pt |
| Rime | rime/mistv2 | en, es, fr, de |

Billing

LiveKit Inference billing is based on usage. Discounted rates are available on the Scale plan. Custom rates are available on the Enterprise plan. Refer to the following articles for more information on quotas, limits, and billing for LiveKit Inference. The latest pricing is always available on the LiveKit Inference pricing page.