
LiveKit Inference

Access the best AI models for voice agents, included in LiveKit Cloud.

Overview

[Figure: LiveKit Inference serving an STT-LLM-TTS pipeline for a voice agent.]

LiveKit Inference provides access to many of the best models and providers for voice agents, including models from OpenAI, Google, AssemblyAI, Deepgram, Cartesia, ElevenLabs, and more. LiveKit Inference is included in LiveKit Cloud, and does not require any additional plugins. It's zero data retention by default, so your prompts, audio, and model outputs are never stored or used to train models. See the guides for LLM, STT, and TTS for supported models and configuration options.

To learn more about LiveKit Inference, see the blog post "Introducing LiveKit Inference: A unified model interface for voice AI".

For LiveKit Inference models, use the inference module classes in your AgentSession:

Python:

    from livekit.agents import AgentSession, inference

    # Each pipeline stage is configured with its own inference class.
    session = AgentSession(
        stt=inference.STT(
            model="deepgram/flux-general",
            language="en",
        ),
        llm=inference.LLM(
            model="google/gemma-4-31b-it",
        ),
        tts=inference.TTS(
            model="cartesia/sonic-3",
            voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
        ),
    )

Node.js:

    import { AgentSession, inference } from '@livekit/agents';

    // Each pipeline stage is configured with its own inference class.
    const session = new AgentSession({
      stt: new inference.STT({
        model: "deepgram/flux-general",
        language: "en",
      }),
      llm: new inference.LLM({
        model: "google/gemma-4-31b-it",
      }),
      tts: new inference.TTS({
        model: "cartesia/sonic-3",
        voice: "9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
      }),
    });

String descriptors

As a shortcut, you can pass a model descriptor string directly instead of using the inference classes. This is a convenient way to get started quickly.

Python:

    from livekit.agents import AgentSession

    # Descriptor strings replace the inference.STT / LLM / TTS objects.
    session = AgentSession(
        stt="deepgram/nova-3:en",
        llm="google/gemma-4-31b-it",
        tts="cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
    )

Node.js:

    import { AgentSession } from '@livekit/agents';

    // Descriptor strings replace the inference.STT / LLM / TTS objects.
    const session = new AgentSession({
      stt: "deepgram/nova-3:en",
      llm: "google/gemma-4-31b-it",
      tts: "cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
    });

For detailed parameter references and model-specific options, see the individual model guides for LLM, STT, and TTS.
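The descriptor strings in the examples above appear to follow a `provider/model[:option]` shape, where the part after the colon looks like the language for STT and the voice ID for TTS. That reading is inferred from the examples rather than stated by the documentation. The sketch below splits a descriptor on that assumption. `ModelDescriptor` and `parse_descriptor` are hypothetical names for illustration and are not part of the LiveKit SDK.

```python
from typing import NamedTuple, Optional


class ModelDescriptor(NamedTuple):
    provider: str           # e.g. "deepgram"
    model: str              # e.g. "nova-3"
    option: Optional[str]   # text after ":", or None when no colon is present


def parse_descriptor(descriptor: str) -> ModelDescriptor:
    """Split "provider/model[:option]" into parts (illustrative helper only)."""
    # Everything after the first ":" is treated as the optional suffix.
    model_id, colon, option = descriptor.partition(":")
    # The model ID itself is assumed to be "provider/model".
    provider, slash, model = model_id.partition("/")
    if not slash or not provider or not model:
        raise ValueError(f"expected 'provider/model[:option]', got {descriptor!r}")
    return ModelDescriptor(provider, model, option if colon else None)


print(parse_descriptor("deepgram/nova-3:en"))
print(parse_descriptor("google/gemma-4-31b-it"))
```

A helper like this can be handy for logging or validating configuration values before they are handed to `AgentSession`. The SDK does its own parsing, so none of this is needed for normal use.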

Zero data retention

LiveKit Inference is zero data retention (ZDR) by default. Your prompts, audio, and model outputs pass through only to generate a response. Neither LiveKit nor the underlying model providers log, store, or train on your data. This applies to every LLM, STT, and TTS model, on every plan, with no configuration required.

Note

Zero data retention applies to the data you send to model providers through LiveKit Inference. It's separate from agent observability, which you can enable to retain session data in LiveKit Cloud, and from custom voices, which store the voice sample you provide so you can reuse the clone across providers.

Models

The following tables list all models currently available through LiveKit Inference.

Pricing

See the latest pricing for all LiveKit Inference models.

Large language models (LLM)

Recommended for voice agents

Gemma 4 31B is the recommended default LLM. It's a latency-optimized, open-weight model served on LiveKit's infrastructure.

| Model name | Model ID | Status |
| --- | --- | --- |
| Gemma 4 31B | google/gemma-4-31b-it | |
| DeepSeek-V4 Pro | deepseek-ai/deepseek-v4-pro | |
| DeepSeek-V3 | deepseek-ai/deepseek-v3 | Retired |
| DeepSeek-V3.1 | deepseek-ai/deepseek-v3.1 | Retired |
| DeepSeek-V3.2 | deepseek-ai/deepseek-v3.2 | Retired |
| Gemini 2.5 Flash | google/gemini-2.5-flash | |
| Gemini 2.5 Flash-Lite | google/gemini-2.5-flash-lite | |
| Gemini 2.5 Pro | google/gemini-2.5-pro | |
| Gemini 3 Flash | google/gemini-3-flash-preview | |
| Gemini 3.1 Flash Lite | google/gemini-3.1-flash-lite | |
| Gemini 3.1 Pro | google/gemini-3.1-pro-preview | |
| Gemini 3.5 Flash | google/gemini-3.5-flash | |
| Gemini 2.0 Flash | google/gemini-2.0-flash | Retired |
| Gemini 2.0 Flash-Lite | google/gemini-2.0-flash-lite | Retired |
| Gemini 3 Pro | google/gemini-3-pro-preview | Retired |
| Kimi K2.5 | moonshotai/kimi-k2.5 | |
| Kimi K2 Instruct | moonshotai/kimi-k2-instruct | Retired |
| ChatGPT Latest | openai/chat-latest | |
| GPT-4.1 | openai/gpt-4.1 | |
| GPT-4.1 mini | openai/gpt-4.1-mini | |
| GPT-4.1 nano | openai/gpt-4.1-nano | |
| GPT-4o | openai/gpt-4o | |
| GPT-4o mini | openai/gpt-4o-mini | |
| GPT-5 | openai/gpt-5 | |
| GPT-5 mini | openai/gpt-5-mini | |
| GPT-5 nano | openai/gpt-5-nano | |
| GPT-5.1 | openai/gpt-5.1 | |
| GPT-5.2 | openai/gpt-5.2 | |
| GPT-5.4 | openai/gpt-5.4 | |
| GPT-5.4 mini | openai/gpt-5.4-mini | |
| GPT-5.4 nano | openai/gpt-5.4-nano | |
| GPT-5.5 | openai/gpt-5.5 | |
| GPT OSS 120B | openai/gpt-oss-120b | |
| GPT-5.1 Chat | openai/gpt-5.1-chat-latest | Deprecated |
| GPT-5.2 Chat | openai/gpt-5.2-chat-latest | Deprecated |
| GPT-5.3 Chat | openai/gpt-5.3-chat-latest | Deprecated |
| Grok 4.1 Fast | xai/grok-4-1-fast-non-reasoning | |
| Grok 4.1 Fast Reasoning | xai/grok-4-1-fast-reasoning | |
| Grok 4.20 | xai/grok-4.20-0309-non-reasoning | |
| Grok 4.20 Reasoning | xai/grok-4.20-0309-reasoning | |
| Grok 4.20 Multi-Agent | xai/grok-4.20-multi-agent-0309 | |

Retired models
Retired models are no longer accessible. If you're using a retired model, switch to a currently available model.
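
Because retired models are no longer accessible, it can help to check a configured model ID before starting a session. The sketch below is one way to do that, using the retired and deprecated IDs copied from the LLM table above. `resolve_llm` and the set names are hypothetical, the snapshot goes stale whenever the table changes, and the source does not define "Deprecated", so the code assumes deprecated models still work but should be migrated. Falling back to the recommended default is a design choice made for this example, not documented behavior.

```python
import warnings

# Snapshot of the LLM table at the time of writing (assumption: may be out of date).
RETIRED_LLMS = {
    "deepseek-ai/deepseek-v3",
    "deepseek-ai/deepseek-v3.1",
    "deepseek-ai/deepseek-v3.2",
    "google/gemini-2.0-flash",
    "google/gemini-2.0-flash-lite",
    "google/gemini-3-pro-preview",
    "moonshotai/kimi-k2-instruct",
}
DEPRECATED_LLMS = {
    "openai/gpt-5.1-chat-latest",
    "openai/gpt-5.2-chat-latest",
    "openai/gpt-5.3-chat-latest",
}
# The recommended default LLM named in this guide.
DEFAULT_LLM = "google/gemma-4-31b-it"


def resolve_llm(model_id: str) -> str:
    """Return a usable model ID, warning about retired or deprecated choices."""
    if model_id in RETIRED_LLMS:
        warnings.warn(f"{model_id} is retired; falling back to {DEFAULT_LLM}")
        return DEFAULT_LLM
    if model_id in DEPRECATED_LLMS:
        warnings.warn(f"{model_id} is deprecated; plan a migration")
    return model_id


print(resolve_llm("openai/gpt-4.1"))
```

In a stricter deployment, raising an error for retired IDs instead of substituting a default may be preferable, since a silent model swap can change agent behavior.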

Speech-to-text (STT)

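| Model name | Model ID | Languages |
| --- | --- | --- |
| Flux | deepgram/flux-general-en | en |
| Flux (Multilingual) | deepgram/flux-general-multi | multi, en, es, fr, de, hi, ru, pt, ja, it, nl |
| Nova-3 | deepgram/nova-3 | ar, ar-AE, ar-SA, ar-QA, ar-KW, ar-SY, ar-LB, ar-PS, ar-JO, ar-EG, ar-SD, ar-TD, ar-MA, ar-DZ, ar-TN, ar-IQ, ar-IR, be, bn, bs, bg, ca, hr, cs, da, da-DK, nl, nl-BE, en, en-US, en-AU, en-GB, en-IN, en-NZ, et, fi, fr, fr-CA, de, de-CH, el, hi, hu, id, it, ja, kn, ko, ko-KR, lv, lt, mk, ms, mr, no, pl, pt, pt-BR, pt-PT, ro, ru, sr, sk, sl, es, es-419, sv, sv-SE, tl, ta, te, tr, uk, vi, zh, zh-CN, zh-Hans, zh-TW, zh-Hant, zh-HK, multi |
| Nova-3 Medical | deepgram/nova-3-medical | en, en-US, en-AU, en-CA, en-GB, en-IE, en-IN, en-NZ |
| Nova-2 | deepgram/nova-2 | multi, bg, ca, zh, zh-CN, zh-Hans, zh-TW, zh-Hant, zh-HK, cs, da, da-DK, nl, nl-BE, en, en-US, en-AU, en-GB, en-NZ, en-IN, et, fi, fr, fr-CA, de, de-CH, el, hi, hu, id, it, ja, ko, ko-KR, lv, lt, ms, no, pl, pt, pt-BR, pt-PT, ro, ru, sk, es, es-419, sv, sv-SE, th, th-TH, tr, uk, vi |
| Nova-2 Conversational AI | deepgram/nova-2-conversationalai | en, en-US |
| Nova-2 Medical | deepgram/nova-2-medical | en, en-US |
| Nova-2 Phone Call | deepgram/nova-2-phonecall | en, en-US |
| Universal-3 Pro Streaming | assemblyai/u3-rt-pro | en, en-US, en-GB, en-AU, en-CA, en-IN, en-NZ, es, es-ES, es-MX, es-AR, es-CO, es-CL, es-PE, es-VE, es-EC, es-GT, es-CU, es-BO, es-DO, es-HN, es-PY, es-SV, es-NI, es-CR, es-PA, es-UY, es-PR, fr, fr-FR, fr-CA, fr-BE, fr-CH, de, de-DE, de-AT, de-CH, it, it-IT, it-CH, pt, pt-BR, pt-PT |
| Universal-3.5 Pro Streaming | assemblyai/universal-3-5-pro | en, en-US, en-GB, en-AU, en-CA, en-IN, en-NZ, es, es-ES, es-MX, es-AR, es-CO, es-CL, es-PE, es-VE, es-EC, es-GT, es-CU, es-BO, es-DO, es-HN, es-PY, es-SV, es-NI, es-CR, es-PA, es-UY, es-PR, fr, fr-FR, fr-CA, fr-BE, fr-CH, de, de-DE, de-AT, de-CH, it, it-IT, it-CH, pt, pt-BR, pt-PT, tr, nl, sv, no, da, fi, hi, vi, ar, he, ja, ur, zh |
| Universal-Streaming | assemblyai/universal-streaming | en, en-US |
| Universal-Streaming-Multilingual | assemblyai/universal-streaming-multilingual | multi, en, en-US, en-GB, en-AU, en-CA, en-IN, en-NZ, es, es-ES, es-MX, es-AR, es-CO, es-CL, es-PE, es-VE, es-EC, es-GT, es-CU, es-BO, es-DO, es-HN, es-PY, es-SV, es-NI, es-CR, es-PA, es-UY, es-PR, fr, fr-FR, fr-CA, fr-BE, fr-CH, de, de-DE, de-AT, de-CH, it, it-IT, it-CH, pt, pt-BR, pt-PT |
| Ink 2 | cartesia/ink-2 | en |
| Ink Whisper | cartesia/ink-whisper | en, zh, de, es, ru, ko, fr, ja, pt, tr, pl, ca, nl, ar, sv, it, id, hi, fi, vi, he, uk, el, ms, cs, ro, da, hu, ta, no, th, ur, hr, bg, lt, la, mi, ml, cy, sk, te, fa, lv, bn, sr, az, sl, kn, et, mk, br, eu, is, hy, ne, mn, bs, kk, sq, sw, gl, mr, pa, si, km, sn, yo, so, af, oc, ka, be, tg, sd, gu, am, yi, lo, uz, fo, ht, ps, tk, nn, mt, sa, lb, my, bo, tl, mg, as, tt, haw, ln, ha, ba, jw, su, yue |
| Scribe v2 Realtime | elevenlabs/scribe_v2_realtime | af, afr, am, amh, ar, ara, as, asm, az, azj, be, bel, bg, bul, bn, ben, bs, bos, my, mya, ca, cat, cs, ces, ny, nya, cy, cym, da, dan, de, deu, el, ell, en, eng, es, spa, et, est, fa, fas, ff, ful, fi, fin, fr, fra, ga, gle, gl, glg, lg, lug, ka, kat, gu, guj, ha, hau, he, heb, hi, hin, hr, hrv, hu, hun, hy, hye, id, ind, ig, ibo, is, isl, it, ita, ja, jpn, jv, jav, kk, kaz, km, khm, kn, kan, ko, kor, ku, kur, ky, kir, lb, ltz, ln, lin, lo, lao, lt, lit, lv, lav, mi, mri, mk, mkd, ml, mal, mn, mon, mr, mar, ms, msa, mt, mlt, ne, nep, nl, nld, no, nor, oc, oci, or, ori, pa, pan, pl, pol, ps, pus, pt, por, ro, ron, ru, rus, sr, srp, sd, snd, sk, slk, sl, slv, sn, sna, so, som, sv, swe, sw, swa, ta, tam, te, tel, tg, tgk, th, tha, tr, tur, uk, ukr, ur, urd, uz, uzb, vi, vie, wo, wol, xh, xho, zh, zho, zu, zul, ast, ceb, fil, kea, luo, nso, umb, yue |
| Speechmatics Enhanced | speechmatics/enhanced | ar, ar_en, ba, be, bg, bn, ca, cmn, cmn_en, cmn_en_ms_ta, cs, cy, da, de, el, en, en_ms, en_ta, eo, es, et, eu, fa, fi, fr, ga, gl, he, hi, hr, hu, ia, id, it, ja, ko, lt, lv, mn, mr, ms, mt, nl, no, pl, pt, ro, ru, sk, sl, sv, sw, ta, th, tl, tr, ug, uk, ur, vi, yue |
| Speechmatics Standard | speechmatics/standard | ar, ar_en, ba, be, bg, bn, ca, cmn, cmn_en, cmn_en_ms_ta, cs, cy, da, de, el, en, en_ms, en_ta, eo, es, et, eu, fa, fi, fr, ga, gl, he, hi, hr, hu, ia, id, it, ja, ko, lt, lv, mn, mr, ms, mt, nl, no, pl, pt, ro, ru, sk, sl, sv, sw, ta, th, tl, tr, ug, uk, ur, vi, yue |
| Speech to Text | xai/stt-1 | en, ar, cs, da, nl, fr, de, hi, id, it, ja, ko, ms, fa, pl, pt, ro, ru, es, sv, th, tr, vi, fil, mk |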
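
Since language coverage varies widely between STT models, a small lookup can narrow the choice for a given language code. The sketch below hard-codes a short excerpt of the STT table above and returns the model IDs that list a code exactly. `STT_LANGUAGES` and `models_for_language` are hypothetical names, the excerpt is incomplete by design, and exact matching is an assumption: the source does not say whether a model that lists `en` also accepts regional variants such as `en-GB`.

```python
# Excerpt of the STT table (model ID -> listed language codes); not exhaustive.
STT_LANGUAGES = {
    "deepgram/flux-general-en": {"en"},
    "deepgram/flux-general-multi": {
        "multi", "en", "es", "fr", "de", "hi", "ru", "pt", "ja", "it", "nl",
    },
    "deepgram/nova-2-medical": {"en", "en-US"},
    "assemblyai/universal-streaming": {"en", "en-US"},
    "cartesia/ink-2": {"en"},
}


def models_for_language(code: str) -> list[str]:
    """Return model IDs from the excerpt that list `code` exactly, sorted by ID."""
    return sorted(
        model_id
        for model_id, languages in STT_LANGUAGES.items()
        if code in languages
    )


print(models_for_language("ja"))
print(models_for_language("en-US"))
```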

Text-to-speech (TTS)

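| Model name | Model ID | Status | Languages |
| --- | --- | --- | --- |
| Sonic 2 | cartesia/sonic-2 | | en, fr, de, es, pt, zh, ja, ko |
| Sonic 3 | cartesia/sonic-3 | | en, de, es, fr, ja, pt, zh, hi, ko, it, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa |
| Sonic 3 (2025-10-27) | cartesia/sonic-3-2025-10-27 | | en, de, es, fr, ja, pt, zh, hi, ko, it, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa |
| Sonic 3 (2026-01-12) | cartesia/sonic-3-2026-01-12 | | en, de, es, fr, ja, pt, zh, hi, ko, it, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa |
| Sonic 3 Latest | cartesia/sonic-3-latest | | en, de, es, fr, ja, pt, zh, hi, ko, it, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa |
| Sonic 3.5 | cartesia/sonic-3.5 | | en, de, es, ja, pt, zh, hi, ko, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa |
| Sonic 3.5 (2026-05-04) | cartesia/sonic-3.5-2026-05-04 | | en, de, es, ja, pt, zh, hi, ko, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa |
| Sonic Latest | cartesia/sonic-latest | | en, de, es, ja, pt, zh, hi, ko, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa |
| Sonic Turbo | cartesia/sonic-turbo | | en, fr, de, es, pt, zh, ja, hi, ko |
| Sonic | cartesia/sonic | Retired | en, fr, de, es, pt, zh, ja, hi, it, ko, nl, pl, ru, sv, tr |
| Aura-2 | deepgram/aura-2 | | en, en-US, en-PH, en-GB, en-AU, es, es-CO, es-MX, es-ES, es-419, es-AR, nl, nl-NL, fr, fr-FR, de, de-DE, it, it-IT, ja, ja-JP |
| Aura-1 | deepgram/aura | Retired | en, en-US, en-IE, en-GB |
| Eleven Flash v2 | elevenlabs/eleven_flash_v2 | | en |
| Eleven Flash v2.5 | elevenlabs/eleven_flash_v2_5 | | en, ja, zh, de, hi, fr, ko, pt, it, es, id, nl, tr, fil, pl, sv, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, ru, hu, no, vi |
| Eleven Multilingual v2 | elevenlabs/eleven_multilingual_v2 | | en, ja, zh, de, hi, fr, ko, pt, it, es, id, nl, tr, fil, pl, sv, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, ru |
| Eleven Turbo v2 | elevenlabs/eleven_turbo_v2 | | en |
| Eleven Turbo v2.5 | elevenlabs/eleven_turbo_v2_5 | | en, ja, zh, de, hi, fr, ko, pt, it, es, id, nl, tr, fil, pl, sv, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, ru, hu, no, vi |
| Eleven v3 | elevenlabs/eleven_v3 | | en, ja, zh, de, hi, fr, ko, pt, it, es, id, nl, tr, fil, pl, sv, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, ru, hu, no, vi |
| Realtime TTS 1.5 Max | inworld/inworld-tts-1.5-max | | en, zh, ja, ko, ru, it, es, pt, fr, de, pl, nl, hi, he, ar |
| Realtime TTS 1.5 Mini | inworld/inworld-tts-1.5-mini | | en, zh, ja, ko, ru, it, es, pt, fr, de, pl, nl, hi, he, ar |
| Realtime TTS 2.0 | inworld/inworld-tts-2 | | en, zh, ja, ko, ru, it, es, pt, fr, de, pl, nl, hi, he, ar |
| Realtime TTS | inworld/inworld-tts-1 | Retired | en, es, fr, ko, nl, zh, de, it, ja, pl, pt, ru, hi, he, ar |
| Realtime TTS Max | inworld/inworld-tts-1-max | Retired | en, es, fr, ko, nl, zh, de, it, ja, pl, pt, ru, hi, he, ar |
| Arcana | rime/arcana | | en, es, fr, de, hi, he, ja, pt, ar |
| Coda | rime/coda | | en, es, fr, de, pt, ja |
| Mist | rime/mist | | en |
| Mist v2 | rime/mistv2 | | en, es, fr, de |
| Mist v3 | rime/mistv3 | | en, es, fr, de, hi |
| Text to Speech | xai/tts-1 | | auto, en, ar-EG, ar-SA, ar-AE, bn, zh, fr, de, hi, id, it, ja, ko, pt-BR, pt-PT, ru, es-MX, es-ES, tr, vi |
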
Retired models
Retired models are no longer accessible. If you're using a retired model, switch to a currently available model.

Billing

LiveKit Inference billing is based on usage. Discounted rates are available on the Scale plan. Custom rates are available on the Enterprise plan. Refer to the following articles for more information on quotas, limits, and billing for LiveKit Inference. The latest pricing is always available on the LiveKit Inference pricing page.