Overview
LiveKit Inference provides access to many of the best models and providers for voice agents, including models from OpenAI, Google, AssemblyAI, Deepgram, Cartesia, ElevenLabs, and more. LiveKit Inference is included in LiveKit Cloud, and does not require any additional plugins. It's zero data retention by default, so your prompts, audio, and model outputs are never stored or used to train models. See the guides for LLM, STT, and TTS for supported models and configuration options.
To learn more about LiveKit Inference, see the blog post "Introducing LiveKit Inference: A unified model interface for voice AI".
For LiveKit Inference models, use the inference module classes in your AgentSession:
Python:

    from livekit.agents import AgentSession, inference

    # Each component is configured through the inference module; no plugins are required.
    session = AgentSession(
        stt=inference.STT(model="deepgram/flux-general", language="en"),
        llm=inference.LLM(model="google/gemma-4-31b-it"),
        tts=inference.TTS(
            model="cartesia/sonic-3",
            voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
        ),
    )

Node.js:

    import { AgentSession, inference } from '@livekit/agents';

    // Each component is configured through the inference module; no plugins are required.
    const session = new AgentSession({
      stt: new inference.STT({ model: "deepgram/flux-general", language: "en" }),
      llm: new inference.LLM({ model: "google/gemma-4-31b-it" }),
      tts: new inference.TTS({
        model: "cartesia/sonic-3",
        voice: "9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
      }),
    });
String descriptors
As a shortcut, you can pass a model descriptor string directly instead of using the inference classes. This is a convenient way to get started quickly.
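Python:

    from livekit.agents import AgentSession

    # Descriptor strings replace the inference.STT, inference.LLM, and inference.TTS objects.
    session = AgentSession(
        stt="deepgram/nova-3:en",
        llm="google/gemma-4-31b-it",
        tts="cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
    )

Node.js:

    import { AgentSession } from '@livekit/agents';

    // Descriptor strings replace the inference.STT, inference.LLM, and inference.TTS objects.
    const session = new AgentSession({
      stt: "deepgram/nova-3:en",
      llm: "google/gemma-4-31b-it",
      tts: "cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
    });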
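Judging from the examples above, a descriptor appears to follow the shape `provider/model[:suffix]`, where the suffix carries the language for STT (`en`) and the voice ID for TTS. The following is a minimal sketch of that decomposition, not LiveKit's own parser. The `Descriptor` type and `split_descriptor` helper are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple, Optional


class Descriptor(NamedTuple):
    provider: str           # e.g. "deepgram"
    model: str              # full model ID, e.g. "deepgram/nova-3"
    suffix: Optional[str]   # text after ":", or None when there is no colon


def split_descriptor(descriptor: str) -> Descriptor:
    """Hypothetical helper: split "provider/model[:suffix]" into its parts."""
    # Everything before the first ":" is the model ID; the rest is the suffix.
    model, colon, suffix = descriptor.partition(":")
    # The provider is the part of the model ID before the first "/".
    provider, slash, name = model.partition("/")
    if not slash or not provider or not name:
        raise ValueError(f"expected 'provider/model[:suffix]', got {descriptor!r}")
    return Descriptor(provider, model, suffix if colon else None)


print(split_descriptor("deepgram/nova-3:en"))
print(split_descriptor("google/gemma-4-31b-it"))
```

A helper like this can be handy for logging which provider a session uses or for validating configuration values before they reach `AgentSession`.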
For detailed parameter references and model-specific options, see the individual model guides for LLM, STT, and TTS.
Zero data retention
LiveKit Inference is zero data retention (ZDR) by default. Your prompts, audio, and model outputs pass through only to generate a response. Neither LiveKit nor the underlying model providers log, store, or train on your data. This applies to every LLM, STT, and TTS model, on every plan, with no configuration required.
Zero data retention applies to the data you send to model providers through LiveKit Inference. It's separate from agent observability, which you can enable to retain session data in LiveKit Cloud, and from custom voices, which store the voice sample you provide so you can reuse the clone across providers.
Models
The following tables list all models currently available through LiveKit Inference.
Pricing
See the latest pricing for all LiveKit Inference models.
Large language models (LLM)
Gemma 4 31B is the recommended default LLM. It's a latency-optimized, open-weight model served on LiveKit's infrastructure.
| Model name | Model ID |
|---|---|
| Gemma 4 31B | google/gemma-4-31b-it |
| DeepSeek-V4 Pro | deepseek-ai/deepseek-v4-pro |
| DeepSeek-V3 (Retired) | deepseek-ai/deepseek-v3 |
| DeepSeek-V3.1 (Retired) | deepseek-ai/deepseek-v3.1 |
| DeepSeek-V3.2 (Retired) | deepseek-ai/deepseek-v3.2 |
| Gemini 2.5 Flash | google/gemini-2.5-flash |
| Gemini 2.5 Flash-Lite | google/gemini-2.5-flash-lite |
| Gemini 2.5 Pro | google/gemini-2.5-pro |
| Gemini 3 Flash | google/gemini-3-flash-preview |
| Gemini 3.1 Flash Lite | google/gemini-3.1-flash-lite |
| Gemini 3.1 Pro | google/gemini-3.1-pro-preview |
| Gemini 3.5 Flash | google/gemini-3.5-flash |
| Gemini 2.0 Flash (Retired) | google/gemini-2.0-flash |
| Gemini 2.0 Flash-Lite (Retired) | google/gemini-2.0-flash-lite |
| Gemini 3 Pro (Retired) | google/gemini-3-pro-preview |
| Kimi K2.5 | moonshotai/kimi-k2.5 |
| Kimi K2 Instruct (Retired) | moonshotai/kimi-k2-instruct |
| ChatGPT Latest | openai/chat-latest |
| GPT-4.1 | openai/gpt-4.1 |
| GPT-4.1 mini | openai/gpt-4.1-mini |
| GPT-4.1 nano | openai/gpt-4.1-nano |
| GPT-4o | openai/gpt-4o |
| GPT-4o mini | openai/gpt-4o-mini |
| GPT-5 | openai/gpt-5 |
| GPT-5 mini | openai/gpt-5-mini |
| GPT-5 nano | openai/gpt-5-nano |
| GPT-5.1 | openai/gpt-5.1 |
| GPT-5.2 | openai/gpt-5.2 |
| GPT-5.4 | openai/gpt-5.4 |
| GPT-5.4 mini | openai/gpt-5.4-mini |
| GPT-5.4 nano | openai/gpt-5.4-nano |
| GPT-5.5 | openai/gpt-5.5 |
| GPT OSS 120B | openai/gpt-oss-120b |
| GPT-5.1 Chat (Deprecated) | openai/gpt-5.1-chat-latest |
| GPT-5.2 Chat (Deprecated) | openai/gpt-5.2-chat-latest |
| GPT-5.3 Chat (Deprecated) | openai/gpt-5.3-chat-latest |
| Grok 4.1 Fast | xai/grok-4-1-fast-non-reasoning |
| Grok 4.1 Fast Reasoning | xai/grok-4-1-fast-reasoning |
| Grok 4.20 | xai/grok-4.20-0309-non-reasoning |
| Grok 4.20 Reasoning | xai/grok-4.20-0309-reasoning |
| Grok 4.20 Multi-Agent | xai/grok-4.20-multi-agent-0309 |
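
Because some entries in the LLM table are marked Retired or Deprecated, it can help to check a configured model ID before starting a session. The sketch below is illustrative only: the `check_llm_model` helper is a hypothetical name, and it assumes that Retired means the model is no longer usable while Deprecated means it still works but may be removed later. This page does not define either term, so adjust the behavior to match the actual policy.

```python
import warnings

# Model IDs marked Retired in the LLM table.
RETIRED_LLMS = {
    "deepseek-ai/deepseek-v3",
    "deepseek-ai/deepseek-v3.1",
    "deepseek-ai/deepseek-v3.2",
    "google/gemini-2.0-flash",
    "google/gemini-2.0-flash-lite",
    "google/gemini-3-pro-preview",
    "moonshotai/kimi-k2-instruct",
}

# Model IDs marked Deprecated in the LLM table.
DEPRECATED_LLMS = {
    "openai/gpt-5.1-chat-latest",
    "openai/gpt-5.2-chat-latest",
    "openai/gpt-5.3-chat-latest",
}


def check_llm_model(model_id: str) -> str:
    """Hypothetical guard: reject retired IDs, warn on deprecated ones."""
    if model_id in RETIRED_LLMS:
        # Assumption: retired models should not be used at all.
        raise ValueError(f"{model_id} is marked Retired")
    if model_id in DEPRECATED_LLMS:
        # Assumption: deprecated models still work, so only warn.
        warnings.warn(f"{model_id} is marked Deprecated", DeprecationWarning)
    return model_id


print(check_llm_model("google/gemma-4-31b-it"))
```

The returned ID can be passed straight through, for example as `llm=check_llm_model("google/gemma-4-31b-it")`.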
Speech-to-text (STT)
| Model name | Model ID | Languages |
|---|---|---|
| Flux | deepgram/flux-general-en | en |
| Flux (Multilingual) | deepgram/flux-general-multi | multi, en, es, fr, de, hi, ru, pt, ja, it, nl |
| Nova-3 | deepgram/nova-3 | ar, ar-AE, ar-SA, ar-QA, ar-KW, ar-SY, ar-LB, ar-PS, ar-JO, ar-EG, ar-SD, ar-TD, ar-MA, ar-DZ, ar-TN, ar-IQ, ar-IR, be, bn, bs, bg, ca, hr, cs, da, da-DK, nl, nl-BE, en, en-US, en-AU, en-GB, en-IN, en-NZ, et, fi, fr, fr-CA, de, de-CH, el, hi, hu, id, it, ja, kn, ko, ko-KR, lv, lt, mk, ms, mr, no, pl, pt, pt-BR, pt-PT, ro, ru, sr, sk, sl, es, es-419, sv, sv-SE, tl, ta, te, tr, uk, vi, zh, zh-CN, zh-Hans, zh-TW, zh-Hant, zh-HK, multi |
| Nova-3 Medical | deepgram/nova-3-medical | en, en-US, en-AU, en-CA, en-GB, en-IE, en-IN, en-NZ |
| Nova-2 | deepgram/nova-2 | multi, bg, ca, zh, zh-CN, zh-Hans, zh-TW, zh-Hant, zh-HK, cs, da, da-DK, nl, nl-BE, en, en-US, en-AU, en-GB, en-NZ, en-IN, et, fi, fr, fr-CA, de, de-CH, el, hi, hu, id, it, ja, ko, ko-KR, lv, lt, ms, no, pl, pt, pt-BR, pt-PT, ro, ru, sk, es, es-419, sv, sv-SE, th, th-TH, tr, uk, vi |
| Nova-2 Conversational AI | deepgram/nova-2-conversationalai | en, en-US |
| Nova-2 Medical | deepgram/nova-2-medical | en, en-US |
| Nova-2 Phone Call | deepgram/nova-2-phonecall | en, en-US |
| Universal-3 Pro Streaming | assemblyai/u3-rt-pro | en, en-US, en-GB, en-AU, en-CA, en-IN, en-NZ, es, es-ES, es-MX, es-AR, es-CO, es-CL, es-PE, es-VE, es-EC, es-GT, es-CU, es-BO, es-DO, es-HN, es-PY, es-SV, es-NI, es-CR, es-PA, es-UY, es-PR, fr, fr-FR, fr-CA, fr-BE, fr-CH, de, de-DE, de-AT, de-CH, it, it-IT, it-CH, pt, pt-BR, pt-PT |
| Universal-3.5 Pro Streaming | assemblyai/universal-3-5-pro | en, en-US, en-GB, en-AU, en-CA, en-IN, en-NZ, es, es-ES, es-MX, es-AR, es-CO, es-CL, es-PE, es-VE, es-EC, es-GT, es-CU, es-BO, es-DO, es-HN, es-PY, es-SV, es-NI, es-CR, es-PA, es-UY, es-PR, fr, fr-FR, fr-CA, fr-BE, fr-CH, de, de-DE, de-AT, de-CH, it, it-IT, it-CH, pt, pt-BR, pt-PT, tr, nl, sv, no, da, fi, hi, vi, ar, he, ja, ur, zh |
| Universal-Streaming | assemblyai/universal-streaming | en, en-US |
| Universal-Streaming-Multilingual | assemblyai/universal-streaming-multilingual | multi, en, en-US, en-GB, en-AU, en-CA, en-IN, en-NZ, es, es-ES, es-MX, es-AR, es-CO, es-CL, es-PE, es-VE, es-EC, es-GT, es-CU, es-BO, es-DO, es-HN, es-PY, es-SV, es-NI, es-CR, es-PA, es-UY, es-PR, fr, fr-FR, fr-CA, fr-BE, fr-CH, de, de-DE, de-AT, de-CH, it, it-IT, it-CH, pt, pt-BR, pt-PT |
| Ink 2 | cartesia/ink-2 | en |
| Ink Whisper | cartesia/ink-whisper | en, zh, de, es, ru, ko, fr, ja, pt, tr, pl, ca, nl, ar, sv, it, id, hi, fi, vi, he, uk, el, ms, cs, ro, da, hu, ta, no, th, ur, hr, bg, lt, la, mi, ml, cy, sk, te, fa, lv, bn, sr, az, sl, kn, et, mk, br, eu, is, hy, ne, mn, bs, kk, sq, sw, gl, mr, pa, si, km, sn, yo, so, af, oc, ka, be, tg, sd, gu, am, yi, lo, uz, fo, ht, ps, tk, nn, mt, sa, lb, my, bo, tl, mg, as, tt, haw, ln, ha, ba, jw, su, yue |
| Scribe v2 Realtime | elevenlabs/scribe_v2_realtime | af, afr, am, amh, ar, ara, as, asm, az, azj, be, bel, bg, bul, bn, ben, bs, bos, my, mya, ca, cat, cs, ces, ny, nya, cy, cym, da, dan, de, deu, el, ell, en, eng, es, spa, et, est, fa, fas, ff, ful, fi, fin, fr, fra, ga, gle, gl, glg, lg, lug, ka, kat, gu, guj, ha, hau, he, heb, hi, hin, hr, hrv, hu, hun, hy, hye, id, ind, ig, ibo, is, isl, it, ita, ja, jpn, jv, jav, kk, kaz, km, khm, kn, kan, ko, kor, ku, kur, ky, kir, lb, ltz, ln, lin, lo, lao, lt, lit, lv, lav, mi, mri, mk, mkd, ml, mal, mn, mon, mr, mar, ms, msa, mt, mlt, ne, nep, nl, nld, no, nor, oc, oci, or, ori, pa, pan, pl, pol, ps, pus, pt, por, ro, ron, ru, rus, sr, srp, sd, snd, sk, slk, sl, slv, sn, sna, so, som, sv, swe, sw, swa, ta, tam, te, tel, tg, tgk, th, tha, tr, tur, uk, ukr, ur, urd, uz, uzb, vi, vie, wo, wol, xh, xho, zh, zho, zu, zul, ast, ceb, fil, kea, luo, nso, umb, yue |
| Speechmatics Enhanced | speechmatics/enhanced | ar, ar_en, ba, be, bg, bn, ca, cmn, cmn_en, cmn_en_ms_ta, cs, cy, da, de, el, en, en_ms, en_ta, eo, es, et, eu, fa, fi, fr, ga, gl, he, hi, hr, hu, ia, id, it, ja, ko, lt, lv, mn, mr, ms, mt, nl, no, pl, pt, ro, ru, sk, sl, sv, sw, ta, th, tl, tr, ug, uk, ur, vi, yue |
| Speechmatics Standard | speechmatics/standard | ar, ar_en, ba, be, bg, bn, ca, cmn, cmn_en, cmn_en_ms_ta, cs, cy, da, de, el, en, en_ms, en_ta, eo, es, et, eu, fa, fi, fr, ga, gl, he, hi, hr, hu, ia, id, it, ja, ko, lt, lv, mn, mr, ms, mt, nl, no, pl, pt, ro, ru, sk, sl, sv, sw, ta, th, tl, tr, ug, uk, ur, vi, yue |
| Speech to Text | xai/stt-1 | en, ar, cs, da, nl, fr, de, hi, id, it, ja, ko, ms, fa, pl, pt, ro, ru, es, sv, th, tr, vi, fil, mk |
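
One way to use the Languages column is to look up which STT models list a given language code. The sketch below hand-copies a small subset of the rows above into a dictionary and matches codes exactly. The `STT_LANGUAGES` mapping and the `models_for` helper are hypothetical names for illustration, not part of LiveKit's API, and the subset is not the full catalog.

```python
# A hand-copied subset of the STT table: model ID -> listed language codes.
STT_LANGUAGES = {
    "deepgram/flux-general-en": {"en"},
    "deepgram/flux-general-multi": {
        "multi", "en", "es", "fr", "de", "hi", "ru", "pt", "ja", "it", "nl",
    },
    "assemblyai/universal-streaming": {"en", "en-US"},
    "cartesia/ink-2": {"en"},
    "xai/stt-1": {
        "en", "ar", "cs", "da", "nl", "fr", "de", "hi", "id", "it", "ja", "ko",
        "ms", "fa", "pl", "pt", "ro", "ru", "es", "sv", "th", "tr", "vi",
        "fil", "mk",
    },
}


def models_for(language: str) -> list[str]:
    """Hypothetical helper: model IDs that list exactly this language code."""
    return sorted(
        model_id
        for model_id, languages in STT_LANGUAGES.items()
        if language in languages
    )


print(models_for("hi"))
print(models_for("ko"))
```

Matching is exact on purpose: a model that lists only `en` is not returned for `en-US`, because the table does not say how regional codes fall back to the base language.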
Text-to-speech (TTS)
| Model name | Model ID | Languages |
|---|---|---|
| Sonic 2 | cartesia/sonic-2 | en, fr, de, es, pt, zh, ja, ko |
| Sonic 3 | cartesia/sonic-3 | en, de, es, fr, ja, pt, zh, hi, ko, it, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa |
| Sonic 3 (2025-10-27) | cartesia/sonic-3-2025-10-27 | en, de, es, fr, ja, pt, zh, hi, ko, it, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa |
| Sonic 3 (2026-01-12) | cartesia/sonic-3-2026-01-12 | en, de, es, fr, ja, pt, zh, hi, ko, it, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa |
| Sonic 3 Latest | cartesia/sonic-3-latest | en, de, es, fr, ja, pt, zh, hi, ko, it, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa |
| Sonic 3.5 | cartesia/sonic-3.5 | en, de, es, ja, pt, zh, hi, ko, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa |
| Sonic 3.5 (2026-05-04) | cartesia/sonic-3.5-2026-05-04 | en, de, es, ja, pt, zh, hi, ko, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa |
| Sonic Latest | cartesia/sonic-latest | en, de, es, ja, pt, zh, hi, ko, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa |
| Sonic Turbo | cartesia/sonic-turbo | en, fr, de, es, pt, zh, ja, hi, ko |
| Sonic (Retired) | cartesia/sonic | en, fr, de, es, pt, zh, ja, hi, it, ko, nl, pl, ru, sv, tr |
| Aura-2 | deepgram/aura-2 | en, en-US, en-PH, en-GB, en-AU, es, es-CO, es-MX, es-ES, es-419, es-AR, nl, nl-NL, fr, fr-FR, de, de-DE, it, it-IT, ja, ja-JP |
| Aura-1 (Retired) | deepgram/aura | en, en-US, en-IE, en-GB |
| Eleven Flash v2 | elevenlabs/eleven_flash_v2 | en |
| Eleven Flash v2.5 | elevenlabs/eleven_flash_v2_5 | en, ja, zh, de, hi, fr, ko, pt, it, es, id, nl, tr, fil, pl, sv, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, ru, hu, no, vi |
| Eleven Multilingual v2 | elevenlabs/eleven_multilingual_v2 | en, ja, zh, de, hi, fr, ko, pt, it, es, id, nl, tr, fil, pl, sv, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, ru |
| Eleven Turbo v2 | elevenlabs/eleven_turbo_v2 | en |
| Eleven Turbo v2.5 | elevenlabs/eleven_turbo_v2_5 | en, ja, zh, de, hi, fr, ko, pt, it, es, id, nl, tr, fil, pl, sv, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, ru, hu, no, vi |
| Eleven v3 | elevenlabs/eleven_v3 | en, ja, zh, de, hi, fr, ko, pt, it, es, id, nl, tr, fil, pl, sv, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, ru, hu, no, vi |
| Realtime TTS 1.5 Max | inworld/inworld-tts-1.5-max | en, zh, ja, ko, ru, it, es, pt, fr, de, pl, nl, hi, he, ar |
| Realtime TTS 1.5 Mini | inworld/inworld-tts-1.5-mini | en, zh, ja, ko, ru, it, es, pt, fr, de, pl, nl, hi, he, ar |
| Realtime TTS 2.0 | inworld/inworld-tts-2 | en, zh, ja, ko, ru, it, es, pt, fr, de, pl, nl, hi, he, ar |
| Realtime TTS (Retired) | inworld/inworld-tts-1 | en, es, fr, ko, nl, zh, de, it, ja, pl, pt, ru, hi, he, ar |
| Realtime TTS Max (Retired) | inworld/inworld-tts-1-max | en, es, fr, ko, nl, zh, de, it, ja, pl, pt, ru, hi, he, ar |
| Arcana | rime/arcana | en, es, fr, de, hi, he, ja, pt, ar |
| Coda | rime/coda | en, es, fr, de, pt, ja |
| Mist | rime/mist | en |
| Mist v2 | rime/mistv2 | en, es, fr, de |
| Mist v3 | rime/mistv3 | en, es, fr, de, hi |
| Text to Speech | xai/tts-1 | auto, en, ar-EG, ar-SA, ar-AE, bn, zh, fr, de, hi, id, it, ja, ko, pt-BR, pt-PT, ru, es-MX, es-ES, tr, vi |
Billing
LiveKit Inference billing is based on usage. Discounted rates are available on the Scale plan, and custom rates are available on the Enterprise plan. Refer to the LiveKit Cloud documentation for more information on quotas, limits, and billing for LiveKit Inference. The latest pricing is always available on the LiveKit Inference pricing page.