LiveKit Inference

Access the best AI models for voice agents, included in LiveKit Cloud.

Overview

[Diagram: LiveKit Inference serving an STT-LLM-TTS pipeline for a voice agent.]

LiveKit Inference provides access to many of the best models and providers for voice agents, including models from OpenAI, Google, AssemblyAI, Deepgram, Cartesia, ElevenLabs, and more. LiveKit Inference is included in LiveKit Cloud, and does not require any additional plugins. See the guides for LLM, STT, and TTS for supported models and configuration options.

To learn more about LiveKit Inference, see the blog post Introducing LiveKit Inference: A unified model interface for voice AI.

For LiveKit Inference models, use the inference module classes in your AgentSession:

Python

```python
from livekit.agents import AgentSession, inference

session = AgentSession(
    stt=inference.STT(
        model="deepgram/flux-general",
        language="en",
    ),
    llm=inference.LLM(
        model="openai/gpt-4.1-mini",
    ),
    tts=inference.TTS(
        model="cartesia/sonic-3",
        voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
    ),
)
```

Node.js

```typescript
import { AgentSession, inference } from '@livekit/agents';

const session = new AgentSession({
  stt: new inference.STT({
    model: 'deepgram/flux-general',
    language: 'en',
  }),
  llm: new inference.LLM({
    model: 'openai/gpt-4.1-mini',
  }),
  tts: new inference.TTS({
    model: 'cartesia/sonic-3',
    voice: '9626c31c-bec5-4cca-baa8-f8ba9e84c8bc',
  }),
});
```

String descriptors

As a shortcut, you can pass a model descriptor string directly instead of constructing the inference classes. A descriptor is the model ID (provider/model), optionally followed by a colon and a language code (for STT) or a voice ID (for TTS). This is a convenient way to get started quickly.

Python

```python
from livekit.agents import AgentSession

session = AgentSession(
    stt="deepgram/nova-3:en",
    llm="openai/gpt-4.1-mini",
    tts="cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
)
```

Node.js

```typescript
import { AgentSession } from '@livekit/agents';

const session = new AgentSession({
  stt: 'deepgram/nova-3:en',
  llm: 'openai/gpt-4.1-mini',
  tts: 'cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc',
});
```

For detailed parameter references and model-specific options, see the individual model guides for LLM, STT, and TTS.

Models

The following tables list all models currently available through LiveKit Inference.

Pricing

See the latest pricing for all LiveKit Inference models.

Large language models (LLM)

| Model name | Model ID |
| --- | --- |
| GPT-4o | openai/gpt-4o |
| GPT-4o mini | openai/gpt-4o-mini |
| GPT-4.1 | openai/gpt-4.1 |
| GPT-4.1 mini | openai/gpt-4.1-mini |
| GPT-4.1 nano | openai/gpt-4.1-nano |
| GPT-5 | openai/gpt-5 |
| GPT-5 mini | openai/gpt-5-mini |
| GPT-5 nano | openai/gpt-5-nano |
| GPT-5.1 | openai/gpt-5.1 |
| GPT-5.1 Chat Latest | openai/gpt-5.1-chat-latest |
| GPT-5.2 | openai/gpt-5.2 |
| GPT-5.2 Chat Latest | openai/gpt-5.2-chat-latest |
| GPT OSS 120B | openai/gpt-oss-120b |
| Gemini 3 Pro | google/gemini-3-pro |
| Gemini 3 Flash | google/gemini-3-flash |
| Gemini 2.5 Pro | google/gemini-2.5-pro |
| Gemini 2.5 Flash | google/gemini-2.5-flash |
| Gemini 2.5 Flash Lite | google/gemini-2.5-flash-lite |
| Kimi K2 Instruct | moonshotai/kimi-k2-instruct |
| DeepSeek V3 | deepseek-ai/deepseek-v3 |
| DeepSeek V3.2 | deepseek-ai/deepseek-v3.2 |

Speech-to-text (STT)

| Provider | Model name | Model ID | Languages |
| --- | --- | --- | --- |
| AssemblyAI | Universal-3 Pro Streaming | assemblyai/u3-rt-pro | 6 languages |
| AssemblyAI | Universal-Streaming | assemblyai/universal-streaming | English only |
| AssemblyAI | Universal-Streaming-Multilingual | assemblyai/universal-streaming-multilingual | 6 languages |
| Cartesia | Ink Whisper | cartesia/ink-whisper | 100 languages |
| Deepgram | Flux | deepgram/flux-general | English only |
| Deepgram | Nova-3 | deepgram/nova-3 | Multilingual, 9 languages |
| Deepgram | Nova-3 Medical | deepgram/nova-3-medical | English only |
| Deepgram | Nova-2 | deepgram/nova-2 | Multilingual, 33 languages |
| Deepgram | Nova-2 Medical | deepgram/nova-2-medical | English only |
| Deepgram | Nova-2 Conversational AI | deepgram/nova-2-conversationalai | English only |
| Deepgram | Nova-2 Phonecall | deepgram/nova-2-phonecall | English only |
| ElevenLabs | Scribe V2 Realtime | elevenlabs/scribe_v2_realtime | 41 languages |

Text-to-speech (TTS)

| Provider | Model ID | Languages (ISO 639 codes) |
| --- | --- | --- |
| Cartesia | cartesia/sonic-3 | en, de, es, fr, ja, pt, zh, hi, ko, it, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa |
| Cartesia | cartesia/sonic-2 | en, fr, de, es, pt, zh, ja, hi, it, ko, nl, pl, ru, sv, tr |
| Cartesia | cartesia/sonic-turbo | en, fr, de, es, pt, zh, ja, hi, it, ko, nl, pl, ru, sv, tr |
| Cartesia | cartesia/sonic | en, fr, de, es, pt, zh, ja, hi, it, ko, nl, pl, ru, sv, tr |
| Deepgram | deepgram/aura-2 | en, es |
| ElevenLabs | elevenlabs/eleven_flash_v2 | en |
| ElevenLabs | elevenlabs/eleven_flash_v2_5 | en, ja, zh, de, hi, fr, ko, pt, it, es, id, nl, tr, fil, pl, sv, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, ru, hu, no, vi |
| ElevenLabs | elevenlabs/eleven_turbo_v2 | en |
| ElevenLabs | elevenlabs/eleven_turbo_v2_5 | en, ja, zh, de, hi, fr, ko, pt, it, es, id, nl, tr, fil, pl, sv, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, ru, hu, no, vi |
| ElevenLabs | elevenlabs/eleven_multilingual_v2 | en, ja, zh, de, hi, fr, ko, pt, it, es, id, nl, tr, fil, pl, sv, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, ru |
| Inworld | inworld/inworld-tts-1.5-max | en, es, fr, ko, nl, zh, de, it, ja, pl, pt, ru, hi |
| Inworld | inworld/inworld-tts-1.5-mini | en, es, fr, ko, nl, zh, de, it, ja, pl, pt, ru, hi |
| Inworld | inworld/inworld-tts-1-max | en, es, fr, ko, nl, zh, de, it, ja, pl, pt, ru |
| Inworld | inworld/inworld-tts-1 | en, es, fr, ko, nl, zh, de, it, ja, pl, pt, ru |
| Rime | rime/arcana | en, es, fr, de, ar, he, hi, ja, pt |
| Rime | rime/mistv2 | en, es, fr, de |

Billing

LiveKit Inference billing is based on usage. Discounted rates are available on the Scale plan. Custom rates are available on the Enterprise plan. Refer to the following articles for more information on quotas, limits, and billing for LiveKit Inference. The latest pricing is always available on the LiveKit Inference pricing page.