Overview
Cartesia speech-to-text is available in LiveKit Agents through LiveKit Inference and the Cartesia plugin. With LiveKit Inference, your agent runs on LiveKit's infrastructure to minimize latency. No separate provider API key is required, and usage and rate limits are managed through LiveKit Cloud. Use the plugin instead if you want to manage your own billing and rate limits. Pricing for LiveKit Inference is available on the pricing page.
LiveKit Inference
Use LiveKit Inference to access Cartesia STT without a separate Cartesia API key.
| Model name | Model ID | Languages |
|---|---|---|
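| Ink 2 | cartesia/ink-2 | en |
| Ink Whisper | cartesia/ink-whisper | en, zh, de, es, ru, ko, fr, ja, pt, tr, pl, ca, nl, ar, sv, it, id, hi, fi, vi, he, uk, el, ms, cs, ro, da, hu, ta, no, th, ur, hr, bg, lt, la, mi, ml, cy, sk, te, fa, lv, bn, sr, az, sl, kn, et, mk, br, eu, is, hy, ne, mn, bs, kk, sq, sw, gl, mr, pa, si, km, sn, yo, so, af, oc, ka, be, tg, sd, gu, am, yi, lo, uz, fo, ht, ps, tk, nn, mt, sa, lb, my, bo, tl, mg, as, tt, haw, ln, ha, ba, jw, su, yue |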
Usage
To use Cartesia, use the STT class from the inference module:
Python:

    from livekit.agents import AgentSession, inference

    session = AgentSession(
        stt=inference.STT(
            model="cartesia/ink-whisper",
            language="en"
        ),
        # ... llm, tts, vad, turn_handling, etc.
    )

Node.js:

    import { AgentSession, inference } from '@livekit/agents';

    const session = new AgentSession({
      stt: new inference.STT({
        model: "cartesia/ink-whisper",
        language: "en"
      }),
      // ... llm, tts, vad, turnHandling, etc.
    });
Parameters
- model (string): The model to use for the STT.
- language (LanguageCode): Language code for the transcription. If not set, the provider default applies.
- extra_kwargs (dict): Additional parameters to pass to the Cartesia STT API. See model parameters for supported fields. In Node.js this parameter is called modelOptions.
Model parameters
Pass the following parameters inside extra_kwargs (Python) or modelOptions (Node.js):
| Parameter | Type | Default | Notes |
|---|---|---|---|
| min_volume | float | | Minimum input volume level required to start transcription. |
| max_silence_duration_secs | float | | Maximum duration of silence in seconds before ending a transcription segment. |
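As a minimal sketch of how these options might be assembled, the hypothetical helper below builds the dictionary you would pass as extra_kwargs. Only the two parameter names come from the table above. The helper name, the non-negative check, and the sample value are assumptions for illustration and are not part of the LiveKit or Cartesia API.

```python
from typing import Optional


def build_cartesia_stt_options(
    min_volume: Optional[float] = None,
    max_silence_duration_secs: Optional[float] = None,
) -> dict:
    """Hypothetical helper: collect only the options that were explicitly set."""
    candidates = {
        "min_volume": min_volume,
        "max_silence_duration_secs": max_silence_duration_secs,
    }
    options = {}
    for name, value in candidates.items():
        if value is None:
            continue  # leave unset so the provider default applies
        if value < 0:
            # Assumption: negative values are not meaningful for either field.
            raise ValueError(f"{name} must be non-negative, got {value}")
        options[name] = float(value)
    return options


# The sample value is illustrative only, not a recommended setting.
stt_options = build_cartesia_stt_options(max_silence_duration_secs=1.5)
print(stt_options)
```

The resulting dictionary could then be passed along with the model, for example `inference.STT(model="cartesia/ink-whisper", extra_kwargs=stt_options)`.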
String descriptors
As a shortcut, you can also pass a model ID string directly to the stt argument in your AgentSession:
Python:

    from livekit.agents import AgentSession

    session = AgentSession(
        stt="cartesia/ink-whisper:en",
        # ... llm, tts, vad, turn_handling, etc.
    )

Node.js:

    import { AgentSession } from '@livekit/agents';

    const session = new AgentSession({
      stt: "cartesia/ink-whisper:en",
      // ... llm, tts, vad, turnHandling, etc.
    });
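The descriptor in the examples above appears to combine a model ID and a language code separated by a colon. The sketch below splits such a string into its two parts, which can be handy for logging or validating configuration. The `provider/model[:language]` shape is inferred from the single example on this page, and `parse_stt_descriptor` and `SttDescriptor` are hypothetical names, not LiveKit APIs.

```python
from typing import NamedTuple, Optional


class SttDescriptor(NamedTuple):
    model: str
    language: Optional[str]


def parse_stt_descriptor(descriptor: str) -> SttDescriptor:
    """Hypothetical parser for strings shaped like 'provider/model[:language]'."""
    model, sep, language = descriptor.partition(":")
    if not model:
        raise ValueError(f"missing model ID in descriptor: {descriptor!r}")
    # Treat a missing or empty language part as "not set".
    return SttDescriptor(model=model, language=language if sep and language else None)


print(parse_stt_descriptor("cartesia/ink-whisper:en"))
print(parse_stt_descriptor("cartesia/ink-2"))
```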
Plugin
LiveKit's plugin support for Cartesia lets you connect directly to Cartesia's STT API with your own API key. For Node.js, use LiveKit Inference.
Installation
Install the plugin from PyPI:
uv add "livekit-agents[cartesia]~=1.5"
Authentication
The Cartesia plugin requires a Cartesia API key.
Set CARTESIA_API_KEY in your .env file.
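A missing key is easier to diagnose at startup than on the first transcription request. The following is a minimal sketch of such a check using only the standard library. The function name is hypothetical, and it assumes the contents of your .env file have already been loaded into the process environment (for example by your process manager or a loader such as python-dotenv), since Python does not read .env files on its own.

```python
import os
from typing import Mapping


def require_cartesia_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Cartesia API key, or raise with an actionable message."""
    key = env.get("CARTESIA_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CARTESIA_API_KEY is not set. Add it to your .env file or export it "
            "in the environment before starting the agent."
        )
    return key
```

Calling `require_cartesia_api_key()` before constructing the session lets the agent fail fast with a clear message.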
Usage
Use Cartesia STT in an AgentSession or as a standalone transcription service. For example, you can use this STT in the Voice AI quickstart.
The plugin defaults to the ink-2 model, which detects the end of turn on its own. Set turn_detection to "stt" in turn_handling so the session uses those signals to know when the user stops speaking:

    from livekit.agents import AgentSession
    from livekit.plugins import cartesia

    session = AgentSession(
        stt=cartesia.STT(),  # defaults to ink-2
        turn_handling={
            "turn_detection": "stt",
        },
        # ... llm, tts, etc.
    )
ink-2 supports English only. To transcribe other languages, pass model="ink-whisper" explicitly. Setting a non-English language without a model automatically selects ink-whisper.
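The selection rule described above can be mirrored in a few lines, which may help when checking configuration before an agent starts. This is an illustrative sketch only, not the plugin's implementation. `pick_cartesia_model` is a hypothetical name, and treating any tag whose primary subtag is "en" (such as "en-US") as English is an assumption, as is raising an error when ink-2 is paired with a non-English language.

```python
from typing import Optional


def pick_cartesia_model(language: str = "en", model: Optional[str] = None) -> str:
    """Illustrative mirror of the documented default-selection rule."""
    # Assumption: compare only the primary subtag, so "en-US" counts as English.
    primary = language.lower().replace("_", "-").split("-")[0]
    is_english = primary == "en"
    if model is None:
        # No explicit model: English gets ink-2, everything else ink-whisper.
        return "ink-2" if is_english else "ink-whisper"
    if model == "ink-2" and not is_english:
        # Assumption: flag the mismatch early, since ink-2 is English only.
        raise ValueError(f"ink-2 supports English only, got language {language!r}")
    return model


print(pick_cartesia_model("en"))
print(pick_cartesia_model("fr"))
```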
As of livekit-agents 1.5.15, cartesia.STT() defaults to ink-2 instead of ink-whisper, which changes two behaviors:
- update_options(model=...) has no effect. Set the model when you construct the STT instance instead.
- Word-aligned transcripts aren't supported. Use ink-whisper if your app depends on them.
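If an app depends on the earlier default, it can help to know which default applies to the installed version. The sketch below compares a version string against the 1.5.15 cutoff stated above. The function names are hypothetical, the parsing ignores pre-release suffixes (so "1.5.15rc1" is treated like 1.5.15), and reading the version of the livekit-agents distribution assumes that package is the relevant one to check.

```python
import re
from importlib import metadata
from typing import Optional


def default_cartesia_stt_model(agents_version: str) -> str:
    """Return the documented default model for a given livekit-agents version."""
    # Keep the first three numeric components and pad short versions with zeros.
    nums = tuple(int(n) for n in re.findall(r"\d+", agents_version)[:3])
    nums = nums + (0,) * (3 - len(nums))
    return "ink-2" if nums >= (1, 5, 15) else "ink-whisper"


def installed_default() -> Optional[str]:
    """Look up the installed version; returns None if the package is missing."""
    try:
        return default_cartesia_stt_model(metadata.version("livekit-agents"))
    except metadata.PackageNotFoundError:
        return None


print(installed_default())
```

Passing model explicitly when constructing the STT instance avoids depending on the version-specific default at all.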
Parameters
This section describes some of the available parameters. See the plugin reference for a complete list of all available parameters.
- model (string, default: ink-2): Selected model to use for STT. Defaults to ink-2 for English and to ink-whisper for other languages. See Cartesia STT models for supported values.
- language (LanguageCode, default: en): Language code for the input audio. For supported languages, see Cartesia STT models.
Additional resources
The following resources provide more information about using Cartesia with LiveKit Agents.
Python package
The livekit-plugins-cartesia package on PyPI.
Plugin reference
Reference for the Cartesia STT plugin.
GitHub repo
View the source or contribute to the LiveKit Cartesia STT plugin.
Cartesia docs
Cartesia STT docs.
Voice AI quickstart
Get started with LiveKit Agents and Cartesia STT.
Cartesia TTS
Guide to the Cartesia TTS plugin with LiveKit Agents.