Cartesia STT | LiveKit Documentation


Overview

Cartesia speech-to-text is available in LiveKit Agents through LiveKit Inference and the Cartesia plugin. With LiveKit Inference, your agent runs on LiveKit's infrastructure to minimize latency. No separate provider API key is required, and usage and rate limits are managed through LiveKit Cloud. Use the plugin instead if you want to manage your own billing and rate limits. Pricing for LiveKit Inference is available on the pricing page.

LiveKit Inference

Use LiveKit Inference to access Cartesia STT without a separate Cartesia API key.

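Model name	Model ID	Languages
Ink Whisper	cartesia/ink-whisper	en, zh, de, es, ru, ko, fr, ja, pt, tr, pl, ca, nl, ar, sv, it, id, hi, fi, vi, he, uk, el, ms, cs, ro, da, hu, ta, no, th, ur, hr, bg, lt, la, mi, ml, cy, sk, te, fa, lv, bn, sr, az, sl, kn, et, mk, br, eu, is, hy, ne, mn, bs, kk, sq, sw, gl, mr, pa, si, km, sn, yo, so, af, oc, ka, be, tg, sd, gu, am, yi, lo, uz, fo, ht, ps, tk, nn, mt, sa, lb, my, bo, tl, mg, as, tt, haw, ln, ha, ba, jw, su, yue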
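If your agent takes the language from user input or configuration, it can help to check the code against the table before building the STT. The following sketch copies the codes from the table into a set. `INK_WHISPER_LANGUAGES` and `check_ink_whisper_language` are hypothetical names, not part of LiveKit Agents, and the trim-and-lowercase normalization is an assumption about how you want to treat input.

```python
# Language codes for cartesia/ink-whisper, copied from the table above.
# Hypothetical helper: not part of LiveKit Agents.
INK_WHISPER_LANGUAGES = frozenset(
    """
    en zh de es ru ko fr ja pt tr pl ca nl ar sv it id hi fi vi he uk el ms cs
    ro da hu ta no th ur hr bg lt la mi ml cy sk te fa lv bn sr az sl kn et mk
    br eu is hy ne mn bs kk sq sw gl mr pa si km sn yo so af oc ka be tg sd gu
    am yi lo uz fo ht ps tk nn mt sa lb my bo tl mg as tt haw ln ha ba jw su yue
    """.split()
)


def check_ink_whisper_language(code: str) -> str:
    """Return the trimmed, lowercased code, or raise ValueError if it is not listed."""
    normalized = code.strip().lower()
    if normalized not in INK_WHISPER_LANGUAGES:
        raise ValueError(f"{code!r} is not a supported Ink Whisper language code")
    return normalized
```

The returned value can then be passed as the `language` argument of `inference.STT`.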

Usage

To use Cartesia STT, create an STT instance from the inference module:

from livekit.agents import AgentSession, inference

session = AgentSession(
    stt=inference.STT(
        model="cartesia/ink-whisper",
        language="en",
    ),
    # ... llm, tts, vad, turn_handling, etc.
)

import { AgentSession, inference } from '@livekit/agents';

const session = new AgentSession({
    stt: new inference.STT({
        model: "cartesia/ink-whisper",
        language: "en",
    }),
    // ... llm, tts, vad, turnHandling, etc.
});

Parameters

model

Required

string

The model to use for the STT.

language

LanguageCode

Language code for the transcription. If not set, the provider default applies.

extra_kwargs

dict

Additional parameters to pass to the Cartesia STT API. See model parameters for supported fields.

In Node.js, this parameter is called modelOptions.

Model parameters

Pass the following parameters inside extra_kwargs (Python) or modelOptions (Node.js):

Parameter	Type	Default	Notes
`min_volume`	`float`		Minimum input volume level required to start transcription.
`max_silence_duration_secs`	`float`		Maximum duration of silence in seconds before ending a transcription segment.
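A minimal sketch of assembling these fields into the dict you pass as `extra_kwargs`. The helper `cartesia_stt_options` is hypothetical, the example value is illustrative rather than a recommendation, and the sketch assumes that leaving a field out keeps Cartesia's own default in effect.

```python
from typing import Any, Optional


def cartesia_stt_options(
    min_volume: Optional[float] = None,
    max_silence_duration_secs: Optional[float] = None,
) -> dict[str, Any]:
    """Collect Cartesia model parameters, omitting any that are unset."""
    options: dict[str, Any] = {}
    if min_volume is not None:
        options["min_volume"] = float(min_volume)
    if max_silence_duration_secs is not None:
        # A duration in seconds cannot be negative.
        if max_silence_duration_secs < 0:
            raise ValueError("max_silence_duration_secs must be non-negative")
        options["max_silence_duration_secs"] = float(max_silence_duration_secs)
    return options


# Illustrative value only, not a recommended setting.
opts = cartesia_stt_options(max_silence_duration_secs=0.5)
# In an agent, this dict would then be passed as:
#   inference.STT(model="cartesia/ink-whisper", language="en", extra_kwargs=opts)
```

Building the dict this way keeps unset fields out of the request entirely instead of sending placeholder values.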

String descriptors

As a shortcut, you can pass a string descriptor directly to the stt argument of AgentSession. The descriptor is the model ID, optionally followed by a colon and a language code:

from livekit.agents import AgentSession

session = AgentSession(
    stt="cartesia/ink-whisper:en",
    # ... llm, tts, vad, turn_handling, etc.
)

import { AgentSession } from '@livekit/agents';

const session = new AgentSession({
    stt: "cartesia/ink-whisper:en",
    // ... llm, tts, vad, turnHandling, etc.
});
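To make the descriptor format concrete, the following hypothetical helper splits a descriptor into the `model` and `language` values you would otherwise pass to `inference.STT`. It is not LiveKit's parser. It assumes the language is whatever follows the last colon and that a descriptor without a colon names only the model.

```python
from typing import NamedTuple, Optional


class STTDescriptor(NamedTuple):
    model: str               # e.g. "cartesia/ink-whisper"
    language: Optional[str]  # e.g. "en", or None when no suffix is given


def parse_stt_descriptor(descriptor: str) -> STTDescriptor:
    """Split "provider/model[:language]" into its parts (hypothetical helper)."""
    model, sep, language = descriptor.rpartition(":")
    if not sep:
        # No colon at all: the whole string is the model ID.
        return STTDescriptor(descriptor, None)
    # An empty suffix ("model:") is treated as no language.
    return STTDescriptor(model, language or None)
```

For example, `parse_stt_descriptor("cartesia/ink-whisper:en")` yields the same model and language used in the `inference.STT` example earlier on this page.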

Plugin

LiveKit's Cartesia plugin lets you connect directly to Cartesia's STT API with your own API key. The plugin is available for Python only; for Node.js, use LiveKit Inference.

Available in

Python

Installation

Install the plugin from PyPI:

uv add "livekit-agents[cartesia]~=1.5"

Authentication

The Cartesia plugin requires a Cartesia API key.

Set CARTESIA_API_KEY in your .env file.
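The .env file must be loaded into the process environment before the session starts, for example with python-dotenv's `load_dotenv()`. Assuming the plugin picks up the key from the environment, a fail-fast check like the hypothetical `require_cartesia_api_key` below (not part of the plugin) surfaces a missing key with a clear message at startup.

```python
import os


def require_cartesia_api_key() -> str:
    """Return CARTESIA_API_KEY from the environment, or raise if it is missing or blank.

    Hypothetical helper: the Cartesia plugin does not provide this function.
    """
    key = os.environ.get("CARTESIA_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CARTESIA_API_KEY is not set; add it to your .env file "
            "and load it before starting the agent"
        )
    return key
```

Calling this once at the top of your entrypoint keeps the failure close to its cause.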

Usage

Use Cartesia STT in an AgentSession or as a standalone transcription service. For example, you can use this STT in the Voice AI quickstart.

from livekit.agents import AgentSession
from livekit.plugins import cartesia

session = AgentSession(
    stt=cartesia.STT(
        model="ink-whisper",
    ),
    # ... llm, tts, etc.
)

Parameters

This section describes some of the available parameters. See the plugin reference for a complete list of all available parameters.

model

string

Default: ink-whisper

Model to use for STT. See Cartesia STT models for supported values.

language

LanguageCode

Default: en

Language code for the input audio. For supported languages, see Cartesia STT models.

Additional resources

The following resources provide more information about using Cartesia with LiveKit Agents.

Python package

The livekit-plugins-cartesia package on PyPI.

Plugin reference

Reference for the Cartesia STT plugin.

GitHub repo

View the source or contribute to the LiveKit Cartesia STT plugin.

Cartesia docs

Cartesia STT docs.

Voice AI quickstart

Get started with LiveKit Agents and Cartesia STT.

Cartesia TTS

Guide to the Cartesia TTS plugin with LiveKit Agents.