Modality-aware instructions | LiveKit Documentation

Overview

A single agent can serve both voice and text users in the same session, but the two input types often benefit from different instructions. Spoken input arrives as imperfect transcription and can contain relative expressions (for example, "next Tuesday"), self-corrections, and filler words, so the LLM might need additional guidance to interpret it correctly. Typed input is usually more precise and literal, so applying that voice-oriented guidance to it can degrade text responses by adding spoken-style confirmations or stripping useful formatting.

The Instructions class holds two variants of your system prompt, one for audio and one for text. The framework applies the variant that matches each turn's input modality before calling the LLM, so voice turns get the audio prompt and text turns get the text prompt automatically.

Beta in Python

In Python, Instructions is exported from livekit.agents.beta and is subject to change. In Node.js, it's a stable export of the main llm namespace (and also re-exported from beta).

Define instructions per modality

Create an Instructions object with audio and text variants and pass it wherever you would pass an instructions string, such as the Agent constructor. The text variant is optional and falls back to the audio variant when omitted.

from livekit.agents import Agent
from livekit.agents.beta import Instructions

instructions = Instructions(
    audio=(
        "You are a scheduling assistant. The user is speaking, so their input may be "
        "imperfect. Resolve spoken expressions like 'next Tuesday' to concrete dates, "
        "honor verbal self-corrections, and confirm the date and time out loud before booking."
    ),
    text=(
        "You are a scheduling assistant. The user is typing, so take their input literally. "
        "Accept exact dates and times in any common format and skip verbal confirmations."
    ),
)


class SchedulingAgent(Agent):
    def __init__(self) -> None:
        super().__init__(instructions=instructions)

import { llm, voice } from '@livekit/agents';

const instructions = new llm.Instructions({
  audio:
    'You are a scheduling assistant. The user is speaking, so their input may be ' +
    "imperfect. Resolve spoken expressions like 'next Tuesday' to concrete dates, " +
    'honor verbal self-corrections, and confirm the date and time out loud before booking.',
  text:
    'You are a scheduling assistant. The user is typing, so take their input literally. ' +
    'Accept exact dates and times in any common format and skip verbal confirmations.',
});

const schedulingAgent = voice.Agent.create({ instructions });

How variants are applied

During a session, the framework selects the variant that matches the input modality of each turn: the audio variant for spoken turns and the text variant for typed turns. Both variants are preserved across turns, so an agent that handles a voice turn followed by a text turn uses the correct prompt for each.
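
The per-turn selection and the text-to-audio fallback described earlier can be pictured with a small standalone sketch. It does not use the LiveKit SDK: `PromptVariants` and `resolve` are hypothetical names invented for illustration, and the real framework may implement selection differently.

```python
from dataclasses import dataclass
from typing import Literal, Optional

Modality = Literal["audio", "text"]


@dataclass(frozen=True)
class PromptVariants:
    """Hypothetical stand-in for a two-variant prompt; not the LiveKit class."""

    audio: str
    text: Optional[str] = None  # optional, mirroring the documented fallback

    def resolve(self, modality: Modality) -> str:
        # Text turns use the text variant when one exists; every other case
        # (audio turns, or a missing text variant) uses the audio variant.
        if modality == "text" and self.text is not None:
            return self.text
        return self.audio


prompts = PromptVariants(
    audio="Confirm dates out loud before booking.",
    text="Take typed dates literally.",
)

# A voice turn followed by a text turn: each turn resolves its own variant.
turns = [("audio", "uh, next Tuesday"), ("text", "2026-06-02 14:00")]
selected = [prompts.resolve(modality) for modality, _ in turns]

# With no text variant, a text turn falls back to the audio prompt.
audio_only = PromptVariants(audio="Confirm dates out loud before booking.")
fallback = audio_only.resolve("text")
```

Because the object keeps both strings and only picks one at resolution time, switching modality mid-session needs no extra state.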

Select the active variant

When you generate a reply manually, specify the variant with the input_modality (Python) or inputModality (Node.js) parameter:

# Use the audio variant for this reply
session.generate_reply(input_modality="audio")

// Use the audio variant for this reply
session.generateReply({ inputModality: 'audio' });

To explicitly set the active variant, use as_modality (Python) or asModality (Node.js). This returns a copy of the instructions with the selected variant active. Both variants are preserved, so you can switch between them as needed.

# Return a copy whose active value is the text variant
text_first = instructions.as_modality("text")

// Return a copy whose active value is the text variant
const textFirst = instructions.asModality('text');
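
The copy semantics described above (a new object with a different active variant, both variants retained) can be sketched in plain Python. This is an illustrative model only: `VariantPrompt`, `with_active`, and `value` are hypothetical names, not part of the LiveKit API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VariantPrompt:
    """Hypothetical two-variant prompt with an active marker; not the LiveKit class."""

    audio: str
    text: str
    active: str = "audio"

    def with_active(self, modality: str) -> "VariantPrompt":
        if modality not in ("audio", "text"):
            raise ValueError(f"unknown modality: {modality!r}")
        # replace() builds a new frozen instance; the original is left untouched.
        return replace(self, active=modality)

    @property
    def value(self) -> str:
        # The string that would be sent to the LLM for the active modality.
        return self.text if self.active == "text" else self.audio


original = VariantPrompt(audio="Confirm dates out loud.", text="Take dates literally.")
text_first = original.with_active("text")
back_to_audio = text_first.with_active("audio")
```

Since neither variant is discarded, switching back to audio yields an object equal to the original.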

Compose instructions

You can build instructions from reusable pieces while keeping both variants intact. A shared base prompt can be combined with modality-specific guidance using concatenation and templating.

In Python, Instructions subclasses str. Use the + operator to concatenate and the format method to substitute values. Both operations handle each variant separately:

base = Instructions(
    audio="You are Alex, a scheduling assistant.\n{modality_specific}",
    text="You are Alex, a scheduling assistant.\n{modality_specific}",
)

modality_specific = Instructions(
    audio="Resolve spoken dates and confirm out loud.",
    text="Accept literal dates and skip confirmations.",
)

# `format` applies to both variants at once
instructions = base.format(modality_specific=modality_specific)

# `+` also works and preserves both variants
instructions = instructions + "\nThe current date is 2026-05-29."
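
To make "handles each variant separately" concrete, here is a minimal sketch of how a str subclass could carry two variants through + and format. `DualPrompt` is a hypothetical class written for this illustration; it assumes nothing about how the LiveKit Instructions class is implemented internally.

```python
from typing import Optional


class DualPrompt(str):
    """Hypothetical str subclass carrying an audio and a text variant."""

    def __new__(cls, audio: str, text: Optional[str] = None) -> "DualPrompt":
        # The str value itself is the audio variant, so the object still
        # works anywhere a plain string is expected.
        obj = super().__new__(cls, audio)
        obj.audio = audio
        obj.text = audio if text is None else text
        return obj

    def __add__(self, other: str) -> "DualPrompt":
        # Pair up variants when the right-hand side is a DualPrompt;
        # otherwise append the plain string to both variants.
        if isinstance(other, DualPrompt):
            return DualPrompt(self.audio + other.audio, self.text + other.text)
        return DualPrompt(self.audio + other, self.text + other)

    def format(self, **kwargs: object) -> "DualPrompt":
        # A DualPrompt argument contributes its audio string to the audio
        # variant and its text string to the text variant.
        def pick(value: object, modality: str) -> object:
            return getattr(value, modality) if isinstance(value, DualPrompt) else value

        audio_kwargs = {k: pick(v, "audio") for k, v in kwargs.items()}
        text_kwargs = {k: pick(v, "text") for k, v in kwargs.items()}
        return DualPrompt(
            str.format(self.audio, **audio_kwargs),
            str.format(self.text, **text_kwargs),
        )


base = DualPrompt("You are Alex.\n{guidance}")
guidance = DualPrompt(audio="Confirm out loud.", text="Skip confirmations.")
combined = base.format(guidance=guidance) + "\nThe current date is 2026-05-29."
```

After these two steps, `combined.audio` and `combined.text` differ only in the guidance line, and `combined` is still a str.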

In Node.js, use the Instructions.tpl tagged template to compose with template literals, or concatInstructions to join a mix of strings and Instructions. Both handle each variant separately:

import { llm } from '@livekit/agents';

const modalitySpecific = new llm.Instructions({
  audio: 'Resolve spoken dates and confirm out loud.',
  text: 'Accept literal dates and skip confirmations.',
});

// `tpl` interpolates each variant from any embedded Instructions
const instructions = llm.Instructions.tpl`You are Alex, a scheduling assistant.
${modalitySpecific}
The current date is 2026-05-29.`;

// `concatInstructions` joins strings and Instructions, preserving both variants
const combined = llm.concatInstructions('Base prompt. ', modalitySpecific);

Customize built-in tasks

Available in: Python (Beta)

Prebuilt tasks ship with their own default prompts. The beta InstructionParts type lets you customize those prompts without rewriting them. Set persona to change the agent's identity and extra to append domain-specific context. Leave a field unset to keep the task's built-in default, or set it to an empty string to remove that section entirely. Each field accepts a plain string or an Instructions object, so customizations can themselves be modality-aware.

To apply a customization, pass an InstructionParts object as a task's instructions argument:

from livekit.agents.beta import Instructions
from livekit.agents.beta.workflows import GetEmailTask, InstructionParts

task = GetEmailTask(
    instructions=InstructionParts(
        persona="You are Riley, a friendly intake assistant collecting a contact email.",
        # `extra` is itself modality-aware: confirm out loud for voice, stay quiet for text
        extra=Instructions(
            audio="Confirm the spelling out loud, letter by letter, for unusual domains.",
            text="Accept the email exactly as typed; only re-prompt if it's clearly malformed.",
        ),
    )
)
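
The unset-versus-empty-string rule can be illustrated with a standalone sketch. Everything here is hypothetical: `assemble_prompt`, the two default strings, and the use of None to represent "unset" are assumptions made for illustration, not the way prebuilt tasks actually assemble their prompts.

```python
from typing import Optional

# Hypothetical defaults standing in for a prebuilt task's built-in sections.
DEFAULT_PERSONA = "You are an assistant collecting an email address."
DEFAULT_EXTRA = "Ask the user to repeat the address if it is unclear."


def assemble_prompt(persona: Optional[str] = None, extra: Optional[str] = None) -> str:
    # None models "unset": keep the built-in default for that section.
    persona = DEFAULT_PERSONA if persona is None else persona
    extra = DEFAULT_EXTRA if extra is None else extra
    # An empty string models "remove this section": it is dropped from the output.
    sections = [section for section in (persona, extra) if section]
    return "\n".join(sections)


defaults_kept = assemble_prompt()
custom_persona = assemble_prompt(persona="You are Riley, a friendly intake assistant.")
extra_removed = assemble_prompt(extra="")
```

Distinguishing None from the empty string is what lets a caller either inherit a section or delete it without rewriting the whole prompt.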

Additional resources

A complete, runnable example agent that sets different instructions for voice and text users:

Per-modality instructions (Node.js)

A scheduling assistant with separate audio and text prompts, built with the Node.js SDK.