This example demonstrates a voice assistant that switches between Text-to-Speech providers in real time so you can compare voice quality, latency, and characteristics. Each provider has its own agent class, and function tools enable seamless switching mid-conversation.
Prerequisites
- Add a `.env` in this directory with your LiveKit credentials and API keys for each TTS provider:

  ```
  LIVEKIT_URL=your_livekit_url
  LIVEKIT_API_KEY=your_api_key
  LIVEKIT_API_SECRET=your_api_secret
  OPENAI_API_KEY=your_openai_key
  DEEPGRAM_API_KEY=your_deepgram_key
  RIME_API_KEY=your_rime_key
  ELEVENLABS_API_KEY=your_elevenlabs_key
  CARTESIA_API_KEY=your_cartesia_key
  PLAYAI_API_KEY=your_playai_key
  ```

- Install dependencies:

  ```
  pip install "livekit-agents[silero,deepgram,openai,rime,elevenlabs,playai,cartesia]" python-dotenv
  ```
Load environment and create the AgentServer
Import the necessary modules, load environment variables, and create an AgentServer. The VAD is prewarmed once per process and shared across all agents.
```python
import logging

from dotenv import load_dotenv
from livekit.agents import JobContext, JobProcess, AgentServer, cli, Agent, AgentSession, function_tool
from livekit.plugins import deepgram, openai, rime, elevenlabs, cartesia, playai, silero

logger = logging.getLogger("tts-comparison")
logger.setLevel(logging.INFO)

load_dotenv()

server = AgentServer()
```
Prewarm VAD for faster connections
Preload the VAD model once per process. The VAD instance is passed to each agent so they can reuse it when switching providers.
```python
def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()

server.setup_fnc = prewarm
```
Define agents for each TTS provider
Each agent class configures a different TTS provider while sharing the same STT, LLM, and VAD. Function tools return new agent instances to enable switching. The VAD is stored so it can be passed to the next agent during transfer.
```python
class RimeAgent(Agent):
    def __init__(self, vad) -> None:
        super().__init__(
            instructions="""
                You are a helpful assistant communicating through voice.
                You are currently using the Rime TTS provider.
                You can switch to a different TTS provider if asked.
                Don't use any unpronounceable characters.
            """,
            stt=deepgram.STT(),
            llm=openai.LLM(),
            tts=rime.TTS(),
            vad=vad,
        )
        self._vad = vad

    async def on_enter(self) -> None:
        await self.session.say("Hello! I'm now using the Rime TTS voice. How does it sound?")

    @function_tool
    async def switch_to_elevenlabs(self):
        """Switch to ElevenLabs TTS voice"""
        return ElevenLabsAgent(self._vad)

    @function_tool
    async def switch_to_cartesia(self):
        """Switch to Cartesia TTS voice"""
        return CartesiaAgent(self._vad)

    @function_tool
    async def switch_to_playai(self):
        """Switch to PlayAI TTS voice"""
        return PlayAIAgent(self._vad)
```
Additional TTS provider agents
The ElevenLabs, Cartesia, and PlayAI agents follow the same pattern: each configures its own TTS provider and exposes function tools for switching to the others.
```python
class ElevenLabsAgent(Agent):
    def __init__(self, vad) -> None:
        super().__init__(
            instructions="...",
            stt=deepgram.STT(),
            llm=openai.LLM(),
            tts=elevenlabs.TTS(),
            vad=vad,
        )
        self._vad = vad

    # ... on_enter and switch functions
```
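The pattern also extends to other supported TTS plugins. As a rough sketch (not part of the original example; it assumes `openai.TTS()` from the OpenAI plugin is available and covered by your `OPENAI_API_KEY`), an additional provider agent could look like this:

```python
# Sketch only: an extra provider agent following the same pattern as the ones above.
# Assumes openai.TTS() from the OpenAI plugin; the OpenAIAgent class name is hypothetical.
class OpenAIAgent(Agent):
    def __init__(self, vad) -> None:
        super().__init__(
            instructions="""
                You are a helpful assistant communicating through voice.
                You are currently using the OpenAI TTS provider.
                You can switch to a different TTS provider if asked.
                Don't use any unpronounceable characters.
            """,
            stt=deepgram.STT(),
            llm=openai.LLM(),
            tts=openai.TTS(),
            vad=vad,
        )
        self._vad = vad

    async def on_enter(self) -> None:
        await self.session.say("Hello! I'm now using the OpenAI TTS voice. How does it sound?")

    @function_tool
    async def switch_to_rime(self):
        """Switch to Rime TTS voice"""
        return RimeAgent(self._vad)
```

To make a new voice reachable from every other agent, each existing class would also need a matching switch tool (for example, a `switch_to_openai` that returns `OpenAIAgent(self._vad)`).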
Create the RTC session entrypoint
Start with the Rime agent and pass the prewarmed VAD. The session handles agent transfers automatically when function tools return new agents.
```python
@server.rtc_session()
async def entrypoint(ctx: JobContext):
    ctx.log_context_fields = {"room": ctx.room.name}

    session = AgentSession()
    await session.start(
        agent=RimeAgent(vad=ctx.proc.userdata["vad"]),
        room=ctx.room,
    )

    await ctx.connect()
```
Run it
```
python tts_comparison.py dev
```
Try these commands to switch between providers:
- "Switch to ElevenLabs"
- "Use the Cartesia voice"
- "Let me hear PlayAI"
- "Go back to Rime"
How it works
- Session starts with the Rime TTS provider.
- Agent introduces itself using the current voice.
- User can request to switch providers (e.g., "Switch to ElevenLabs").
- Function tool returns a new agent instance with the requested TTS.
- Session transfers to the new agent, and `on_enter()` provides audio confirmation.
- The prewarmed VAD is passed to each new agent during transfer.
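The session always starts on Rime in this example. If you want to pick the starting voice without editing code, one option (a sketch only; the `TTS_START_PROVIDER` environment variable is hypothetical, not part of the example) is to map a provider name to an agent class in the entrypoint:

```python
import os

# Hypothetical TTS_START_PROVIDER env var selects the first agent; defaults to Rime.
STARTING_AGENTS = {
    "rime": RimeAgent,
    "elevenlabs": ElevenLabsAgent,
    "cartesia": CartesiaAgent,
    "playai": PlayAIAgent,
}


@server.rtc_session()
async def entrypoint(ctx: JobContext):
    ctx.log_context_fields = {"room": ctx.room.name}

    # Fall back to RimeAgent if the variable is unset or unrecognized.
    agent_cls = STARTING_AGENTS.get(os.getenv("TTS_START_PROVIDER", "rime").lower(), RimeAgent)

    session = AgentSession()
    await session.start(
        agent=agent_cls(vad=ctx.proc.userdata["vad"]),
        room=ctx.room,
    )

    await ctx.connect()
```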
Full example
```python
import logging

from dotenv import load_dotenv
from livekit.agents import JobContext, JobProcess, AgentServer, cli, Agent, AgentSession, function_tool
from livekit.plugins import deepgram, openai, rime, elevenlabs, cartesia, playai, silero

logger = logging.getLogger("tts-comparison")
logger.setLevel(logging.INFO)

load_dotenv()


class RimeAgent(Agent):
    def __init__(self, vad) -> None:
        super().__init__(
            instructions="""
                You are a helpful assistant communicating through voice.
                You are currently using the Rime TTS provider.
                You can switch to a different TTS provider if asked.
                Don't use any unpronounceable characters.
            """,
            stt=deepgram.STT(),
            llm=openai.LLM(),
            tts=rime.TTS(),
            vad=vad,
        )
        self._vad = vad

    async def on_enter(self) -> None:
        await self.session.say("Hello! I'm now using the Rime TTS voice. How does it sound?")

    @function_tool
    async def switch_to_elevenlabs(self):
        """Switch to ElevenLabs TTS voice"""
        return ElevenLabsAgent(self._vad)

    @function_tool
    async def switch_to_cartesia(self):
        """Switch to Cartesia TTS voice"""
        return CartesiaAgent(self._vad)

    @function_tool
    async def switch_to_playai(self):
        """Switch to PlayAI TTS voice"""
        return PlayAIAgent(self._vad)


class ElevenLabsAgent(Agent):
    def __init__(self, vad) -> None:
        super().__init__(
            instructions="""
                You are a helpful assistant communicating through voice.
                You are currently using the ElevenLabs TTS provider.
                You can switch to a different TTS provider if asked.
                Don't use any unpronounceable characters.
            """,
            stt=deepgram.STT(),
            llm=openai.LLM(),
            tts=elevenlabs.TTS(),
            vad=vad,
        )
        self._vad = vad

    async def on_enter(self) -> None:
        await self.session.say("Hello! I'm now using the ElevenLabs TTS voice. What do you think of how I sound?")

    @function_tool
    async def switch_to_rime(self):
        """Switch to Rime TTS voice"""
        return RimeAgent(self._vad)

    @function_tool
    async def switch_to_cartesia(self):
        """Switch to Cartesia TTS voice"""
        return CartesiaAgent(self._vad)

    @function_tool
    async def switch_to_playai(self):
        """Switch to PlayAI TTS voice"""
        return PlayAIAgent(self._vad)


class CartesiaAgent(Agent):
    def __init__(self, vad) -> None:
        super().__init__(
            instructions="""
                You are a helpful assistant communicating through voice.
                You are currently using the Cartesia TTS provider.
                You can switch to a different TTS provider if asked.
                Don't use any unpronounceable characters.
            """,
            stt=deepgram.STT(),
            llm=openai.LLM(),
            tts=cartesia.TTS(),
            vad=vad,
        )
        self._vad = vad

    async def on_enter(self) -> None:
        await self.session.say("Hello! I'm now using the Cartesia TTS voice. How do I sound to you?")

    @function_tool
    async def switch_to_rime(self):
        """Switch to Rime TTS voice"""
        return RimeAgent(self._vad)

    @function_tool
    async def switch_to_elevenlabs(self):
        """Switch to ElevenLabs TTS voice"""
        return ElevenLabsAgent(self._vad)

    @function_tool
    async def switch_to_playai(self):
        """Switch to PlayAI TTS voice"""
        return PlayAIAgent(self._vad)


class PlayAIAgent(Agent):
    def __init__(self, vad) -> None:
        super().__init__(
            instructions="""
                You are a helpful assistant communicating through voice.
                You are currently using the PlayAI TTS provider.
                You can switch to a different TTS provider if asked.
                Don't use any unpronounceable characters.
            """,
            stt=deepgram.STT(),
            llm=openai.LLM(),
            tts=playai.TTS(),
            vad=vad,
        )
        self._vad = vad

    async def on_enter(self) -> None:
        await self.session.say("Hello! I'm now using the PlayAI TTS voice. What are your thoughts on how I sound?")

    @function_tool
    async def switch_to_rime(self):
        """Switch to Rime TTS voice"""
        return RimeAgent(self._vad)

    @function_tool
    async def switch_to_elevenlabs(self):
        """Switch to ElevenLabs TTS voice"""
        return ElevenLabsAgent(self._vad)

    @function_tool
    async def switch_to_cartesia(self):
        """Switch to Cartesia TTS voice"""
        return CartesiaAgent(self._vad)


server = AgentServer()


def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()


server.setup_fnc = prewarm


@server.rtc_session()
async def entrypoint(ctx: JobContext):
    ctx.log_context_fields = {"room": ctx.room.name}

    session = AgentSession()
    await session.start(
        agent=RimeAgent(vad=ctx.proc.userdata["vad"]),
        room=ctx.room,
    )

    await ctx.connect()


if __name__ == "__main__":
    cli.run_app(server)
```