Text-to-speech (TTS) models produce realtime synthetic speech from text input. In voice AI, this allows a text-based LLM to speak its response to the user.
The agents framework includes plugins for popular TTS providers out of the box. You can also implement the TTS node to provide custom behavior or use an alternative provider.
LiveKit is open source and welcomes new plugin contributions.
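One kind of custom behavior a TTS node might add is adjusting text before it reaches the model. The following is a minimal, library-independent sketch of that idea. The names `apply_pronunciations` and `PRONUNCIATIONS` are hypothetical and not part of LiveKit, and wiring the generator into an actual TTS node is not shown.

```python
import asyncio
from typing import AsyncIterable, AsyncIterator, Mapping

# Hypothetical pronunciation table: written form -> spoken form.
PRONUNCIATIONS: Mapping[str, str] = {
    "TTS": "text to speech",
    "LLM": "large language model",
}


async def apply_pronunciations(
    text: AsyncIterable[str],
    table: Mapping[str, str] = PRONUNCIATIONS,
) -> AsyncIterator[str]:
    """Rewrite each incoming text chunk using the pronunciation table."""
    async for chunk in text:
        for term, spoken in table.items():
            chunk = chunk.replace(term, spoken)
        yield chunk
```

Because replacement happens per chunk, a term that is split across two chunks is not rewritten. A production version would likely buffer text up to a word or sentence boundary first.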
How to use
The following sections describe high-level usage only.
For more detailed information about installing and using plugins, see the plugins overview.
Usage in AgentSession
Construct an `AgentSession` or `Agent` with a `TTS` instance created by your desired plugin:
```python
from livekit.agents import AgentSession
from livekit.plugins import cartesia

session = AgentSession(tts=cartesia.TTS(model="sonic-english"))
```
`AgentSession` automatically sends LLM responses to the TTS model, and also supports a `say` method for one-off responses.
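As a rough sketch of a one-off response, the helper below speaks a fixed greeting through a session. It assumes that `say` accepts the text to speak as its first argument and returns something awaitable. The `SupportsSay` protocol and the `greet` helper are hypothetical names used for illustration, so that the snippet does not depend on any particular plugin.

```python
import asyncio
from typing import Any, Protocol


class SupportsSay(Protocol):
    """Hypothetical structural type: anything exposing a `say` method."""

    def say(self, text: str) -> Any: ...


async def greet(session: SupportsSay, name: str) -> None:
    # Speak a single fixed sentence instead of an LLM-generated reply.
    await session.say(f"Hello {name}, how can I help you today?")
```

With a real session, this would be called as `await greet(session, "Ada")` once the session has started.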
Standalone usage
You can also use a `TTS` instance on its own by creating a stream. Use `push_text` to add text to the stream, then consume the resulting stream of `SynthesizedAudio` events and publish their frames as realtime audio to another participant.
Here is an example of a standalone TTS app:
```python
import asyncio
from typing import AsyncIterable

from livekit import agents, rtc
from livekit.agents.tts import SynthesizedAudio
from livekit.plugins import cartesia


async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()

    text_stream: AsyncIterable[str] = ...  # you need to provide a stream of text
    audio_source = rtc.AudioSource(44100, 1)

    track = rtc.LocalAudioTrack.create_audio_track("agent-audio", audio_source)
    await ctx.room.local_participant.publish_track(track)

    tts = cartesia.TTS(model="sonic-english")
    tts_stream = tts.stream()

    # create a task to consume and publish audio frames
    send_task = asyncio.create_task(send_audio(tts_stream, audio_source))

    # push text into the stream; the TTS stream emits audio frames along with
    # events indicating sentence (or segment) boundaries
    async for text in text_stream:
        tts_stream.push_text(text)
    tts_stream.end_input()

    # wait until all synthesized audio has been forwarded
    await send_task


async def send_audio(
    audio_stream: AsyncIterable[SynthesizedAudio], audio_source: rtc.AudioSource
):
    # forward each synthesized frame to the published audio source
    async for a in audio_stream:
        await audio_source.capture_frame(a.frame)


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```
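The standalone example leaves `text_stream` as a placeholder. One way to fill it, sketched here with only the standard library, is an async generator that yields text chunks from any iterable. The name `iter_text_chunks` and its `delay` parameter are hypothetical. In practice the chunks would more likely come from an LLM or another live source.

```python
import asyncio
from typing import AsyncIterator, Iterable


async def iter_text_chunks(
    chunks: Iterable[str], delay: float = 0.0
) -> AsyncIterator[str]:
    """Yield non-empty text chunks one at a time, optionally pausing between them."""
    for chunk in chunks:
        if not chunk:
            continue  # skip empty chunks; there is nothing to synthesize
        yield chunk
        if delay:
            # simulate text arriving over time, as it would from a live source
            await asyncio.sleep(delay)
```

The placeholder could then be replaced with, for example, `text_stream = iter_text_chunks(["Hello, ", "world."])`.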
Available providers
The following table lists the available TTS providers for LiveKit Agents.
Provider | Plugin
---|---
Amazon Polly | aws
Azure AI Speech | azure
Cartesia | cartesia
Deepgram | deepgram
ElevenLabs | elevenlabs
Google Cloud | google
Groq | groq
Neuphonic | neuphonic
OpenAI | openai
PlayHT | playai
Rime | rime