Agents Quickstart

Let's build a simple agent that transcribes voice input into text and logs it to the console.

Pre-requisites

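You'll need the following for this quickstart: Python 3.9 or later, a LiveKit server with its URL, API key, and API secret, and a Deepgram API key.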

Building an agent

1. Create a virtualenv

Agents requires Python 3.9+. Some plugins may require 3.10+.

mkdir agent-quickstart
cd agent-quickstart
python3 -m venv venv
source venv/bin/activate
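To confirm that the interpreter inside the virtualenv meets the version requirement stated above, a quick check along these lines may help. This is a minimal sketch; the helper name meets_minimum is made up for illustration and is not part of LiveKit Agents.

```python
import sys

# Minimum version taken from the requirement stated in this step.
MINIMUM_VERSION = (3, 9)


def meets_minimum(version_info, minimum=MINIMUM_VERSION):
    """Return True if the (major, minor, ...) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    found = sys.version.split()[0]
    if meets_minimum(sys.version_info):
        print("Python version OK:", found)
    else:
        print("Python 3.9 or later is required, found:", found)
```

If a plugin you install needs 3.10+, the same helper can be called with minimum=(3, 10).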

2. Install LiveKit Agents

In this quickstart, we'll be using Deepgram for transcription. Other speech-to-text plugins are available in the LiveKit Agents repo.

pip install livekit livekit-agents livekit-plugins-deepgram

3. Agent code

Create a file named agent.py with the following:

import asyncio
import logging

from livekit import agents, rtc
from livekit.agents import (
    JobContext,
    JobRequest,
    WorkerOptions,
    cli,
)
from livekit.plugins.deepgram import STT


async def entrypoint(job: JobContext):
    logging.info("starting stt example agent")
    # Keep references to background tasks so they are not garbage collected
    tasks = []

    async def process_track(audio_stream: rtc.AudioStream):
        stt = STT()
        stt_stream = stt.stream()
        # Read transcription events concurrently while audio is pushed in
        stt_task = asyncio.create_task(process_stt(stt_stream))
        async for audio_frame_event in audio_stream:
            stt_stream.push_frame(audio_frame_event.frame)
        await stt_task

    async def process_stt(stt_stream: agents.stt.STTStream):
        async for stt_event in stt_stream:
            if stt_event.type == agents.stt.SpeechEventType.FINAL_TRANSCRIPT:
                logging.info("Got transcript: %s", stt_event.alternatives[0].text)

    def on_track_subscribed(track: rtc.Track, *_):
        if track.kind == rtc.TrackKind.KIND_AUDIO:
            tasks.append(asyncio.create_task(process_track(rtc.AudioStream(track))))

    job.room.on("track_subscribed", on_track_subscribed)

    # Handle tracks that were already subscribed before the callback was set
    for participant in job.room.participants.values():
        for track_pub in participant.tracks.values():
            # This track is not yet subscribed, when it is subscribed it will
            # call the on_track_subscribed callback
            if track_pub.track is None:
                continue
            tasks.append(
                asyncio.create_task(process_track(rtc.AudioStream(track_pub.track)))
            )


async def request_fnc(req: JobRequest) -> None:
    await req.accept(entrypoint, auto_subscribe=agents.AutoSubscribe.AUDIO_ONLY)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(request_fnc=request_fnc))
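The process_track and process_stt functions in agent.py follow a producer/consumer pattern: one coroutine pushes audio frames into the STT stream while a concurrently running task reads transcription events out of it. The sketch below shows the same pattern using only the standard library, so it can run without a LiveKit server or a Deepgram key. FakeSTTStream and its end_input method are hypothetical stand-ins written for this illustration; they are not the plugin's API.

```python
import asyncio


class FakeSTTStream:
    """Hypothetical stand-in for a streaming STT session (not the plugin API)."""

    def __init__(self):
        self._events = asyncio.Queue()

    def push_frame(self, frame):
        # A real service would transcribe audio; here each frame becomes text.
        self._events.put_nowait(f"transcript of {frame}")

    def end_input(self):
        # Sentinel telling the consumer that no more events will arrive.
        self._events.put_nowait(None)

    def __aiter__(self):
        return self

    async def __anext__(self):
        event = await self._events.get()
        if event is None:
            raise StopAsyncIteration
        return event


async def process_stt(stt_stream, transcripts):
    # Consumer: runs as its own task and collects results as they arrive.
    async for event in stt_stream:
        transcripts.append(event)


async def process_track(frames):
    stt_stream = FakeSTTStream()
    transcripts = []
    stt_task = asyncio.create_task(process_stt(stt_stream, transcripts))
    # Producer: forward each frame without waiting for its transcript.
    for frame in frames:
        stt_stream.push_frame(frame)
        await asyncio.sleep(0)  # yield so the consumer task can make progress
    stt_stream.end_input()
    await stt_task
    return transcripts


result = asyncio.run(process_track(["frame-1", "frame-2"]))
print(result)
```

Running the consumer as a separate task means frames keep flowing into the stream even when transcripts arrive late, which is the reason agent.py creates stt_task before entering its audio loop.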

Running the Agent

Ensure the following variables are set in your environment:

export LIVEKIT_URL=<your LiveKit server URL>
export LIVEKIT_API_KEY=<your API Key>
export LIVEKIT_API_SECRET=<your API Secret>
export DEEPGRAM_API_KEY=<add_deepgram_key_here>
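A missing variable tends to surface only as a connection or authentication error at startup, so a small pre-flight check can save some debugging time. The sketch below uses only the variable names listed above; the helper name missing_env_vars is made up for illustration and is not part of LiveKit Agents.

```python
import os

# Names taken from the export lines above.
REQUIRED_VARS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "DEEPGRAM_API_KEY",
)


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```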

Then run the agent:

python agent.py start

Your worker is now running. Whenever a room is created, your agent joins it. You may start as many workers as you like; each room is assigned to a single worker. Load balancing is handled by LiveKit server, which employs a "fill-first" strategy. By default, worker capacity is determined by CPU load, but this behavior can be customized.
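To make the "fill-first" idea concrete, here is a toy model of it: each new room goes to the first worker that still has spare capacity, and a later worker only receives rooms once the earlier ones are full. This is an illustration of the strategy's general shape, not LiveKit server's actual implementation. The Worker class and the fixed capacity count are assumptions made for the sketch; the real default, as noted above, is based on CPU load.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    capacity: int  # hypothetical: max concurrent rooms, standing in for CPU load
    rooms: list = field(default_factory=list)

    def has_capacity(self):
        return len(self.rooms) < self.capacity


def assign_fill_first(workers, room):
    """Assign `room` to the first worker with spare capacity.

    Returns the chosen worker's name, or None if every worker is full.
    """
    for worker in workers:
        if worker.has_capacity():
            worker.rooms.append(room)
            return worker.name
    return None


workers = [Worker("worker-a", capacity=2), Worker("worker-b", capacity=2)]
assignments = [assign_fill_first(workers, f"room-{i}") for i in range(5)]
print(assignments)
```

In this model the first two rooms land on worker-a, the next two on worker-b, and the fifth finds no worker with capacity.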

Testing the Agent

Your agent can now interact with your end-users in the same room via browser or native apps.

To make prototyping and testing easier, we've created an example frontend you can use with any agent running on the backend. Head over to the Agents Playground.

Design Notes

There are several things worth noting about the agent code above:

Worker and agent

Your program or script becomes a worker by using the agents.Worker class. In the example, this happens through cli.run_app and the WorkerOptions passed to it:

async def request_fnc(req: JobRequest) -> None:
    await req.accept(entrypoint, auto_subscribe=agents.AutoSubscribe.AUDIO_ONLY)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(request_fnc=request_fnc))

When you run the program with the start command, it establishes a persistent WebSocket connection to LiveKit server. This setup inverts the typical request/response model, allowing LiveKit server to dispatch job requests directly to the worker.

The request_fnc function is triggered when a new job is available for your worker. You can accept or reject the job after reviewing its details. When you accept a job, an instance of your agent is created in a dedicated process, which then joins the room specified in the JobRequest.
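The accept-or-reject decision can be sketched as follows. To keep the example runnable without a LiveKit server, it uses FakeJobRequest, a hypothetical stand-in that only records the decision; its room_name attribute and reject method are assumptions for illustration and may not match the real JobRequest. The should_accept policy is likewise invented.

```python
import asyncio


class FakeJobRequest:
    """Hypothetical stand-in for JobRequest; it only records the decision."""

    def __init__(self, room_name):
        self.room_name = room_name
        self.decision = None

    async def accept(self, entrypoint):
        self.decision = "accepted"

    async def reject(self):
        self.decision = "rejected"


async def entrypoint(job):
    pass  # agent logic would go here


def should_accept(room_name):
    # Hypothetical policy: only serve rooms whose name starts with "support-".
    return room_name.startswith("support-")


async def request_fnc(req):
    # Review the job details, then accept or reject.
    if should_accept(req.room_name):
        await req.accept(entrypoint)
    else:
        await req.reject()


async def main():
    requests = [FakeJobRequest("support-123"), FakeJobRequest("internal-standup")]
    for req in requests:
        await request_fnc(req)
    return {req.room_name: req.decision for req in requests}


decisions = asyncio.run(main())
print(decisions)
```

Keeping the policy in a plain function such as should_accept makes it easy to test separately from the worker.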

How does my agent join the room?

Normally client applications connect to LiveKit using a URL and token, but there are no explicit references to either in the example above.

The Agents framework simplifies the connection and authentication process by automatically generating a token for the agent (which also serves to identify the agent with LiveKit server).

JobContext

By the time your agent's application code is invoked, the agent has already joined the room. Your agent is provided with a JobContext object containing information about the session, including the connected LiveKit rtc.Room object, which can be used to interact with the room.