Prerequisites
You'll need the following for this quickstart:
- Deepgram API key (get one here)
- LiveKit Cloud project or a self-hosted instance of LiveKit server
Building an agent
1. Create a virtualenv
Agents requires Python 3.9+. Some plugins may require 3.10+.
mkdir agent-quickstart
cd agent-quickstart
python3 -m venv venv
source venv/bin/activate
2. Install LiveKit Agents
In this quickstart, we'll be using Deepgram for transcription. Other speech-to-text plugins are available in the LiveKit Agents repo.
pip install livekit livekit-agents livekit-plugins-deepgram
3. Agent code
Create a file named agent.py with the following:
import asyncio
import logging

from livekit import agents, rtc
from livekit.agents import (
    JobContext,
    JobRequest,
    WorkerOptions,
    cli,
)
from livekit.plugins.deepgram import STT


async def entrypoint(job: JobContext):
    logging.info("starting stt example agent")
    tasks = []

    async def process_track(audio_stream: rtc.AudioStream):
        # One STT stream per audio track: push frames in, read transcripts out.
        stt = STT()
        stt_stream = stt.stream()
        stt_task = asyncio.create_task(process_stt(stt_stream))
        async for audio_frame_event in audio_stream:
            stt_stream.push_frame(audio_frame_event.frame)
        await stt_task

    async def process_stt(stt_stream: agents.stt.STTStream):
        # Log only final transcripts.
        async for stt_event in stt_stream:
            if stt_event.type == agents.stt.SpeechEventType.FINAL_TRANSCRIPT:
                logging.info("Got transcript: %s", stt_event.alternatives[0].text)

    def on_track_subscribed(track: rtc.Track, *_):
        if track.kind == rtc.TrackKind.KIND_AUDIO:
            tasks.append(asyncio.create_task(process_track(rtc.AudioStream(track))))

    job.room.on("track_subscribed", on_track_subscribed)

    # Handle tracks that were already published before the agent joined.
    for participant in job.room.participants.values():
        for track_pub in participant.tracks.values():
            # This track is not yet subscribed, when it is subscribed it will
            # call the on_track_subscribed callback
            if track_pub.track is None:
                continue
            tasks.append(asyncio.create_task(process_track(rtc.AudioStream(track_pub.track))))


async def request_fnc(req: JobRequest) -> None:
    await req.accept(entrypoint, auto_subscribe=agents.AutoSubscribe.AUDIO_ONLY)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(request_fnc=request_fnc))
Running the Agent
Ensure the following variables are set in your environment:
export LIVEKIT_URL=<your LiveKit server URL>
export LIVEKIT_API_KEY=<your API Key>
export LIVEKIT_API_SECRET=<your API Secret>
export DEEPGRAM_API_KEY=<add_deepgram_key_here>
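A missing or empty variable is an easy mistake to make at this step. The following is a minimal pre-flight check you could run before starting the worker. The helper name missing_env_vars is hypothetical and not part of LiveKit Agents; only the four variable names come from this quickstart.

```python
import os

# The four variables this quickstart asks you to export.
REQUIRED_VARS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "DEEPGRAM_API_KEY",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to the process environment; passing a plain dict
    makes the check easy to exercise without touching os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling missing_env_vars() with no arguments inspects the current shell environment; an empty list means all four variables are present.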
Then run the agent:
python agent.py start
Your worker is now running. Any time a room is created, your agent will join it. You may start as many workers as you like; each room is assigned to a single worker. Load balancing is handled by LiveKit server, which employs a "fill-first" strategy. By default, worker capacity is determined by CPU load, but this behavior can be customized.
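To make the "fill-first" idea concrete, here is a toy model of it. This is not LiveKit server's implementation: the Worker class, the per-room load of 0.25, and the 0.75 capacity threshold are all assumptions chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    load: float = 0.0  # hypothetical load metric in the range [0, 1]
    rooms: list = field(default_factory=list)


def assign_room(workers, room, load_per_room=0.25, threshold=0.75):
    """Give the room to the first worker that still has capacity.

    Fill-first means workers earlier in the list are filled up before
    later ones receive any work, rather than spreading rooms evenly.
    Returns the chosen worker, or None if every worker is at capacity.
    """
    for worker in workers:
        if worker.load + load_per_room <= threshold:
            worker.rooms.append(room)
            worker.load += load_per_room
            return worker
    return None
```

With two idle workers, the first three rooms all land on the first worker in this model, and only the fourth spills over to the second.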
Testing the Agent
Your agent can now interact with your end-users in the same room via browser or native apps.
To make prototyping and testing easier, we've created an example frontend you can use with any agent running on the backend. Head over to the Agents Playground.
Design Notes
There are several things worth noting about the agent code above:
Worker and agent
Your program/script becomes a worker by using the agents.Worker class:
async def request_fnc(req: JobRequest) -> None:
    await req.accept(entrypoint, auto_subscribe=agents.AutoSubscribe.AUDIO_ONLY)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(request_fnc=request_fnc))
When you run the program with the start command, it establishes a persistent WebSocket connection to LiveKit server. This setup inverts the typical request/response model, allowing LiveKit server to dispatch job requests directly to the worker.
The request_fnc function is triggered when a new job is available for your worker. You have the option to accept or reject the job after reviewing its details. When you accept a job, an instance of your agent is created on a dedicated process, which then joins the room specified in the JobRequest.
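As an illustration of accepting or rejecting a job, the sketch below builds a request function that only accepts rooms whose name starts with a given prefix. It rests on assumptions: that the request object exposes the room name as req.room.name and offers an awaitable req.reject() alongside req.accept(), which you should confirm against the version of LiveKit Agents you have installed. The prefix and the helper names should_accept and make_request_fnc are hypothetical.

```python
ROOM_PREFIX = "transcribe-"  # hypothetical naming convention for rooms


def should_accept(room_name: str) -> bool:
    """Decide whether this worker wants the job, based on the room name."""
    return room_name.startswith(ROOM_PREFIX)


def make_request_fnc(entrypoint, **accept_kwargs):
    """Build a request function that filters jobs by room name.

    `entrypoint` and `accept_kwargs` are forwarded unchanged to
    req.accept(); req.room.name and req.reject() are assumed to exist.
    """

    async def request_fnc(req) -> None:
        if should_accept(req.room.name):
            await req.accept(entrypoint, **accept_kwargs)
        else:
            await req.reject()

    return request_fnc
```

In the quickstart agent this could be wired in as WorkerOptions(request_fnc=make_request_fnc(entrypoint, auto_subscribe=agents.AutoSubscribe.AUDIO_ONLY)).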
How does my agent join the room?
Normally client applications connect to LiveKit using a URL and token, but there are no explicit references to either in the example above.
The Agents framework simplifies the connection and authentication process by automatically generating a token for the agent (which also serves to identify the agent with LiveKit server).
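For a sense of what the framework handles on the agent's behalf, the sketch below hand-builds a token of the kind a client would normally present. It assumes LiveKit access tokens are JWTs signed with the API secret using HS256, with the API key as issuer, the participant identity as subject, and a "video" grant naming the room. The function name make_join_token and its parameters are hypothetical, and real applications would typically use a LiveKit server SDK rather than this hand-rolled version.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_join_token(api_key, api_secret, identity, room, ttl_seconds=600, now=None):
    """Build an HS256-signed JWT granting `identity` permission to join `room`."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {
        "iss": api_key,            # assumed: issuer is the API key
        "sub": identity,           # assumed: subject is the participant identity
        "nbf": now,
        "exp": now + ttl_seconds,
        "video": {"room": room, "roomJoin": True},  # assumed grant layout
    }
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(
        api_secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return signing_input + "." + _b64url(signature)
```

Because the worker already holds LIVEKIT_API_KEY and LIVEKIT_API_SECRET, it has everything needed to mint such a token itself, which is why none appears in the agent code.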
JobContext
By the time your agent's application code is invoked, the agent has already joined the room. Your agent is provided with a JobContext object containing information about the session, including the connected LiveKit rtc.Room object, which can be used to interact with the room.
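As a small example of working with the room through the context, the helper below reports, for each participant, how many published tracks are already subscribed and how many are still pending. It relies only on attributes used in the agent code above (job.room.participants, participant.tracks, and track_pub.track); the function name summarize_room is hypothetical.

```python
def summarize_room(job):
    """Map each participant key to a (subscribed, pending) track count.

    A publication whose `track` is None has not been subscribed yet,
    mirroring the check in the quickstart's entrypoint.
    """
    summary = {}
    for key, participant in job.room.participants.items():
        publications = list(participant.tracks.values())
        subscribed = sum(1 for pub in publications if pub.track is not None)
        summary[key] = (subscribed, len(publications) - subscribed)
    return summary
```

Called at the top of entrypoint, this gives a quick picture of which tracks the agent can process immediately and which will arrive later through the track_subscribed callback.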