The voice assistant

The Agents framework also includes a higher-level VoiceAssistant class. It combines speech-to-text, LLM, and text-to-speech plugins into an interactive, programmable voice assistant, and provides tools for processing and controlling interactions, including handling interruptions.

import asyncio

from livekit.agents import JobContext
from livekit.agents.llm import ChatContext, ChatMessage, ChatRole
from livekit.agents.voice_assistant import VoiceAssistant
from livekit.plugins import deepgram, openai, silero


async def entrypoint(ctx: JobContext):
    # seed the LLM with a system prompt describing the assistant's persona
    initial_ctx = ChatContext(
        messages=[
            ChatMessage(
                role=ChatRole.SYSTEM,
                text="You are a sample voice assistant created by LiveKit. "
                "Your interface with users will be voice. You should use short and concise responses, "
                "and avoid use of unpronounceable punctuation.",
            )
        ]
    )

    assistant = VoiceAssistant(
        vad=silero.VAD(),                # voice activity detection
        stt=deepgram.STT(),              # speech-to-text
        llm=openai.LLM(model="gpt-4o"),  # language model
        tts=openai.TTS(voice="alloy"),   # text-to-speech
        chat_ctx=initial_ctx,
    )
    assistant.start(ctx.room)

    # allow time for the end user to join the room
    await asyncio.sleep(1)
    await assistant.say("Hey, how can I help you today?", allow_interruptions=True)

Voice assistants are fully composable: for example, you can swap one TTS provider for another without changing any other code, as in the sketch below.
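As an illustration, swapping the OpenAI TTS above for ElevenLabs only touches the tts argument. This is a minimal sketch, assuming the livekit-plugins-elevenlabs package is installed and its API key is configured in the environment; everything else about the assistant stays as written above:

from livekit.plugins import deepgram, elevenlabs, openai, silero

# identical to the assistant above, except for the tts argument
assistant = VoiceAssistant(
    vad=silero.VAD(),
    stt=deepgram.STT(),
    llm=openai.LLM(model="gpt-4o"),
    tts=elevenlabs.TTS(),  # swapped in for openai.TTS(voice="alloy")
    chat_ctx=initial_ctx,
)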

note:

While this example uses GPT-4o, you can use any LLM compatible with OpenAI’s Chat Completions API, including Ollama. You can specify a base_url when creating the LLM:

llm = openai.LLM(model="custom-model", base_url="https://mymodelhost.com/v1")
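
For example, to point the plugin at a local Ollama server, which serves an OpenAI-compatible endpoint under /v1 by default, a sketch assuming Ollama is running on its default port with the llama3 model pulled; the api_key value here is a placeholder, since Ollama does not check it but the underlying client requires a non-empty key:

# assumes a local Ollama instance and `ollama pull llama3` has been run
llm = openai.LLM(
    model="llama3",
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # placeholder; Ollama ignores the key
)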