
SIP primer

Learn how SIP calls flow in LiveKit to connect traditional telephony with realtime communications.

Overview

Session Initiation Protocol (SIP) is a signaling protocol for starting, managing, and ending realtime voice and video calls over IP networks. LiveKit uses SIP to connect traditional telephony systems—like desk phones, softphones, and PSTN networks—to WebRTC-based applications. With LiveKit Telephony, you can route SIP calls into or out of LiveKit rooms for realtime communication.

SIP handles call setup and control, negotiating media capabilities and establishing the connection between caller and callee. After the connection is established, audio flows over Real-time Transport Protocol (RTP).

The following sections walk through, step by step, how SIP calls are routed between telephony systems and LiveKit rooms.

How calls connect

The following diagram describes how calls are connected to LiveKit for both inbound and outbound calls.

[Diagram: inbound and outbound call connection flows]

How inbound calls connect

The following sections outline the initial setup for an inbound connection.

User dials a phone number

This could be from a mobile phone, desk phone, or softphone.

Call enters the PSTN

The Public Switched Telephone Network (PSTN) is the global phone network. The PSTN routes the call to the SIP trunk (Twilio, Telnyx, and others) associated with the destination number.

LiveKit Phone Numbers

With LiveKit Phone Numbers, the call skips all trunking and goes straight to LiveKit.

SIP provider → LiveKit SIP endpoint

The SIP provider receives the PSTN call and sees you've configured a LiveKit SIP endpoint as the Origination URI. The SIP provider initiates a SIP call to LiveKit by sending an INVITE request.
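
To make the INVITE request more concrete, the following Python sketch parses a hand-written example. The phone numbers, hostnames, tags, and Call-ID are hypothetical placeholders, and a real INVITE carries more headers plus an SDP body. This illustrates the shape of the message only, not how LiveKit or any provider processes it.

```python
# Illustrative only: every number, host, and ID below is a made-up placeholder.
SAMPLE_INVITE = (
    "INVITE sip:+15105550100@example-endpoint.invalid SIP/2.0\r\n"
    "Via: SIP/2.0/UDP provider.example.invalid:5060;branch=z9hG4bK776asdhds\r\n"
    "Max-Forwards: 70\r\n"
    "From: <sip:+15105550123@provider.example.invalid>;tag=1928301774\r\n"
    "To: <sip:+15105550100@example-endpoint.invalid>\r\n"
    "Call-ID: a84b4c76e66710@provider.example.invalid\r\n"
    "CSeq: 1 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)


def parse_sip_request(raw: str) -> dict:
    """Split a SIP request into method, request URI, version, and headers.

    Simplification: repeated headers overwrite each other and the body is ignored.
    """
    head, _, _body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "uri": uri, "version": version, "headers": headers}


request = parse_sip_request(SAMPLE_INVITE)
print(request["method"], request["uri"])
print(request["headers"]["call-id"])
```

The request line names the method (INVITE) and the destination URI, while headers such as Call-ID and CSeq let both sides match later responses to this request.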

SIP provider configuration

Your SIP trunking provider must be configured to use the LiveKit SIP endpoint. To learn more, see SIP trunk setup.

Agent picks up the call

The agent picks up the call by joining the room, but audio is not exchanged until the SIP handshake is complete.

Agent setup

Your agent must be configured to join the room when a call is received. To learn more, see Agent dispatch.

The final steps to complete an inbound call connection are shared with the outbound call connection. See Completing the call connection for details.

How outbound calls connect

The following sections outline the initial setup for an outbound connection.

Initiate a call from LiveKit

Use the SIP API to initiate a call. LiveKit validates the request.

How to initiate a call

You can make a call from LiveKit using the SIP API or the CLI. To learn more, see Making outbound calls.

LiveKit SIP → SIP provider

LiveKit initiates a SIP session based on the request.

SIP provider configuration

You must have an outbound trunk and configure your SIP trunking provider to use the LiveKit SIP endpoint.

Call enters the PSTN

The SIP trunking provider routes the call to the PSTN, which routes it to the user's phone.

User picks up the call

When the user picks up the call, a 200 OK response is sent back to LiveKit. Depending on the device, the initial response might be different:

  • If the user's device is a softphone or desk phone, a 200 OK response is sent.
  • If it's a mobile phone, an Answer Message (ANM), the Signaling System 7 (SS7) equivalent of a 200 OK, is sent to the cellphone provider. The ANM is later converted to a 200 OK and sent to LiveKit.
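
The 200 OK described above arrives as a SIP response whose first line is a status line. As a minimal sketch, the Python below parses a status line and distinguishes an answered call (2xx) from a provisional response such as 180 Ringing. The function names are hypothetical and chosen for illustration; they are not part of any LiveKit API.

```python
from typing import Tuple


def parse_status_line(line: str) -> Tuple[int, str]:
    """Parse a SIP status line such as 'SIP/2.0 200 OK' into (code, reason)."""
    version, code, reason = line.strip().split(" ", 2)
    if version != "SIP/2.0":
        raise ValueError(f"unexpected SIP version: {version}")
    return int(code), reason


def is_answered(code: int) -> bool:
    """2xx responses to an INVITE mean the callee accepted the call."""
    return 200 <= code < 300


for status_line in ("SIP/2.0 180 Ringing", "SIP/2.0 200 OK"):
    code, reason = parse_status_line(status_line)
    print(code, reason, "answered" if is_answered(code) else "not answered yet")
```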

The final steps in completing an outbound call connection are shared with the inbound call connection. See Completing the call connection for details.

Completing the call connection

Once the initial path is established through either the inbound or the outbound flow, the following steps finalize the connection.

Provider and LiveKit complete the SIP handshake

A SIP handshake is the sequence of request-response messages exchanged between endpoints to establish a call. The handshake negotiates capabilities (for example, which codecs to use when exchanging media), authenticates endpoints (if required), and sets up the media connection for the call.

Media transfer

Once the SIP handshake is complete, the caller and callee exchange audio data via RTP or Secure RTP (SRTP) packets. Media is bridged between the caller and LiveKit's internal media pipeline.
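
For readers who want to see what an RTP packet looks like on the wire, here is a small Python sketch that decodes the 12-byte fixed RTP header. The sample packet is fabricated for the example (payload type 0 is the static assignment for PCMU), and the sketch ignores CSRC lists, header extensions, and SRTP encryption.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header at the start of a packet."""
    if len(packet) < 12:
        raise ValueError("RTP packet is shorter than the 12-byte fixed header")
    # Network byte order: two flag bytes, 16-bit sequence, 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Hypothetical packet: version 2, payload type 0, followed by 160 payload bytes.
sample_packet = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x12345678) + b"\xff" * 160
header = parse_rtp_header(sample_packet)
print(header["version"], header["payload_type"], header["sequence_number"])
```

The payload type in each packet refers back to the codec agreed during the SIP handshake, which is how the receiver knows how to decode the audio.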

Media timeout

The call is disconnected with a media timeout error if the first RTP packet isn't received within 30 seconds. After at least one RTP packet has been received, the timeout is 15 seconds.
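
The two timeout windows can be sketched as a small state tracker. This is a hedged illustration, not LiveKit's implementation. It assumes the 30-second window starts when tracking begins and the 15-second window is measured from the most recent packet; the text above does not specify either reference point. The class and method names are hypothetical.

```python
from typing import Optional

INITIAL_TIMEOUT_S = 30.0  # from the text: limit while waiting for the first RTP packet
ONGOING_TIMEOUT_S = 15.0  # from the text: limit once at least one packet has arrived


class MediaTimeoutTracker:
    """Tracks RTP arrival times and reports when a media timeout would fire."""

    def __init__(self, started_at: float) -> None:
        self.started_at = started_at  # assumed start of the 30-second window
        self.last_packet_at: Optional[float] = None

    def on_rtp_packet(self, now: float) -> None:
        self.last_packet_at = now

    def timed_out(self, now: float) -> bool:
        if self.last_packet_at is None:
            return now - self.started_at > INITIAL_TIMEOUT_S
        # Assumption: the shorter window restarts with every received packet.
        return now - self.last_packet_at > ONGOING_TIMEOUT_S


tracker = MediaTimeoutTracker(started_at=0.0)
print(tracker.timed_out(now=29.0))  # still waiting for the first packet
tracker.on_rtp_packet(now=29.5)
print(tracker.timed_out(now=40.0))  # 10.5 s since the last packet
print(tracker.timed_out(now=45.0))  # 15.5 s since the last packet
```

Passing timestamps in explicitly keeps the sketch deterministic; a real service would read a monotonic clock instead.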

Agent receives audio and responds

Audio flows from the caller to the agent via RTP. Agent-generated audio flows back to the caller via RTP.

Agent setup

Your agent must be running and dispatched to the LiveKit room the caller or callee is in. Alternatively, for outbound calls, the agent can initiate the phone call. To learn more, see Agents telephony integration.

SIP handshake and audio codecs negotiation

The SIP handshake is a series of messages exchanged between caller and callee that negotiates capabilities and establishes the connection. The capabilities negotiation is done using Session Description Protocol (SDP) as part of the SIP handshake.

The audio codec is one of the capabilities negotiated during the SIP handshake. It defines how voice audio is compressed, encoded, transmitted, and decoded during a call. In SIP calls, the codec determines audio quality, bandwidth usage, and compatibility between endpoints. Codec choice matters because both sides must support the same codec to exchange audio. To learn more, see Additional resources.
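
As a rough sketch of codec negotiation, the Python below reads the audio codecs from a fabricated SDP offer and picks the first one the answering side also supports. The addresses, port, and codec list are example values, and the function names are hypothetical. Real offer/answer handling covers more cases, such as static payload types without an rtpmap line, clock rates, and format parameters.

```python
from typing import Iterable, List, Optional

# Hypothetical SDP offer; 192.0.2.10 is a documentation-only address.
SAMPLE_OFFER = "\r\n".join([
    "v=0",
    "o=- 0 0 IN IP4 192.0.2.10",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 8 101",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
    "a=rtpmap:101 telephone-event/8000",
]) + "\r\n"


def offered_codecs(sdp: str) -> List[str]:
    """Return codec names from a=rtpmap lines, in the m=audio preference order."""
    payload_order: List[str] = []
    rtpmap = {}
    for line in sdp.splitlines():
        if line.startswith("m=audio "):
            # Fields: m=audio <port> <proto> <payload types...>
            payload_order = line.split()[3:]
        elif line.startswith("a=rtpmap:"):
            payload, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            rtpmap[payload] = encoding.split("/")[0].upper()
    return [rtpmap[p] for p in payload_order if p in rtpmap]


def pick_codec(offer_sdp: str, supported: Iterable[str]) -> Optional[str]:
    """Pick the first offered codec that the answering side also supports."""
    supported_upper = {name.upper() for name in supported}
    for codec in offered_codecs(offer_sdp):
        if codec in supported_upper:
            return codec
    return None  # no common codec, so audio cannot be exchanged


print(offered_codecs(SAMPLE_OFFER))
print(pick_codec(SAMPLE_OFFER, {"opus", "PCMA"}))
print(pick_codec(SAMPLE_OFFER, {"opus"}))
```

The last call returns None, which corresponds to the case described above where the two sides share no codec.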

Additional resources

The following resources provide additional details about the topics covered in this guide.

Next steps

Learn more about how to set up inbound and outbound calls with LiveKit.