What does a LiveKit developer build?

A LiveKit developer builds real-time voice and video systems using the LiveKit Agents SDK and server infrastructure. This includes: VoicePipelineAgent configuration (STT → LLM → TTS), OpenAI Realtime API integration, multi-agent handoff systems, SIP module setup for PSTN telephony, WebRTC room management, and custom plugin pipelines for non-standard audio processing. We handle the full stack — LiveKit server deployment, agent code, integrations, and production monitoring.

When does LiveKit make more sense than Vapi?

LiveKit wins above 10,000 minutes per month. At that volume, LiveKit Cloud plus direct provider costs (Deepgram, GPT-4o, ElevenLabs) runs $500–1,000/month versus $2,300–3,300/month on Vapi BYOK. At 50k minutes, the gap is $2,500–5,000 versus $11,500–16,500. LiveKit also wins when you need: multi-agent handoffs, video rooms with AI participants, self-hosted deployment for HIPAA, or direct OpenAI Realtime API integration for sub-200ms latency.

What is the OpenAI Realtime API and why does LiveKit pair well with it?

GPT-4o Realtime accepts and outputs native audio over a WebSocket — no speech-to-text or text-to-speech step. This cuts per-turn latency from 360–920ms (standard pipeline) to 150–250ms. LiveKit provides the WebRTC transport layer: room management, TURN/STUN, audio track routing, and participant lifecycle. Together they form the lowest-latency voice AI architecture available. We also build a hot-standby fallback pipeline that activates automatically if the Realtime API is rate-limited.

Can LiveKit handle SIP calls to regular phone numbers?

Yes. LiveKit has a dedicated SIP module that integrates with Twilio, Vonage, and Bandwidth SIP trunks for PSTN inbound and outbound calling. We configure codec negotiation (PCMU/Opus transcoding), DTMF handling for IVR scenarios, call recording via LiveKit Egress, and real-time transcription at the SIP layer. The same agent pipeline handles both WebRTC (browser/mobile) and SIP calls without separate code paths.
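To make the DTMF handling for IVR scenarios concrete, here is a minimal sketch of a single-level menu that maps key presses to a destination. The `IvrMenu` class, the menu layout, and the route names are hypothetical, and how digits arrive from the SIP layer is not shown.

```python
from dataclasses import dataclass, field

# Hypothetical menu: digit -> destination. Not taken from a real deployment.
MENU = {"1": "scheduling", "2": "billing", "0": "human_operator"}


@dataclass
class IvrMenu:
    """Resolves DTMF digits against a single-level menu."""

    routes: dict[str, str]
    max_retries: int = 2
    invalid_count: int = field(default=0, init=False)

    def on_digit(self, digit: str) -> str:
        """Return a destination, 'retry' for an invalid digit,
        or 'human_operator' once the caller exceeds max_retries."""
        if digit in self.routes:
            self.invalid_count = 0
            return self.routes[digit]
        self.invalid_count += 1
        if self.invalid_count > self.max_retries:
            return "human_operator"
        return "retry"
```

Sending repeated invalid input to a human is one design choice among several. Replaying the menu or falling back to free-form speech are equally reasonable.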

How does multi-agent handoff work in LiveKit?

LiveKit supports multiple agent participants in the same room. When a handoff is triggered, the first agent passes a structured context object — compressed conversation summary, extracted entities, caller intent — to the incoming agent via AgentHandoff. The new agent joins the room with full context, the first agent leaves gracefully, and the caller experiences a seamless transition. Common patterns: triage → specialist, receptionist → scheduling, AI agent → human-in-the-loop. Standard managed platforms (Vapi, Retell) do not support this architecture.

Hestur

LiveKit · Open-Source Voice Infrastructure

LiveKit is open-source.
Making it work at scale isn't.

We build production LiveKit systems: multi-agent voice pipelines, SIP telephony, WebRTC rooms, and OpenAI Realtime API integrations. Above 10k minutes/month, it beats every managed platform on cost.

Sub-300ms voice latency · 10k+ min/month · WebRTC + SIP · Open-source SDK

Book a Discovery Call · LiveKit vs Vapi Cost Calculator

What We Build

Six things we've
built on LiveKit

Multi-Agent Voice Pipelines

Supervisor agents that route inbound calls to specialist sub-agents — triage → scheduling, intake → billing, receptionist → clinical team. Each agent has its own pipeline and prompt; LiveKit passes room context and transcript history between handoffs.

Healthcare, insurance, legal, financial services

Sub-300ms handoff latency between agents

SIP Telephony at Scale

Full PSTN coverage via LiveKit SIP module — inbound and outbound calling over Twilio, Vonage, or Bandwidth trunks. We configure codec negotiation, DTMF handling, call recording, and real-time transcription at the SIP layer.

Contact centres, outbound sales, healthcare scheduling

10k–200k+ minutes/month without per-minute platform fees

Video + Voice Rooms

Combined video and voice experiences: telehealth consultations with AI scribing, live customer support with screen share, multi-participant rooms where an AI agent attends alongside human participants.

Telehealth, EdTech, customer success, remote support

WebRTC rooms with AI participants at <150ms A/V latency

OpenAI Realtime API Integration

GPT-4o Realtime runs directly inside a LiveKit Agents worker — native audio in/out, zero intermediate transcription step, ultra-low latency responses, and built-in voice activity detection. The fastest voice AI architecture available.

SaaS voice features, consumer apps, high-frequency call centres

First-word latency under 200ms with Realtime API

Custom Agent Frameworks

Beyond the default VoicePipelineAgent — we build custom plugin pipelines for non-standard audio processing, multi-modal agents that combine voice with vision (GPT-4o Vision), and worker pools for high-concurrency deployments.

Enterprise SaaS, security, logistics, field services

Horizontal scaling via LiveKit Workers + Kubernetes

Browser & Mobile WebRTC

LiveKit JS/Swift/Kotlin SDKs wired directly into your web app or mobile product — real-time audio rooms, AI participants, push-to-talk interfaces, and live transcription overlays without a third-party platform in the loop.

Consumer apps, internal tools, mobile-first products

Direct browser-to-agent WebRTC, no SIP hop needed

Architecture Spotlight

OpenAI Realtime API
on LiveKit

The fastest voice AI architecture available — GPT-4o native audio, zero intermediate transcription, sub-200ms first-word latency.

Standard Pipeline (Vapi / Retell)

Microphone audio

→ STT (Deepgram) — 60–120ms

→ LLM (GPT-4o) — 200–600ms

→ TTS (ElevenLabs) — 100–200ms

→ Speaker output

Total: 360ms–920ms per turn

Realtime API on LiveKit

Microphone audio

→ LiveKit room (WebRTC transport)

→ GPT-4o Realtime WebSocket

(audio in → audio out, no text hop)

→ LiveKit audio track → speaker

Total: 150ms–250ms per turn
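The standard-pipeline total is the sum of the per-stage ranges listed above. A minimal sketch of that arithmetic, using only the figures from this page (the function and variable names are illustrative):

```python
# Per-stage latency ranges in milliseconds, copied from the standard pipeline above.
STANDARD_PIPELINE_MS = {
    "stt": (60, 120),
    "llm": (200, 600),
    "tts": (100, 200),
}


def turn_latency_ms(stages):
    """Sum the best-case and worst-case latency across sequential stages."""
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best, worst


best, worst = turn_latency_ms(STANDARD_PIPELINE_MS)
print(f"Standard pipeline: {best}-{worst} ms per turn")
# prints "Standard pipeline: 360-920 ms per turn"
```

The stages are summed because each one waits for the previous one to finish. A pipeline that streams partial results between stages would come in below the worst-case figure.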

Native voice activity detection

GPT-4o Realtime handles interruption natively — no separate VAD module, no endpointing tuning. The model knows when the user is done speaking.

No transcription latency tax

The standard STT → LLM → TTS chain totals 360–920ms per turn, versus 150–250ms with Realtime. Realtime removes both the STT and TTS steps: audio goes straight to the model and comes back as audio.

LiveKit as the WebRTC transport

LiveKit handles room management, participant tracking, TURN/STUN, audio mixing, and recording. The Realtime API just handles AI processing. Best-of-both-worlds architecture.

Fallback pipeline included

We build a hot-standby STT → LLM → TTS pipeline that activates if Realtime API is rate-limited or degraded. Zero-downtime failover, invisible to the caller.

Why LiveKit Wins at Scale

The 10k Minute Crossover

Below 10k minutes/month, Vapi BYOK is the right choice. Above it, LiveKit infrastructure costs collapse while managed platform fees keep compounding.

Platform	Cost/min	10k min/month	50k min/month	Note
Managed (Vapi/Retell bundled)	$0.45–0.60/min	$4,500–6,000	$22,500–30,000	No infra control
Vapi BYOK	$0.23–0.33/min	$2,300–3,300	$11,500–16,500	Best-in-class managed
LiveKit Cloud + providers	$0.05–0.10/min	$500–1,000	$2,500–5,000	Our crossover recommendation
LiveKit self-hosted + providers	$0.02–0.04/min	$200–400	$1,000–2,000	Maximum cost control
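The monthly columns follow directly from the per-minute rates. A minimal sketch that reproduces them for any volume, assuming the rates in the table above hold and ignoring one-off setup and engineering costs (the dictionary keys are illustrative labels):

```python
# Per-minute rates in USD (low, high), copied from the table above.
RATES_PER_MIN = {
    "managed_bundled": (0.45, 0.60),
    "vapi_byok": (0.23, 0.33),
    "livekit_cloud": (0.05, 0.10),
    "livekit_self_hosted": (0.02, 0.04),
}


def monthly_cost(platform, minutes):
    """Return the (low, high) monthly cost in whole dollars."""
    low, high = RATES_PER_MIN[platform]
    return round(low * minutes), round(high * minutes)


for platform in RATES_PER_MIN:
    print(platform, monthly_cost(platform, 10_000), monthly_cost(platform, 50_000))
```

For example, `monthly_cost("vapi_byok", 10_000)` gives `(2300, 3300)` and `monthly_cost("livekit_cloud", 50_000)` gives `(2500, 5000)`, matching the table.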

< 10k min/month

Use Vapi BYOK

LiveKit infra setup cost outweighs per-minute savings at low volume.

10k–50k min/month

LiveKit Cloud

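Managed LiveKit eliminates infra ops while cutting per-minute costs roughly 5–10× versus bundled managed platforms, and roughly 2–7× versus Vapi BYOK, based on the rates in the table above.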

50k+ min/month

LiveKit self-hosted

At scale, self-hosted LiveKit on Kubernetes is $0.02–0.04/min total.
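The three volume bands above reduce to a simple threshold check. A minimal sketch, with one assumption of ours: a volume of exactly 50k minutes is treated as the self-hosted tier, since the bands above overlap at that point.

```python
def recommend_platform(minutes_per_month: int) -> str:
    """Map monthly call volume to the tier recommended above."""
    if minutes_per_month < 10_000:
        return "Vapi BYOK"
    if minutes_per_month < 50_000:
        return "LiveKit Cloud"
    # Assumption: exactly 50k minutes counts as the self-hosted tier.
    return "LiveKit self-hosted"
```

Volume is only one input. A self-hosting requirement such as HIPAA can justify LiveKit below the crossover, so treat the result as a starting point.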

Architecture Deep-Dive

Multi-Agent Handoffs

LiveKit is the only voice AI platform with a real multi-agent architecture. We design and build agent topologies that managed platforms like Vapi and Retell can't replicate.

Supervisor Pattern — Healthcare Example

Inbound call → Room created

→ Receptionist Agent joins

"How can I help you today?"

→ Caller: "I need to see Dr. Chen"

→ AgentHandoff triggered

context = { name, intent, caller_history }

→ Scheduling Agent joins room

Receptionist leaves gracefully

→ "I have you as [Name]..."

No repeat of earlier context

→ Appointment booked → room closed
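The context object in the flow above can be sketched as a small serialisable record. This is an illustration only: the `HandoffContext` class and `greeting_for` helper are hypothetical, the field names mirror the `{ name, intent, caller_history }` shape shown above, and how the payload travels between agents is not shown.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class HandoffContext:
    """Hypothetical context passed from the outgoing agent to the incoming one."""

    name: str
    intent: str
    caller_history: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialise to a JSON string for transport between agents."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload: str) -> "HandoffContext":
        """Rebuild the context on the receiving side."""
        return cls(**json.loads(payload))


def greeting_for(ctx: HandoffContext) -> str:
    """Opening line for the incoming agent, so the caller need not repeat themselves."""
    return f"I have you as {ctx.name}, calling about {ctx.intent}."
```

Usage: the receptionist builds `HandoffContext("Alex", "an appointment with Dr. Chen")`, serialises it with `to_json()`, and the scheduling agent restores it with `from_json()` before speaking.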

Triage → Specialist

A triage agent collects the reason for contact and routes to the appropriate specialist agent (billing, scheduling, clinical, complaints). Each specialist has a tailored prompt and tool set.
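A minimal sketch of the routing step, assuming the triage agent has already reduced the caller's reason for contact to one of four labels. The `Specialist` record, the prompts, and the tool names are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Specialist:
    """Hypothetical specialist definition: a tailored prompt plus a tool set."""

    name: str
    prompt: str
    tools: tuple[str, ...]


SPECIALISTS = {
    "billing": Specialist("billing", "You resolve invoice and payment questions.", ("lookup_invoice",)),
    "scheduling": Specialist("scheduling", "You book and move appointments.", ("find_slot", "book_slot")),
    "clinical": Specialist("clinical", "You take clinical intake notes.", ("record_symptoms",)),
    "complaints": Specialist("complaints", "You log and acknowledge complaints.", ("open_ticket",)),
}


def route(reason_for_contact: str) -> Optional[Specialist]:
    """Return the matching specialist, or None so triage keeps the caller."""
    return SPECIALISTS.get(reason_for_contact.strip().lower())
```

Returning `None` for an unrecognised reason keeps the caller with the triage agent so it can ask a clarifying question.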

Supervisor → Worker

A supervisor agent orchestrates multiple worker agents running in parallel — one handling transcription, one doing real-time CRM lookup, one managing the conversation. LangGraph-like coordination over voice.
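A minimal sketch of the parallel coordination, using plain `asyncio` and stub workers. The three worker functions are hypothetical stand-ins for transcription, CRM lookup, and conversation handling, and none of this is LiveKit API.

```python
import asyncio


async def transcribe(utterance: str) -> str:
    """Stub transcription worker."""
    await asyncio.sleep(0)  # stands in for real I/O
    return utterance.lower()


async def crm_lookup(caller_id: str) -> dict:
    """Stub CRM worker returning a placeholder record."""
    await asyncio.sleep(0)
    return {"caller_id": caller_id, "status": "unknown"}


async def converse(utterance: str) -> str:
    """Stub conversation worker."""
    await asyncio.sleep(0)
    return "Let me check that for you."


async def supervise(caller_id: str, utterance: str) -> dict:
    """Run all three workers concurrently and merge their results."""
    transcript, crm, reply = await asyncio.gather(
        transcribe(utterance), crm_lookup(caller_id), converse(utterance)
    )
    return {"transcript": transcript, "crm": crm, "reply": reply}


result = asyncio.run(supervise("caller-1", "Hello There"))
```

`asyncio.gather` waits for the slowest worker. A production supervisor would more likely let the conversation worker reply first and merge the CRM result when it arrives.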

Human-in-the-Loop

An AI agent handles the conversation until a human expert is needed. The human joins the LiveKit room (as a participant), reviews the conversation summary, and takes over seamlessly — or provides a whisper prompt the AI agent reads aloud.

Technical Scope

What's in the build

Capability	LiveKit	Implementation notes
VoicePipelineAgent (STT → LLM → TTS)	✓	Core pipeline, all providers
OpenAI Realtime API (native audio)	✓	Zero transcription overhead
Multi-agent handoff with context	✓	AgentHandoff + context objects
SIP inbound/outbound (PSTN)	✓	Twilio, Vonage, Bandwidth
Video + voice rooms (WebRTC)	✓	Unique to LiveKit
Browser SDK (JS/TS)	✓	React hooks included
Mobile SDK (Swift/Kotlin)	✓	iOS + Android native
DTMF / IVR menu handling	✓	SIP module built-in
End-to-end call recording	✓	Egress to S3/GCS
Self-hosted on your infra	✓	HIPAA / SOC 2 requirement

Sub-300ms voice latency (standard pipeline)

Sub-200ms with Realtime API (no STT hop)

10k+ min/month crossover (LiveKit vs managed)

3 weeks to production (from kick-off)

What We Solve

5 LiveKit Pitfalls
We Fix Before They Hit Production

Problem

Worker cold starts

Impact

Unprimed LiveKit worker pools add 1–3 seconds to the first call connection. In a production contact centre, that opening silence kills the user experience.

Our Fix

We pre-warm worker pools based on historical call volume patterns, configure idle-worker thresholds (in the Python Agents SDK, the num_idle_processes worker option), and implement health-check pings that keep workers hot. First-call latency matches steady-state latency.
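One way to size a pre-warmed pool from historical volume is Little's law: expected concurrent calls equal arrival rate times average call duration. A minimal sketch, assuming one call per worker and a 20% headroom that is our placeholder rather than a recommendation:

```python
import math


def workers_to_prewarm(calls_per_hour: float, avg_call_seconds: float, headroom: float = 0.2) -> int:
    """Estimate how many workers to keep warm for a given hour.

    Little's law: concurrent calls = arrival rate x average duration.
    Assumes one call per worker; headroom pads the estimate for bursts.
    """
    concurrent = calls_per_hour * avg_call_seconds / 3600
    return math.ceil(concurrent * (1 + headroom))
```

For example, 120 calls per hour averaging 180 seconds each means 6 concurrent calls on average, so `workers_to_prewarm(120, 180)` returns 8 with the default headroom.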

Problem

Codec mismatch on SIP trunks

Impact

SIP trunks default to PCMU/PCMA (G.711). LiveKit Agents process Opus internally. Without explicit codec negotiation, audio quality degrades or calls fail entirely on certain carriers.

Our Fix

We configure explicit SDP codec preference in the SIP module, set up transcoding profiles for carrier-specific quirks, and validate against all target trunk providers before go-live.
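Codec preference in SDP is expressed by the order of payload types on the `m=audio` line. The sketch below reorders that line so a chosen codec comes first. It is generic SDP string handling, not the LiveKit SIP module's configuration interface, and the function name is ours.

```python
# Static RTP payload types (RFC 3551) that may appear without an a=rtpmap line.
STATIC_PAYLOADS = {"0": "PCMU", "8": "PCMA"}


def prefer_codec(sdp: str, codec: str) -> str:
    """Move payload types for `codec` to the front of each m=audio line."""
    lines = sdp.splitlines()
    # Build payload type -> codec name, starting from the static assignments.
    names = dict(STATIC_PAYLOADS)
    for line in lines:
        if line.startswith("a=rtpmap:"):
            payload_type, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            names[payload_type] = encoding.split("/")[0]
    out = []
    for line in lines:
        if line.startswith("m=audio "):
            parts = line.split(" ")
            header, payloads = parts[:3], parts[3:]
            preferred = [p for p in payloads if names.get(p, "").upper() == codec.upper()]
            rest = [p for p in payloads if p not in preferred]
            line = " ".join(header + preferred + rest)
        out.append(line)
    # SDP lines are CRLF-terminated.
    return "\r\n".join(out) + "\r\n"
```

Given `m=audio 5004 RTP/AVP 0 8 111` with `a=rtpmap:111 opus/48000/2`, calling `prefer_codec(sdp, "opus")` yields `m=audio 5004 RTP/AVP 111 0 8`.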

Problem

Multi-agent context loss on handoff

Impact

When agent A hands off to agent B, the new agent starts with no conversational context — the caller has to repeat themselves, trust breaks down, and your escalation rate spikes.

Our Fix

We build structured context objects passed via AgentHandoff: compressed conversation summary, extracted entities (name, intent, key data), and caller sentiment. Agent B picks up mid-conversation without friction.

Problem

Room leak from unhandled disconnections

Impact

Rooms not explicitly closed after calls consume server resources indefinitely. At volume, this degrades your LiveKit server performance and inflates cloud costs.

Our Fix

We implement webhook handlers for participant_left and room_finished events, enforce a maximum room duration, and run a nightly audit job that closes any room older than the maximum call duration.
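The selection logic of the nightly audit can be sketched without any server calls. The `RoomInfo` record is a hypothetical stand-in for whatever the room listing returns, and the actual close request is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class RoomInfo:
    """Hypothetical stand-in for a room record: name plus creation time (UTC)."""

    name: str
    created_at: datetime


def stale_rooms(rooms, max_call_duration: timedelta, now=None):
    """Return names of rooms that have outlived the maximum call duration."""
    now = now or datetime.now(timezone.utc)
    return [room.name for room in rooms if now - room.created_at > max_call_duration]
```

The audit job would then issue a close request for each returned name. Passing `now` explicitly keeps the function easy to test.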

Problem

OpenAI Realtime rate limit failures

Impact

GPT-4o Realtime has strict concurrency limits per API key. An unhandled 429 mid-call causes the agent to go silent — the worst possible UX in a voice interaction.

Our Fix

We implement exponential backoff, maintain a fallback pipeline (Deepgram STT + GPT-4o chat + TTS) that activates automatically if Realtime is rate-limited, and distribute load across multiple API keys for high-volume deployments.
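A minimal sketch of backoff with automatic fallback. `RateLimited`, `primary`, and `fallback` are hypothetical: `primary` stands for a Realtime call that raises on a 429, and `fallback` for the STT + chat + TTS pipeline. The short delays are placeholders chosen because a caller is waiting on the line.

```python
import random
import time


class RateLimited(Exception):
    """Raised by the (hypothetical) primary pipeline on an HTTP 429."""


def respond_with_fallback(primary, fallback, max_attempts=3, base_delay=0.2, sleep=time.sleep):
    """Try the primary pipeline with exponential backoff, then fall back."""
    for attempt in range(max_attempts):
        try:
            return primary()
        except RateLimited:
            if attempt == max_attempts - 1:
                break  # out of retries, use the fallback
            # Exponential backoff with jitter: base, 2x base, 4x base, ...
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    return fallback()
```

The `sleep` parameter is injectable so the retry logic can be tested without real delays. In a live call, switching to the fallback after the first 429 may be preferable to retrying at all.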

Timeline

From Kick-Off to Production in 3 Weeks

Week 1

Architecture & Pipeline Design

›Map agent topology: single agent, multi-agent, or supervisor pattern
›Choose pipeline: VoicePipelineAgent vs OpenAI Realtime vs custom plugin
›Provider selection: STT, LLM, TTS benchmarked on your audio profile
›Spec SIP trunk requirements: carriers, codecs, inbound/outbound volume

Week 1–2

Infrastructure Setup

›LiveKit server deployment (Cloud or self-hosted on Kubernetes)
›SIP module configuration and trunk provider wiring
›Worker pool setup with pre-warming and autoscaling policies
›Room service API integration for server-side room management

Week 2–3

Agent Build & Integration

›Agent pipeline implementation with all conversation branches
›Multi-agent handoff logic with context passing
›CRM and calendar tool call integration
›WebRTC SDK integration (web/mobile if applicable)

Week 3

Tuning & Production

›Latency profiling: VAD, STT, LLM, TTS chain optimisation
›SIP codec validation across all target carriers
›Load testing: concurrent calls against target peak volume
›Monitoring, alerting, and runbook handoff

Ready to Build?

Ship Your LiveKit System
in 3 Weeks

We scope most LiveKit projects in a single 30-minute call. Fixed price, clear deliverables, production-ready in 3 weeks.

Book a Discovery Call · Voice AI Services
