We build production LiveKit systems: multi-agent voice pipelines, SIP telephony, WebRTC rooms, and OpenAI Realtime API integrations. Above roughly 10k minutes/month, LiveKit costs less per minute than managed voice platforms.
Supervisor agents that route inbound calls to specialist sub-agents — triage → scheduling, intake → billing, receptionist → clinical team. Each agent has its own pipeline and prompt; LiveKit passes room context and transcript history between handoffs.
Healthcare, insurance, legal, financial services
Sub-300ms handoff latency between agents
Full PSTN coverage via LiveKit SIP module — inbound and outbound calling over Twilio, Vonage, or Bandwidth trunks. We configure codec negotiation, DTMF handling, call recording, and real-time transcription at the SIP layer.
Contact centres, outbound sales, healthcare scheduling
10k–200k+ minutes/month without per-minute platform fees
Combined video and voice experiences: telehealth consultations with AI scribing, live customer support with screen share, multi-participant rooms where an AI agent attends alongside human participants.
Telehealth, EdTech, customer success, remote support
WebRTC rooms with AI participants at <150ms A/V latency
A LiveKit Agents worker connects directly to GPT-4o Realtime, which gives native audio in and out, no intermediate transcription step, low-latency responses, and built-in voice activity detection. It is the lowest-latency voice AI architecture we deploy.
SaaS voice features, consumer apps, high-frequency call centres
First-word latency under 200ms with Realtime API
Beyond the default VoicePipelineAgent — we build custom plugin pipelines for non-standard audio processing, multi-modal agents that combine voice with vision (GPT-4o Vision), and worker pools for high-concurrency deployments.
Enterprise SaaS, security, logistics, field services
Horizontal scaling via LiveKit Workers + Kubernetes
LiveKit JS/Swift/Kotlin SDKs wired directly into your web app or mobile product — real-time audio rooms, AI participants, push-to-talk interfaces, and live transcription overlays without a third-party platform in the loop.
Consumer apps, internal tools, mobile-first products
Direct browser-to-agent WebRTC, no SIP hop needed
Our lowest-latency voice AI architecture: GPT-4o native audio, no intermediate transcription step, and sub-200ms first-word latency.
Standard Pipeline (Vapi / Retell)
Microphone audio
→ STT (Deepgram) — 60–120ms
→ LLM (GPT-4o) — 200–600ms
→ TTS (ElevenLabs) — 100–200ms
→ Speaker output
Total: 360ms–920ms per turn
Realtime API on LiveKit
Microphone audio
→ LiveKit room (WebRTC transport)
→ GPT-4o Realtime WebSocket
(audio in → audio out, no text hop)
→ LiveKit audio track → speaker
Total: 150ms–250ms per turn
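The standard-pipeline total above is the sum of three sequential stages. Here is a minimal sketch of that arithmetic. It uses the per-stage figures quoted here as planning estimates, ignores network and transport overhead, and uses illustrative names:

```python
# Per-stage latency ranges (min_ms, max_ms) for the standard pipeline above.
# Planning estimates only; real numbers vary by provider, region, and prompt length.
STAGES = {
    "stt_deepgram": (60, 120),
    "llm_gpt4o": (200, 600),
    "tts_elevenlabs": (100, 200),
}


def turn_latency_range(stages: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum sequential stage ranges: best case adds every minimum, worst case every maximum."""
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best, worst


if __name__ == "__main__":
    best, worst = turn_latency_range(STAGES)
    print(f"Standard pipeline: {best}ms–{worst}ms per turn")  # Standard pipeline: 360ms–920ms per turn
```

Because the stages run one after another, the worst cases add up. This is why the LLM stage's 200–600ms spread dominates the per-turn variance.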
Native voice activity detection
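GPT-4o Realtime handles turn detection and interruption server-side. There is no separate VAD module to run, and endpointing comes down to a few session settings (speech threshold, silence duration) instead of a separately tuned pipeline stage.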
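With the Realtime API, turn detection is configured per session rather than in a separate VAD component. This minimal sketch builds a `session.update` event that enables server-side VAD. The field names follow the Realtime API's documented `session.update` event for the beta API. Newer API versions may nest turn detection differently, so verify against current docs. The values are illustrative starting points and the helper name is hypothetical. The LiveKit OpenAI plugin exposes equivalent turn-detection settings; check its reference for exact parameter names.

```python
import json


def build_turn_detection_update(
    threshold: float = 0.5,
    prefix_padding_ms: int = 300,
    silence_duration_ms: int = 500,
) -> str:
    """Build a Realtime API `session.update` event enabling server-side VAD (hypothetical helper).

    threshold: speech-probability cutoff (0.0-1.0); higher values ignore more background noise.
    prefix_padding_ms: audio retained from just before detected speech onset.
    silence_duration_ms: trailing silence that ends the user's turn.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    event = {
        "type": "session.update",
        "session": {
            "turn_detection": {
                "type": "server_vad",
                "threshold": threshold,
                "prefix_padding_ms": prefix_padding_ms,
                "silence_duration_ms": silence_duration_ms,
            }
        },
    }
    return json.dumps(event)


if __name__ == "__main__":
    # A noisy phone line may need a higher threshold and a longer silence window.
    print(build_turn_detection_update(threshold=0.6, silence_duration_ms=700))
```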
No transcription latency tax
Per the figures above, a standard STT → LLM → TTS chain spends 360–920ms per turn across three sequential services. Realtime removes the separate STT and TTS hops: audio goes straight to the model and comes back as audio.
LiveKit as the WebRTC transport
LiveKit handles room management, participant tracking, TURN/STUN, audio mixing, and recording. The Realtime API just handles AI processing. Best-of-both-worlds architecture.
Fallback pipeline included
We build a hot-standby STT → LLM → TTS pipeline that activates if the Realtime API is rate-limited or degraded. Failover happens mid-call and is designed to be invisible to the caller.
Below 10k minutes/month, Vapi BYOK is the right choice. Above it, LiveKit infrastructure costs collapse while managed platform fees keep compounding.
| Platform | Cost/min | Monthly cost at 10k min | Monthly cost at 50k min | Note |
|---|---|---|---|---|
| Managed (Vapi/Retell bundled) | $0.45–0.60/min | $4,500–6,000 | $22,500–30,000 | No infra control |
| Vapi BYOK | $0.23–0.33/min | $2,300–3,300 | $11,500–16,500 | Best-in-class managed |
| LiveKit Cloud + providers | $0.05–0.10/min | $500–1,000 | $2,500–5,000 | Our crossover recommendation |
| LiveKit self-hosted + providers | $0.02–0.04/min | $200–400 | $1,000–2,000 | Maximum cost control |
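Each monthly figure is the per-minute range multiplied by volume. The small sketch below reproduces the table for any volume. The rates are the ranges quoted above. Provider pricing changes, so treat them as inputs rather than constants:

```python
# Per-minute cost ranges in USD (low, high), copied from the table above.
RATES = {
    "Managed (Vapi/Retell bundled)": (0.45, 0.60),
    "Vapi BYOK": (0.23, 0.33),
    "LiveKit Cloud + providers": (0.05, 0.10),
    "LiveKit self-hosted + providers": (0.02, 0.04),
}


def monthly_cost(minutes: int, rate: tuple[float, float]) -> tuple[int, int]:
    """Return the (low, high) monthly cost in whole dollars for a given call volume."""
    low, high = rate
    # round() absorbs float noise such as 10_000 * 0.45 == 4500.000000000001
    return round(minutes * low), round(minutes * high)


if __name__ == "__main__":
    for minutes in (10_000, 50_000):
        for platform, rate in RATES.items():
            low, high = monthly_cost(minutes, rate)
            print(f"{minutes:>6} min  {platform:<32} ${low:,}–${high:,}")
```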
< 10k min/month
Use Vapi BYOK
LiveKit infra setup cost outweighs per-minute savings at low volume.
10k–50k min/month
LiveKit Cloud
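Managed LiveKit eliminates infra ops. At the rates above, it cuts per-minute costs roughly 5–10× versus bundled managed platforms and roughly 3–5× versus Vapi BYOK.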
50k+ min/month
LiveKit self-hosted
At scale, self-hosted LiveKit on Kubernetes is $0.02–0.04/min total.
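The volume tiers above reduce to a single threshold rule. This sketch uses a hypothetical `recommend_platform` helper. The boundaries come from this page, and cost is the only input. Because the tiers overlap at exactly 50k minutes, that value is assigned to the self-hosted tier:

```python
def recommend_platform(monthly_minutes: int) -> str:
    """Map expected monthly call volume to the tier recommended above (cost only)."""
    if monthly_minutes < 0:
        raise ValueError("monthly_minutes cannot be negative")
    if monthly_minutes < 10_000:
        return "Vapi BYOK"  # setup cost outweighs per-minute savings
    if monthly_minutes < 50_000:
        return "LiveKit Cloud"  # managed LiveKit, no infra ops
    return "LiveKit self-hosted"  # 50k+ min/month


if __name__ == "__main__":
    for volume in (4_000, 25_000, 120_000):
        print(volume, "->", recommend_platform(volume))
```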
LiveKit Agents treats multi-agent handoff as a first-class pattern. We design and build agent topologies that standard managed platforms cannot support.
Supervisor Pattern — Healthcare Example
Inbound call → Room created
→ Receptionist Agent joins
"How can I help you today?"
→ Caller: "I need to see Dr. Chen"
→ AgentHandoff triggered
context = { name, intent, caller_history }
→ Scheduling Agent joins room
Receptionist leaves gracefully
→ "I have you as [Name]..."
No repeat of earlier context
→ Appointment booked → room closed
Triage → Specialist
A triage agent collects the reason for contact and routes to the appropriate specialist agent (billing, scheduling, clinical, complaints). Each specialist has a tailored prompt and tool set.
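One way to express the triage step is a registry that maps each routed intent to a specialist's prompt and tools. The sketch below is framework-agnostic. The `Specialist` type, intent labels, prompts, and tool names are all hypothetical. In a LiveKit Agents deployment, each entry would correspond to a separate agent definition:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Specialist:
    """Hypothetical description of a specialist agent: its prompt and allowed tools."""
    name: str
    instructions: str
    tools: tuple[str, ...] = ()


# Hypothetical registry; real prompts and tools are project-specific.
SPECIALISTS = {
    "billing": Specialist("billing", "Resolve invoice and payment questions.", ("lookup_invoice", "issue_refund")),
    "scheduling": Specialist("scheduling", "Book, move, or cancel appointments.", ("find_slots", "book_slot")),
    "clinical": Specialist("clinical", "Answer clinical questions within policy.", ("lookup_chart",)),
    "complaints": Specialist("complaints", "Log and acknowledge complaints.", ("open_ticket",)),
}


def route(intent: str) -> Optional[Specialist]:
    """Return the specialist for a triaged intent, or None to keep triaging."""
    return SPECIALISTS.get(intent.strip().lower())


if __name__ == "__main__":
    target = route("Scheduling")
    print(target.name if target else "stay in triage and ask a clarifying question")
```

Returning `None` for an unknown intent keeps the caller with the triage agent. This avoids handing off to the wrong specialist.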
Supervisor → Worker
A supervisor agent orchestrates multiple worker agents running in parallel — one handling transcription, one doing real-time CRM lookup, one managing the conversation. LangGraph-like coordination over voice.
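The supervisor → worker pattern depends on running independent work concurrently, so the conversation never waits on a slow lookup. This minimal asyncio sketch uses `crm_lookup` and `summarize_transcript` as hypothetical stand-ins for real integrations. The timeout value is illustrative:

```python
import asyncio


async def crm_lookup(phone: str) -> dict:
    """Hypothetical CRM call; swap in a real client."""
    await asyncio.sleep(0.05)  # simulated network latency
    return {"phone": phone, "name": "Jordan Lee"}


async def summarize_transcript(lines: list[str]) -> str:
    """Hypothetical summarizer; a real worker would call an LLM."""
    await asyncio.sleep(0.02)
    return " / ".join(lines[-2:])


def _result_or_none(task: asyncio.Task, done: set) -> object:
    """Return a finished task's result, or None if it timed out or raised."""
    if task in done and task.exception() is None:
        return task.result()
    return None


async def supervise(phone: str, transcript: list[str], timeout_s: float = 0.5) -> dict:
    """Run worker tasks concurrently and keep only the results that beat the deadline."""
    crm = asyncio.create_task(crm_lookup(phone))
    summary = asyncio.create_task(summarize_transcript(transcript))
    done, pending = await asyncio.wait({crm, summary}, timeout=timeout_s)
    for task in pending:
        task.cancel()  # the conversation continues without late results
    await asyncio.gather(*pending, return_exceptions=True)  # let cancellations settle
    return {"crm": _result_or_none(crm, done), "summary": _result_or_none(summary, done)}


if __name__ == "__main__":
    print(asyncio.run(supervise("+15550100", ["Hi", "I need to see Dr. Chen"])))
```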
Human-in-the-Loop
An AI agent handles the conversation until a human expert is needed. The human joins the LiveKit room (as a participant), reviews the conversation summary, and takes over seamlessly — or provides a whisper prompt the AI agent reads aloud.
| Capability | LiveKit | Implementation notes |
|---|---|---|
| VoicePipelineAgent (STT → LLM → TTS) | ✓ | Core pipeline, all providers |
| OpenAI Realtime API (native audio) | ✓ | Zero transcription overhead |
| Multi-agent handoff with context | ✓ | AgentHandoff + context objects |
| SIP inbound/outbound (PSTN) | ✓ | Twilio, Vonage, Bandwidth |
| Video + voice rooms (WebRTC) | ✓ | AI agents join as room participants |
| Browser SDK (JS/TS) | ✓ | React hooks included |
| Mobile SDK (Swift/Kotlin) | ✓ | iOS + Android native |
| DTMF / IVR menu handling | ✓ | SIP module built-in |
| End-to-end call recording | ✓ | Egress to S3/GCS |
| Self-hosted on your infra | ✓ | Often required for HIPAA / SOC 2 |
Problem
Worker cold starts
Impact
Unprimed LiveKit worker pools add 1–3 seconds to the first call connection. In a production contact centre, that opening silence kills the user experience.
Our Fix
We pre-warm worker pools based on historical call volume patterns, configure minimum idle-worker thresholds, and run health-check pings that keep workers hot. First-call latency then matches steady-state latency.
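Pre-warming comes down to keeping enough warm capacity for the concurrency expected in each hour. The sketch below uses Little's law (average concurrency ≈ arrival rate × average call duration). The function names, headroom, and floor are hypothetical. The resulting targets would feed your worker settings, for example the idle-process count in LiveKit Agents' worker options; check that reference for exact names.

```python
import math


def idle_capacity_target(
    calls_per_hour: float,
    avg_call_minutes: float,
    headroom: float = 0.25,
    minimum: int = 2,
) -> int:
    """Estimate warm job slots for one hour of traffic (hypothetical helper).

    Little's law: concurrency = arrival rate (calls/min) x average duration (min).
    `headroom` adds a margin for bursts above the hourly average; `minimum` is a floor.
    """
    if calls_per_hour < 0 or avg_call_minutes < 0 or headroom < 0:
        raise ValueError("inputs must be non-negative")
    concurrency = (calls_per_hour / 60.0) * avg_call_minutes
    return max(minimum, math.ceil(concurrency * (1 + headroom)))


def hourly_schedule(history: dict[int, float], avg_call_minutes: float) -> dict[int, int]:
    """Map each hour of day (0-23) to a warm-capacity target from historical call counts."""
    return {hour: idle_capacity_target(calls, avg_call_minutes) for hour, calls in history.items()}


if __name__ == "__main__":
    # 120 calls/hour at 4 minutes each -> 8 concurrent calls, 10 with 25% headroom.
    print(hourly_schedule({3: 0, 9: 120, 14: 90}, avg_call_minutes=4))
```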
Problem
Codec mismatch on SIP trunks
Impact
Most SIP trunks default to G.711 (PCMU/PCMA), while LiveKit carries audio as Opus over WebRTC, so the SIP bridge has to transcode between them. Without explicit codec negotiation, audio quality degrades or, on certain carriers, calls fail entirely.
Our Fix
We configure explicit SDP codec preference in the SIP module, set up transcoding profiles for carrier-specific quirks, and validate against all target trunk providers before go-live.
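In SDP, codec preference is expressed by the order of payload types on the `m=audio` line, with the first listed format preferred (RFC 3264). The minimal sketch below reorders that list. The helper name is hypothetical and it assumes a single audio section. In practice, preferences are usually set through the SIP stack or trunk configuration rather than by editing SDP strings by hand.

```python
import re

# Static RTP payload types (RFC 3551) that may appear without an a=rtpmap line.
STATIC_PAYLOADS = {"0": "PCMU", "8": "PCMA", "9": "G722"}


def prefer_codec(sdp: str, codec: str) -> str:
    """Reorder the m=audio format list so payload types for `codec` come first.

    Assumes a single audio section; other payload types keep their relative order.
    """
    lines = sdp.splitlines()
    encodings = dict(STATIC_PAYLOADS)
    for line in lines:
        # e.g. "a=rtpmap:101 telephone-event/8000" -> {"101": "TELEPHONE-EVENT"}
        match = re.match(r"a=rtpmap:(\d+) ([^/]+)/", line)
        if match:
            encodings[match.group(1)] = match.group(2).upper()

    wanted = codec.upper()
    rewritten = []
    for line in lines:
        if line.startswith("m=audio "):
            fields = line.split()
            header, formats = fields[:3], fields[3:]  # "m=audio <port> <proto>", then payload types
            preferred = [pt for pt in formats if encodings.get(pt) == wanted]
            others = [pt for pt in formats if encodings.get(pt) != wanted]
            line = " ".join(header + preferred + others)
        rewritten.append(line)
    return "\r\n".join(rewritten) + "\r\n"  # SDP lines end with CRLF


if __name__ == "__main__":
    offer = (
        "v=0\r\n"
        "m=audio 5004 RTP/AVP 0 8 101\r\n"
        "a=rtpmap:0 PCMU/8000\r\n"
        "a=rtpmap:8 PCMA/8000\r\n"
        "a=rtpmap:101 telephone-event/8000\r\n"
    )
    print(prefer_codec(offer, "PCMA"))  # the m= line becomes: m=audio 5004 RTP/AVP 8 0 101
```

Keeping `telephone-event` in the list matters because it carries RFC 4733 DTMF digits, which IVR menu handling depends on.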
Problem
Multi-agent context loss on handoff
Impact
When agent A hands off to agent B, the new agent starts with no conversational context — the caller has to repeat themselves, trust breaks down, and your escalation rate spikes.
Our Fix
We build structured context objects passed via AgentHandoff: compressed conversation summary, extracted entities (name, intent, key data), and caller sentiment. Agent B picks up mid-conversation without friction.
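A structured context object can be as simple as a dataclass that the outgoing agent fills in and the incoming agent renders into its instructions. The sketch below is framework-agnostic. The field names and the `to_instructions` helper are hypothetical and not part of the LiveKit API. In practice, the summary would normally come from an LLM call.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HandoffContext:
    """Hypothetical payload passed from the outgoing agent to the incoming one."""
    caller_name: Optional[str]
    intent: str
    summary: str  # compressed conversation so far
    entities: dict[str, str] = field(default_factory=dict)  # extracted key data
    sentiment: str = "neutral"

    def to_instructions(self) -> str:
        """Render the context as a preamble for the next agent's system prompt."""
        lines = [
            f"Caller: {self.caller_name or 'unknown'}",
            f"Intent: {self.intent}",
            f"Sentiment: {self.sentiment}",
            f"Conversation so far: {self.summary}",
        ]
        lines += [f"{key}: {value}" for key, value in sorted(self.entities.items())]
        lines.append("Do not ask the caller to repeat any of the above.")
        return "\n".join(lines)


if __name__ == "__main__":
    ctx = HandoffContext(
        caller_name="Jordan Lee",
        intent="book appointment",
        summary="Existing patient wants to see Dr. Chen next week.",
        entities={"doctor": "Dr. Chen"},
    )
    print(ctx.to_instructions())
```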
Problem
Room leak from unhandled disconnections
Impact
Rooms that are not explicitly closed after a call can linger and keep consuming server resources. At volume, this degrades your LiveKit server performance and inflates cloud costs.
Our Fix
We implement webhook handlers for the participant_left and room_finished events, enforce a maximum room duration, and run a nightly audit job that closes any room older than the maximum call duration.
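The nightly audit is a filter over the room list: any room created before `now - (max call duration + grace)` is a leak candidate. The sketch below works on plain dicts with a `name` and a `creation_time`, which is assumed to be in Unix seconds. Fetching the room list and deleting each stale room through the server's room-management API are left out. The grace margin is an assumption.

```python
import time
from typing import Iterable, Optional


def stale_rooms(
    rooms: Iterable[dict],
    max_call_seconds: int,
    grace_seconds: int = 300,
    now: Optional[float] = None,
) -> list[str]:
    """Return names of rooms older than the longest legitimate call plus a grace margin."""
    now = time.time() if now is None else now
    cutoff = now - (max_call_seconds + grace_seconds)
    return sorted(room["name"] for room in rooms if room["creation_time"] < cutoff)


if __name__ == "__main__":
    now = time.time()
    rooms = [
        {"name": "call-live", "creation_time": int(now - 600)},  # 10 minutes old
        {"name": "call-leaked", "creation_time": int(now - 3 * 3600)},  # 3 hours old
    ]
    print(stale_rooms(rooms, max_call_seconds=3600, now=now))  # ['call-leaked']
```

The grace margin keeps the audit from racing a call that is legitimately close to the maximum duration.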
Problem
OpenAI Realtime rate limit failures
Impact
GPT-4o Realtime enforces strict rate and concurrency limits. An unhandled 429 mid-call causes the agent to go silent, the worst possible UX in a voice interaction.
Our Fix
We implement exponential backoff, maintain a fallback pipeline (Deepgram STT + GPT-4o chat + TTS) that activates automatically if Realtime is rate-limited, and distribute load across multiple API keys for high-volume deployments.
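The retry-then-fall-back logic can be sketched independently of any SDK. In this sketch, `RateLimitError` and the two callables are hypothetical placeholders for the Realtime call and the Deepgram + GPT-4o + TTS path. Delays use exponential backoff with full jitter. The retry budget is deliberately small, because every second spent retrying is dead air on a live call.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 from the Realtime API."""


def with_backoff_and_fallback(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    max_retries: int = 2,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `primary`, retrying rate-limit errors with jittered exponential backoff.

    After `max_retries` retries, switch to `fallback` so the caller never hears silence.
    """
    for attempt in range(max_retries + 1):
        try:
            return primary()
        except RateLimitError:
            if attempt == max_retries:
                break
            # Full jitter: wait a random time in [0, base * 2^attempt].
            sleep(random.uniform(0, base_delay_s * 2 ** attempt))
    return fallback()


if __name__ == "__main__":
    def always_limited() -> str:
        raise RateLimitError()

    print(with_backoff_and_fallback(always_limited, lambda: "fallback pipeline"))
```

Injecting `sleep` keeps the function testable. In a real agent, the fallback pipeline would already be initialized (the hot standby), so switching adds no connection setup.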
We scope most LiveKit projects in a single 30-minute call. Fixed price, clear deliverables, production-ready in 3 weeks.