LiveKit vs Vapi: open-source media infrastructure versus a managed voice AI platform. A real per-minute cost breakdown, plus the volume crossover where LiveKit's economics pull decisively ahead.
We build production voice AI on both. This is our honest take.
TL;DR
LiveKit is best for: engineering-led teams who need full control of the media stack, video + voice in one platform, volume above 50k min/month, self-hosted data sovereignty, or HIPAA on-prem deployments.
Vapi is best for: product teams that need a working voice agent in days, startups without dedicated infra engineers, and moderate-volume deployments under 50k min/month where managed simplicity outweighs higher per-minute cost.
Architecture
The core difference isn't features — it's ownership. Vapi is a managed black box that handles everything. LiveKit is an open WebRTC transport layer you assemble into a pipeline using their Agents SDK.
Your app calls the Vapi API. Vapi handles STT routing, LLM connection, TTS synthesis, call lifecycle, and webhooks. You configure via dashboard or API; you don't write any media code.
// your app calls
Your App ←→ Vapi API
↓ managed by Vapi
STT routing
LLM connection (BYOK)
TTS synthesis (BYOK)
Phone / WebRTC
Time to first working agent: 1–3 days
Your code uses the LiveKit Agents SDK (Python or Node) to assemble a pipeline — you choose and wire each plugin: STT, LLM, TTS. LiveKit handles WebRTC transport. You own every layer.
// your code assembles
Your Agent ←→ Agents SDK
↓ you wire each plugin
STT plugin (you choose)
LLM plugin (you choose)
TTS plugin (you choose)
LiveKit Server (open-source)
Time to first working agent: 1–3 weeks
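To make "you wire each plugin" concrete, here is a minimal sketch of the STT → LLM → TTS assembly pattern. It is not the LiveKit Agents SDK API: the `SpeechToText`, `LanguageModel`, `TextToSpeech`, and `VoicePipeline` names are hypothetical, and the stub plugins stand in for real providers so the example runs without keys or network access.

```python
from dataclasses import dataclass
from typing import Protocol


# Hypothetical plugin interfaces, for illustration only (not LiveKit's API).
class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def reply(self, transcript: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class VoicePipeline:
    """One conversational turn: caller audio -> STT -> LLM -> TTS -> agent audio."""
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def handle_turn(self, audio: bytes) -> bytes:
        transcript = self.stt.transcribe(audio)
        answer = self.llm.reply(transcript)
        return self.tts.synthesize(answer)


# Stub plugins so the sketch runs offline; real ones would call provider APIs.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class CannedLLM:
    def reply(self, transcript: str) -> str:
        return f"You said: {transcript}"


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


pipeline = VoicePipeline(stt=EchoSTT(), llm=CannedLLM(), tts=BytesTTS())
print(pipeline.handle_turn(b"book a table"))  # b'You said: book a table'
```

Swapping a provider means replacing one constructor argument. That is the kind of control the assembled approach buys, and it comes at the cost of owning the glue code yourself.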
The Real Pricing Math
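STT, LLM, TTS, and telephony costs are identical on both, because you bring your own keys. The difference is entirely in the platform fee. LiveKit Cloud trades integration time for a platform fee that is roughly 88% lower ($0.006 vs $0.05 per minute), or about 75% lower once STT is included.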
| Cost component | Vapi | LiveKit Cloud | Notes |
|---|---|---|---|
| Platform / media fee | $0.05 / min | ~$0.006 / min | LiveKit Cloud charges per participant-minute (~$0.003 × 2) |
| Speech-to-text (Deepgram BYOK) | ~$0.008 / min | ~$0.008 / min | Identical; both use your own Deepgram key |
| LLM (GPT-4o mini, BYOK) | ~$0.02–0.05 / min | ~$0.02–0.05 / min | Same cost on both, depends on turn length |
| TTS (ElevenLabs Flash, BYOK) | ~$0.10–0.15 / min | ~$0.10–0.15 / min | Identical; you bring your own ElevenLabs key |
| Telephony (Twilio) | ~$0.009 / min | ~$0.009 / min | Same cost if using Twilio for PSTN on both platforms |
| Effective platform + STT | ~$0.058 / min | ~$0.014 / min | LiveKit is ~75% cheaper on infrastructure alone |
| Typical all-in total | $0.23–0.33 / min | $0.18–0.27 / min | LiveKit saves ~$0.05/min on same BYOK stack |
LiveKit Cloud pricing is based on published participant-minute rates. Self-hosting LiveKit brings the media cost close to zero beyond your own infrastructure spend. BYOK costs assume GPT-4o mini, Deepgram Nova-2, ElevenLabs Flash Turbo v2.
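The percentage gap depends on which rows you include. The sketch below reproduces the arithmetic from the table's figures. The two-participant split (one caller plus one agent) is our reading of the "× 2" in the table, and the function names are illustrative.

```python
# Per-minute figures in USD, copied from the pricing table.
PLATFORM_FEE = {
    "vapi": 0.05,
    "livekit_cloud": 0.003 * 2,  # assumed: 2 participants (caller + agent)
}
STT_PER_MIN = 0.008  # Deepgram BYOK, identical on both platforms


def platform_plus_stt(platform: str) -> float:
    """Platform fee plus speech-to-text, per minute."""
    return PLATFORM_FEE[platform] + STT_PER_MIN


def percent_cheaper(baseline: float, alternative: float) -> float:
    """How much cheaper `alternative` is than `baseline`, in percent."""
    return (baseline - alternative) / baseline * 100


vapi = platform_plus_stt("vapi")
livekit = platform_plus_stt("livekit_cloud")

fee_gap = percent_cheaper(PLATFORM_FEE["vapi"], PLATFORM_FEE["livekit_cloud"])
print(f"platform fee only: {fee_gap:.0f}% cheaper")                    # 88% cheaper
print(f"platform + STT:    {percent_cheaper(vapi, livekit):.0f}% cheaper")  # 76% cheaper
```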
Cost Crossover
LiveKit Cloud is cheaper per-minute from day one. But integrating the Agents SDK takes 2–4 engineer-weeks of setup. The question is when that investment pays off. The answer is ~10k–50k minutes/month sustained, depending on your engineering cost.
| Monthly volume | Verdict | Why |
|---|---|---|
| 10,000 min / mo | LiveKit cheaper | Savings are real, but the integration cost (~$12k–20k of engineering time) takes 24+ months to recoup |
| 50,000 min / mo | Crossover point | Integration investment recovers in 5–8 months; this is where LiveKit wins on TCO |
| 100,000 min / mo | LiveKit decisively cheaper | The self-hosted option cuts media cost to near zero, so the gap widens further |
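The payback periods above follow from dividing the one-off integration cost by the monthly saving. Here is a small sketch of that calculation, assuming the ~$0.05/min saving and the $12k–20k integration range quoted in this article. The function name is illustrative, and your own engineering cost may differ.

```python
def payback_months(integration_cost: float,
                   minutes_per_month: int,
                   saving_per_minute: float = 0.05) -> float:
    """Months needed for per-minute savings to repay the integration cost."""
    monthly_saving = minutes_per_month * saving_per_minute
    return integration_cost / monthly_saving


for volume in (10_000, 50_000, 100_000):
    low = payback_months(12_000, volume)   # cheaper integration estimate
    high = payback_months(20_000, volume)  # pricier integration estimate
    print(f"{volume:>7,} min/mo: {low:.1f} to {high:.1f} months to recoup")

# Output:
#  10,000 min/mo: 24.0 to 40.0 months to recoup
#  50,000 min/mo: 4.8 to 8.0 months to recoup
# 100,000 min/mo: 2.4 to 4.0 months to recoup
```

Plugging in your own loaded engineering cost and projected volume is a quick way to check whether you sit above or below the crossover.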
Feature Comparison
| Feature | Vapi | LiveKit | Notes |
|---|---|---|---|
| Voice AI agents | ✓ | ✓ | Vapi via managed pipeline; LiveKit via Agents SDK (full code, full control) |
| Video + voice in one platform | ✗ | ✓ | LiveKit handles real-time video and voice natively; Vapi is voice-only |
| Open-source / self-hostable | ✗ | ✓ | LiveKit server is Apache 2.0; run on your own cloud or on-prem |
| Managed cloud option | ✓ | ✓ | Vapi is cloud-only; LiveKit offers both LiveKit Cloud and self-hosted |
| Pre-built AI voice pipeline | ✓ | ✗ | Vapi bundles STT → LLM → TTS orchestration; LiveKit requires code assembly |
| Full WebRTC stack control | ✗ | ✓ | LiveKit exposes the full media layer; Vapi abstracts WebRTC entirely |
| SIP / PSTN support | ✓ | ✓ | Both support SIP trunking; LiveKit has deeper native WebRTC↔SIP bridging |
| BYOK LLM | ✓ | ✓ | Both support GPT-4o, Claude, Gemini, and custom model endpoints |
| BYOK STT / TTS | ✓ | ✓ | Both support Deepgram, ElevenLabs, Cartesia, Azure, and others |
| Multi-participant rooms | ✗ | ✓ | LiveKit supports multi-user rooms natively; Vapi handles one caller per call |
| Built-in analytics dashboard | ✓ | ✗ | Vapi has native call dashboards; LiveKit requires custom observability setup |
| HIPAA self-hosted control | ✗ | ✓ | Self-hosted LiveKit gives full PHI control; Vapi HIPAA eligibility unconfirmed |
Scaling Economics
Vapi charges per minute regardless of volume. There are no servers to manage and no scaling events, but your cost grows linearly with usage. At a $0.05/min platform fee, every additional 100k minutes adds $5,000 to your monthly infrastructure bill, and that per-minute rate never shrinks as you scale.
Self-hosted LiveKit runs on a fixed server. A $200/month instance handles millions of participant-minutes. As volume grows, per-minute media cost collapses toward zero. At 500k min/month self-hosted, your effective media cost is ~$0.0004/min ($200 ÷ 500,000 minutes), which is over 100× cheaper than Vapi's platform fee alone.
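The contrast between a linear per-minute fee and a fixed server cost can be sketched in a few lines. This uses the article's own figures and assumes a single fixed $200/month instance. It ignores bandwidth, operations time, and any extra servers you might need at higher volume.

```python
VAPI_PLATFORM_FEE = 0.05     # USD per minute, from the pricing table
SELF_HOSTED_SERVER = 200.0   # USD per month, the fixed instance assumed above


def vapi_platform_cost(minutes: int) -> float:
    """Monthly Vapi platform fee: grows linearly with minutes."""
    return minutes * VAPI_PLATFORM_FEE


def self_hosted_cost_per_minute(minutes: int) -> float:
    """Effective media cost per minute when the server bill is fixed."""
    return SELF_HOSTED_SERVER / minutes


minutes = 500_000
per_minute = self_hosted_cost_per_minute(minutes)
ratio = VAPI_PLATFORM_FEE / per_minute

print(f"Vapi platform fee at {minutes:,} min: ${vapi_platform_cost(minutes):,.0f}")
print(f"Self-hosted media cost: ${per_minute:.4f}/min ({ratio:.0f}x cheaper)")
```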
The Decision Tree
This is the actual decision flow we walk clients through. Work through the questions in order and stop at the first one that gives you a platform recommendation.
Do you need video + voice in the same platform?
Yes: Use LiveKit. It is the only option here; Vapi is strictly voice-only.
No: Continue ↓
Building a prototype or MVP in under 4 weeks?
Yes: Use Vapi. Pre-built pipeline: working agents in days, not weeks.
No: Continue ↓
Does your team have engineers to own the media pipeline?
Yes: Continue ↓
No: Use Vapi. The LiveKit Agents SDK requires dedicated integration work.
Volume above 50k minutes / month (or heading there fast)?
Yes: Use LiveKit. Economics shift decisively; integration cost recovers in months.
No: Continue ↓
Need full data sovereignty or HIPAA self-hosted compliance?
Yes: Use LiveKit. Self-host for complete PHI control and infra ownership.
No: Continue ↓
Final default
When in doubt, start with Vapi.
Ship a working agent first. When your monthly bill crosses $3,000–5,000 and you have engineering bandwidth, that's when a LiveKit migration pays for itself.
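For teams that prefer to see the flow as code, the same questions can be written as an ordered chain of checks. This is a simplified sketch: the `Project` fields and the `recommend` function are illustrative names, and the "heading there fast" judgment is reduced to a plain volume threshold.

```python
from dataclasses import dataclass


@dataclass
class Project:
    needs_video: bool
    mvp_under_4_weeks: bool
    has_media_engineers: bool
    minutes_per_month: int
    needs_data_sovereignty: bool  # includes HIPAA self-hosted requirements


def recommend(p: Project) -> str:
    """Walk the questions in order and return at the first decisive answer."""
    if p.needs_video:
        return "LiveKit"   # Vapi is voice-only
    if p.mvp_under_4_weeks:
        return "Vapi"      # pre-built pipeline, agents in days
    if not p.has_media_engineers:
        return "Vapi"      # Agents SDK needs dedicated integration work
    if p.minutes_per_month > 50_000:
        return "LiveKit"   # integration cost recovers in months
    if p.needs_data_sovereignty:
        return "LiveKit"   # self-host for full PHI control
    return "Vapi"          # final default: ship first, migrate later


startup = Project(False, True, False, 5_000, False)
telehealth = Project(False, False, True, 20_000, True)
print(recommend(startup))     # Vapi
print(recommend(telehealth))  # LiveKit
```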
Our Role
Hestur AI is a platform-agnostic voice AI engineering firm. We don't have partner agreements with Vapi or LiveKit. We recommend the platform that fits your stage and scale, then build on it.
Vapi engagements are typically with early-stage and product-led companies that need a working agent quickly. We build on Vapi when speed-to-market matters more than per-minute cost, usually under 50k min/month or while the product is still validating PMF.
Common scope: inbound support agents, sales qualification, appointment booking, customer service deflection.
LiveKit engagements are typically with engineering-led, growth-stage companies that have significant voice volume, HIPAA requirements, or multi-modal (video + voice) needs. We architect the Agents SDK pipeline, set up cloud or self-hosted infrastructure, and implement observability.
Common scope: high-volume outbound calling, telehealth platforms, video + voice agents, on-prem enterprise deployments.
Book a 30-minute call. We'll review your use case, volume projections, and engineering resources — and give you a clear platform recommendation with no pitch attached.