An AI voice agent costs $0.07–$0.33 per minute in infrastructure, plus a one-time build cost of $5,000–$100,000 depending on complexity. The per-minute rate is determined by four stacked costs: speech-to-text, LLM inference, text-to-speech, and telephony — all on top of any platform fee.
The per-minute cost breakdown
Every voice call stacks four provider costs, plus a platform fee if you use a managed platform:
| Component | Low | High | Notes |
|---|---|---|---|
| Speech-to-text | $0.006 | $0.025 | Deepgram cheapest; Whisper most accurate |
| LLM inference | $0.03 | $0.12 | GPT-4o mini cheap; Claude Sonnet premium |
| Text-to-speech | $0.015 | $0.08 | Cartesia cheapest; ElevenLabs best quality |
| Telephony | $0.007 | $0.015 | SIP trunk + carrier fees |
| Platform fee | $0.02 | $0.07 | Vapi, Retell, LiveKit platform charges |
| Total | $0.07 | $0.33 | Typical BYOK range |
BYOK (bring your own key) means you supply your own API keys for STT, LLM, and TTS — paying those providers directly rather than through the platform at marked-up rates. BYOK typically saves 30–40% compared to bundled platform pricing.
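As a quick sanity check, the component table above can be summed directly. The sketch below is illustrative only: the dictionary keys and the function name are made up for this example, and the figures are copied from the table. The straight sum comes to roughly $0.08 to $0.31 per minute, in the same ballpark as the Total row, which reads as a typical range rather than an exact sum.

```python
# Per-minute component costs in USD, copied from the table above.
# The key names are illustrative, not part of any provider's API.
COMPONENTS = {
    "speech_to_text": (0.006, 0.025),
    "llm_inference": (0.03, 0.12),
    "text_to_speech": (0.015, 0.08),
    "telephony": (0.007, 0.015),
    "platform_fee": (0.02, 0.07),
}


def per_minute_range(components):
    """Return (low, high) per-minute totals, rounded to a tenth of a cent."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return round(low, 3), round(high, 3)


low, high = per_minute_range(COMPONENTS)
print(f"${low:.3f} to ${high:.3f} per minute")
```

Dropping the `platform_fee` entry gives the provider-only cost, which is the relevant figure for a self-hosted stack.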
Per-platform cost comparison
| Platform | Hosting | BYOK total/min | Bundled total/min |
|---|---|---|---|
| Vapi | Managed | $0.23–0.33 | $0.40–0.55 |
| Retell AI | Managed | $0.21–0.30 | $0.38–0.50 |
| LiveKit Cloud | Managed | $0.15–0.20 | n/a |
| LiveKit self-hosted | Self-managed | $0.07–0.15 | n/a |
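The BYOK saving implied by this table can be worked out from the two platforms that list both prices. The sketch below compares the midpoint of each range, which is a simplifying assumption of this example rather than a method from the article. On these particular figures the saving comes to a little over 40%, at the upper edge of the typical 30–40% quoted earlier.

```python
# (BYOK, bundled) per-minute ranges in USD from the platform table above.
PLATFORMS = {
    "Vapi": ((0.23, 0.33), (0.40, 0.55)),
    "Retell AI": ((0.21, 0.30), (0.38, 0.50)),
}


def byok_saving_pct(byok, bundled):
    """Percentage saved by BYOK, comparing the midpoints of each range.

    Using midpoints is an assumption made for this illustration.
    """
    byok_mid = sum(byok) / 2
    bundled_mid = sum(bundled) / 2
    return round(100 * (1 - byok_mid / bundled_mid), 1)


for name, (byok, bundled) in PLATFORMS.items():
    print(f"{name}: about {byok_saving_pct(byok, bundled)}% cheaper with BYOK")
```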
Monthly run-rate at different volumes
| Volume | Vapi BYOK | LiveKit Cloud | Self-hosted |
|---|---|---|---|
| 1,000 min/mo | $230–330 | $150–200 | $70–150 |
| 5,000 min/mo | $1,150–1,650 | $750–1,000 | $350–750 |
| 10,000 min/mo | $2,300–3,300 | $1,500–2,000 | $700–1,500 |
| 25,000 min/mo | $5,750–8,250 | $3,750–5,000 | $1,750–3,750 |
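Each cell in the table above is simply minutes multiplied by the per-minute BYOK range from the per-platform comparison. A minimal sketch for reproducing the rows or plugging in a different volume follows; the function and variable names are illustrative.

```python
# Per-minute BYOK ranges in USD, taken from the per-platform comparison.
RATES = {
    "Vapi BYOK": (0.23, 0.33),
    "LiveKit Cloud": (0.15, 0.20),
    "Self-hosted": (0.07, 0.15),
}


def monthly_range(minutes, rate):
    """Return the (low, high) monthly cost in whole dollars."""
    low, high = rate
    return round(minutes * low), round(minutes * high)


for minutes in (1_000, 5_000, 10_000, 25_000):
    row = {name: monthly_range(minutes, rate) for name, rate in RATES.items()}
    print(f"{minutes:>6,} min/mo: {row}")
```

To estimate a volume that is not in the table, call `monthly_range` with your own minute count, for example `monthly_range(40_000, RATES["LiveKit Cloud"])`.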
Self-hosting becomes worthwhile at roughly 10,000 minutes/month. At that volume it saves about $500–800/month over LiveKit Cloud; below it, the engineering overhead of managing infrastructure exceeds the savings.
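The crossover can be framed as a break-even calculation: monthly overhead divided by the per-minute saving. The sketch below uses the midpoints of the LiveKit Cloud and self-hosted ranges. The $650/month overhead is a hypothetical figure, not one given in this article; it was picked so the output lines up with the ~10,000-minute figure and should be replaced with your own estimate.

```python
def breakeven_minutes(cloud_rate, self_rate, monthly_overhead):
    """Minutes per month at which self-hosting savings cover the overhead."""
    saving_per_minute = cloud_rate - self_rate
    if saving_per_minute <= 0:
        raise ValueError("self-hosting must be cheaper per minute to break even")
    return monthly_overhead / saving_per_minute


# Midpoints of the LiveKit Cloud ($0.15-0.20) and self-hosted ($0.07-0.15) ranges.
CLOUD_RATE = 0.175
SELF_RATE = 0.11

# ASSUMPTION: $650/month of extra engineering time is a made-up figure,
# not a number from this article. Replace it with your own estimate.
ASSUMED_OVERHEAD = 650

minutes = breakeven_minutes(CLOUD_RATE, SELF_RATE, ASSUMED_OVERHEAD)
print(f"Break-even at about {minutes:,.0f} minutes/month")
```

The result is sensitive to the overhead assumption: doubling the overhead doubles the break-even volume.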
Build cost (one-time)
| Tier | Range | Timeline |
|---|---|---|
| Proof of Concept | $5,000–$15,000 | 2–4 weeks |
| Production build | $25,000–$75,000 | 6–12 weeks |
| Enterprise | $100,000+ | 12–24 weeks |
Hidden costs most estimates miss
Telephony infrastructure. SIP trunks, DID (phone) numbers, and PSTN carrier fees add $70–150/month at 10,000 minutes ($0.007–0.015 per minute). These are frequently missing from initial vendor quotes.
Ongoing prompt tuning. Voice AI needs 4–8 hours/month of engineering attention: as your business changes, edge cases emerge and the system prompt needs updating. Billed at engineer rates, this adds $500–2,000/month.
Monitoring. You need call recording storage, transcript search, and failed-call alerting, none of which is included in most per-minute rates. Budget $100–400/month.
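Taken together, the three hidden costs above can be added up for a rough monthly figure. This sketch assumes the items are simply additive and uses the telephony figure quoted for 10,000 minutes; the names are illustrative. If a vendor quote already includes telephony, drop that entry before summing.

```python
# Monthly hidden-cost ranges in USD, from the three items above.
# The telephony range is the one quoted for 10,000 minutes/month.
HIDDEN_COSTS = {
    "telephony": (70, 150),
    "prompt_tuning": (500, 2_000),
    "monitoring": (100, 400),
}


def total_range(costs):
    """Sum the low ends and the high ends of each cost range."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return low, high


low, high = total_range(HIDDEN_COSTS)
print(f"Hidden costs: ${low:,} to ${high:,} per month")
```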