Voice AI · 12 min read

    AI Voice Agent Setup Guide for Service Businesses

    A step-by-step playbook for restaurants, clinics, gyms, and service providers deploying their first production voice AI agent.

    Hestur AI
    hestur.co
    • Typical Deploy: 2–3 wks (from kick-off to live calls)
    • Call Containment: 55–75% (without human agent)
    • Cost All-In: $0.23–0.33 per minute (BYOK)
    • Response Time: <400ms (avg first-word latency)

    Who This Guide Is For

    This playbook targets service businesses receiving 50+ inbound calls per week: medical clinics, dental practices, physio studios, gyms, restaurants, home service providers, and salons. If your staff spends more than 2 hours per day answering calls that follow a predictable script (bookings, FAQs, directions, hours), a voice AI agent will pay for itself within 60–90 days.

    Scope check: Voice AI works well for calls with a defined goal (book, cancel, answer a question, transfer). It does not replace complex relationship calls where context, empathy, and judgment are the product.

    Step 1 — Choose Your Platform

    Platform       | Best For                            | Cost/Min (BYOK)  | Time to Deploy
    Vapi           | Fast prototypes, API-first teams    | $0.05 + provider | 1–2 weeks
    LiveKit Agents | Scale 10k+ min/month, custom infra  | Infra cost only  | 3–4 weeks
    Retell AI      | Non-technical teams, visual builder | $0.07 + provider | 1 week

    Our recommendation for service businesses: Start with Vapi. It has the fastest onboarding, solid BYOK support, and the best webhook ecosystem for CRM write-back. Migrate to LiveKit when your monthly bill exceeds $3K–$5K.

    Step 2 — BYOK Provider Setup

    Bring Your Own Keys dramatically cuts per-minute cost. Set up accounts with each provider before building your agent:

    Layer                | Provider                   | Cost            | Notes
    STT (Speech-to-Text) | Deepgram Nova-2            | $0.006/min      | Best accuracy for phone audio
    LLM (Brain)          | GPT-4o mini or Claude Haiku | $0.05–0.10/min  | Haiku is faster; GPT-4o mini is cheaper
    TTS (Voice)          | ElevenLabs or PlayHT       | $0.02–0.04/min  | ElevenLabs has more natural pauses
    Platform             | Vapi                       | $0.05/min       | Orchestration only, not AI processing

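    Total BYOK: roughly $0.13–0.20/min across these four layers, vs $0.45–0.60/min bundled. At 10K min/month (120K min/year), every $0.10/min saved is worth $12K/year, so even measured against the $0.23–0.33/min all-in figure the saving is roughly $15K–$30K/year.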
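    The arithmetic behind that comparison can be checked with a short sketch. It sums only the four per-layer figures from the table, so its savings estimate sits at the optimistic end; any all-in figure that adds telephony or other overhead will give a smaller gap. The function and constant names are illustrative, and costs are held in mills (thousandths of a dollar) to avoid floating-point rounding.

```python
# Per-minute cost ranges (low, high) in mills, copied from the table above.
LAYERS_MILLS = {
    "stt": (6, 6),          # Deepgram Nova-2
    "llm": (50, 100),       # GPT-4o mini or Claude Haiku
    "tts": (20, 40),        # ElevenLabs or PlayHT
    "platform": (50, 50),   # Vapi orchestration
}
BUNDLED_MILLS = (450, 600)  # bundled pricing range quoted above


def byok_total_mills(layers=LAYERS_MILLS):
    """Sum the low and high ends of every layer."""
    low = sum(lo for lo, _ in layers.values())
    high = sum(hi for _, hi in layers.values())
    return low, high


def annual_savings_usd(minutes_per_month, byok, bundled=BUNDLED_MILLS):
    """Return (smallest, largest) yearly saving in whole dollars."""
    minutes_per_year = minutes_per_month * 12
    smallest = (bundled[0] - byok[1]) * minutes_per_year // 1000
    largest = (bundled[1] - byok[0]) * minutes_per_year // 1000
    return smallest, largest


total = byok_total_mills()
print(total)                              # (126, 196) -> $0.126-0.196/min
print(annual_savings_usd(10_000, total))  # (30480, 56880)
```

    Swapping in your own provider quotes for `LAYERS_MILLS` is the quickest way to see whether BYOK is worth the extra account setup at your call volume.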

    Step 3 — Write Your System Prompt

    The system prompt is the agent's brain. Most voice agent failures are prompt failures, not infrastructure failures.

    1. Define the role and persona
       Give the agent a name, role, and business context. "You are Aria, the front desk AI for Oakwood Dental. You help patients schedule appointments, answer clinic questions, and handle appointment changes."
    2. List what it CAN do
       An explicit capability list reduces hallucination. Include: book appointments, check availability, confirm/cancel bookings, provide directions, answer FAQs, transfer to staff for complex requests.
    3. Define hard stops
       What the agent must never do: give medical/legal advice, quote prices it doesn't have, confirm appointments it can't verify. On uncertainty: always transfer, never guess.
    4. Set the voice style
       Service business tone: warm, brief, efficient. "Keep responses under 2 sentences. Do not over-explain. Mirror the patient's energy level: slower with elderly callers, brisk with busy professionals."
    5. Add a fallback chain
       Define the escalation path explicitly. "If you cannot resolve the caller's request in 2 attempts, say: 'Let me get someone who can help you better' and transfer to the front desk."
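    One way to keep those five parts separate and reviewable is to store them as fields and render the prompt from them. The sketch below is an illustration only: the `AgentPrompt` class and its section titles are hypothetical, not a format any platform requires.

```python
from dataclasses import dataclass


@dataclass
class AgentPrompt:
    """Holds the five prompt components described above (names are illustrative)."""
    persona: str
    capabilities: list[str]
    hard_stops: list[str]
    voice_style: str
    fallback: str

    def render(self) -> str:
        """Join the components into one system prompt with labelled sections."""
        sections = [
            ("ROLE", self.persona),
            ("YOU CAN", "\n".join(f"- {c}" for c in self.capabilities)),
            ("YOU MUST NEVER", "\n".join(f"- {h}" for h in self.hard_stops)),
            ("STYLE", self.voice_style),
            ("FALLBACK", self.fallback),
        ]
        return "\n\n".join(f"## {title}\n{body}" for title, body in sections)


prompt = AgentPrompt(
    persona="You are Aria, the front desk AI for Oakwood Dental.",
    capabilities=["book appointments", "check availability", "transfer to staff"],
    hard_stops=["give medical or legal advice", "quote prices you do not have"],
    voice_style="Warm, brief, efficient. Keep responses under 2 sentences.",
    fallback="After 2 failed attempts, offer to transfer to the front desk.",
)
print(prompt.render())
```

    Keeping hard stops in their own list makes it easier for staff to review exactly what the agent is forbidden to do before each prompt revision goes live.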

    Step 4 — Connect Your Calendar

    • Create a dedicated service account with calendar write access (do not use a personal account)
    • For Google Calendar: enable the Calendar API in Google Cloud Console, create a service account, share your booking calendar with it
    • For Acuity / Calendly: generate an API key from account settings
    • For Jane App (clinics): use the Jane API — requires the clinic plan
    • Build a Vapi tool call: GET /availability → shows open slots, POST /appointments → creates booking
    • Test with 5 real booking scenarios before going live
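    Before wiring a real calendar, it can help to prototype the two tool calls against an in-memory stand-in and run the booking scenarios there. The sketch below assumes nothing about Vapi or any calendar vendor: `BookingCalendar` and its methods are hypothetical helpers that mirror the GET /availability and POST /appointments behaviour listed above.

```python
from datetime import date, datetime


class BookingCalendar:
    """In-memory stand-in for a real calendar backend (hypothetical)."""

    def __init__(self, open_slots):
        self._open = sorted(open_slots)
        self._booked = {}

    def get_availability(self, day):
        """Mirrors GET /availability: open slots on the given date."""
        return [slot for slot in self._open if slot.date() == day]

    def create_appointment(self, slot, caller_name):
        """Mirrors POST /appointments: book one slot, refusing double bookings."""
        if slot not in self._open:
            return {"ok": False, "reason": "slot_unavailable"}
        self._open.remove(slot)
        self._booked[slot] = caller_name
        return {"ok": True, "slot": slot.isoformat(), "name": caller_name}


cal = BookingCalendar([datetime(2025, 3, 3, 9, 0), datetime(2025, 3, 3, 10, 0)])
first = cal.create_appointment(datetime(2025, 3, 3, 9, 0), "Sam")
second = cal.create_appointment(datetime(2025, 3, 3, 9, 0), "Alex")
print(first["ok"], second["ok"])  # True False
```

    The explicit `slot_unavailable` result matters for the prompt's hard stops: the agent should only confirm a booking when the tool call reports success.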

    Step 5 — SIP / Phone Number Setup

    Connect the agent to a real phone number via Twilio or Vonage:

    1. Buy a local number
       Use Twilio or Vonage. Local numbers get higher pickup rates than toll-free. $1–2/month per number.
    2. Configure call forwarding
       Option A: Point the new number directly to Vapi (use as primary number). Option B: Forward your existing number to Vapi during business hours, ring to staff after hours.
    3. Test voicemail detection
       Vapi has built-in voicemail detection. Tune the `vmDetection` threshold; the default is often too aggressive for mobile numbers with custom greetings.
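    The time-based forwarding in Option B reduces to a small routing decision. The sketch below is a plain-Python illustration, not Twilio or Vonage configuration: `route_call`, the Monday-to-Friday 9:00–17:00 schedule, and the destination labels are all assumptions to replace with your own hours.

```python
from datetime import datetime, time


def route_call(now: datetime, opens=time(9, 0), closes=time(17, 0)) -> str:
    """Pick a destination for an inbound call under Option B.

    Assumes business hours are Monday-Friday, opens <= t < closes.
    """
    is_weekday = now.weekday() < 5          # Monday is 0, Sunday is 6
    in_hours = opens <= now.time() < closes
    return "voice_agent" if is_weekday and in_hours else "staff_line"


print(route_call(datetime(2025, 3, 3, 10, 0)))   # Monday morning -> voice_agent
print(route_call(datetime(2025, 3, 3, 18, 30)))  # Monday evening -> staff_line
print(route_call(datetime(2025, 3, 1, 10, 0)))   # Saturday -> staff_line
```

    Use the clinic's local time zone when constructing `now`; a server running in UTC would otherwise shift the schedule by several hours.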

    Step 6 — Latency Tuning

    Target: first word from agent under 800ms. Over 1s feels unnatural on phone calls.

    Issue                        | Cause                     | Fix
    Slow first response          | LLM cold start            | Use streaming + smaller model for opening line
    Unnatural pauses mid-sentence | TTS chunking              | Enable sentence-level streaming in ElevenLabs
    Agent interrupts caller      | Endpointing fires too early | Raise the end-of-speech silence threshold, e.g. add a 300ms silence buffer
    Caller interrupts agent      | No barge-in handling      | Enable interruption in Vapi, shorten agent responses
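    The 300ms silence buffer in the table is easiest to reason about as a frame counter. The sketch below is a generic endpointing illustration, not Vapi's implementation: it assumes a voice activity detector that emits one boolean per 20ms audio frame, and `end_of_turn_index` is a hypothetical helper name.

```python
def end_of_turn_index(vad_frames, frame_ms=20, silence_ms=300):
    """Return the frame index where the caller's turn ends, or None.

    vad_frames: iterable of booleans, True when speech is detected.
    The turn ends once `silence_ms` of continuous silence follows speech.
    """
    needed = -(-silence_ms // frame_ms)  # ceiling division: frames of silence required
    heard_speech = False
    silent_run = 0
    for i, is_speech in enumerate(vad_frames):
        if is_speech:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= needed:
                return i
    return None


# Caller speaks, pauses 100ms mid-sentence, speaks again, then stops.
frames = [True] * 10 + [False] * 5 + [True] * 10 + [False] * 20
print(end_of_turn_index(frames, silence_ms=100))  # 14: cuts in during the pause
print(end_of_turn_index(frames, silence_ms=300))  # 39: waits for the real end
```

    The trade-off is direct: every extra millisecond of buffer is added to response latency, so the buffer should be only as long as callers' typical mid-sentence pauses.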

    Step 7 — CRM Write-Back

    After every call, push structured data to your CRM via Vapi webhooks:

    • call_ended webhook → parse transcript for intent, entities (name, phone, request type)
    • Write contact + call summary to CRM (Salesforce, HubSpot, or your EHR)
    • Flag calls where agent could not resolve for human review queue
    • Log call duration, containment result, and transfer reason for weekly reporting
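    The bullets above can be sketched as a single function that turns a webhook payload into a CRM-ready record. Everything about the payload here is an assumption: the field names (`call`, `transcript`, `transferred`, and so on) are hypothetical and should be mapped to whatever your platform actually sends. The keyword matching is a deliberately crude placeholder for intent extraction; a production setup would more likely use an LLM or the platform's own call analysis.

```python
# Checked in order, so "cancel my appointment" resolves to "cancel", not "book".
INTENT_RULES = [
    ("cancel", "cancel"),
    ("reschedule", "reschedule"),
    ("book", "book"),
    ("appointment", "book"),
    ("hours", "faq"),
]


def detect_intent(transcript: str) -> str:
    """Crude keyword-based intent detection (placeholder only)."""
    text = transcript.lower()
    for keyword, intent in INTENT_RULES:
        if keyword in text:
            return intent
    return "unknown"


def build_crm_record(payload: dict) -> dict:
    """Map a hypothetical call_ended payload to a flat CRM record."""
    call = payload.get("call", {})
    transferred = bool(call.get("transferred", False))
    intent = detect_intent(payload.get("transcript", ""))
    return {
        "call_id": call.get("id"),
        "phone": call.get("caller_phone"),
        "intent": intent,
        "duration_seconds": call.get("duration_seconds", 0),
        "contained": not transferred,
        "transfer_reason": call.get("transfer_reason") if transferred else None,
        # Unresolved or unclassified calls go to the human review queue.
        "needs_review": transferred or intent == "unknown",
    }


record = build_crm_record({
    "call": {"id": "c1", "caller_phone": "+15550100", "duration_seconds": 95},
    "transcript": "Hi, I need to cancel my appointment on Friday.",
})
print(record["intent"], record["needs_review"])  # cancel False
```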

    2-Week Deployment Timeline

    Phase        | Days  | What Happens
    Architecture | 1–3   | Provider setup, BYOK keys, system prompt v1, tool call specs
    Integration  | 4–8   | Calendar API, CRM webhook, phone number, voicemail detection
    Tuning       | 9–11  | Latency profiling, interrupt handling, edge case prompting
    Validation   | 12–14 | Shadow mode alongside live calls, staff review, go-live decision

    What to Measure in Week 1

    • Containment Rate: target 55%+ (calls resolved without transfer)
    • Avg Handle Time: target <3 min (vs 6–8 min with human)
    • Transfer Rate: target <30% (escalations to staff)
    • CSAT: target 4.0+ (post-call SMS survey)
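    These four numbers can be computed from a simple call log. The sketch below assumes each call is a dict with hypothetical keys `duration_seconds`, `resolved`, `transferred`, and an optional `csat` score; a call that is neither resolved nor transferred (for example, a hang-up) counts against containment without counting as a transfer.

```python
def week_one_metrics(calls):
    """Compute containment, transfer rate, handle time, and CSAT from call logs."""
    total = len(calls)
    if total == 0:
        return None
    contained = sum(1 for c in calls if c["resolved"] and not c["transferred"])
    transferred = sum(1 for c in calls if c["transferred"])
    scores = [c["csat"] for c in calls if c.get("csat") is not None]
    return {
        "containment_rate": contained / total,
        "transfer_rate": transferred / total,
        "avg_handle_time_min": sum(c["duration_seconds"] for c in calls) / total / 60,
        "csat": sum(scores) / len(scores) if scores else None,
    }


calls = [
    {"duration_seconds": 120, "resolved": True, "transferred": False, "csat": 5},
    {"duration_seconds": 180, "resolved": True, "transferred": False, "csat": None},
    {"duration_seconds": 240, "resolved": False, "transferred": True, "csat": 3},
    {"duration_seconds": 60, "resolved": False, "transferred": False},  # caller hung up
]
print(week_one_metrics(calls))
# {'containment_rate': 0.5, 'transfer_rate': 0.25, 'avg_handle_time_min': 2.5, 'csat': 4.0}
```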
    Common mistake: Going live on your main number on day one. Always run in shadow mode for at least 5 days: the agent listens on real calls but staff still answer. Catch edge cases before they hit real customers.