Fieldtested
AI AGENT REVIEW

Vapi review

Published May 29, 2026

Vapi

Voice · Usage-based pricing · Free trial

The developer's voice platform — pick Vapi when your team writes code and wants line-item control over the speech stack.

OVERALL SCORE

7.7

out of 10

Features 8.0/10
Value 8.2/10
UX 7.0/10
Data quality 7.5/10

TL;DR

Vapi is the voice agent platform built for developers. Where Retell wins on polished call-flow UIs and Bland targets outbound dialer scale, Vapi optimizes for composability and code-first ergonomics. If your team writes code and you want explicit control over each leg of the speech stack (STT, LLM, TTS, telephony), this is the right default. If you want a turnkey dashboard, look elsewhere.

Who it’s for

Vapi fits engineering-led teams building voice agents into existing software products: B2B SaaS adding voice channels, telephony-heavy startups, contact-center modernization projects, and developer tools that need a voice interface. The platform underperforms expectations when the buyer is non-technical or when the workflow needs to live entirely inside a no-code dashboard.

The non-obvious fit: agencies and integrators building voice agents for multiple clients. Vapi’s composability lets you tune cost, latency, and quality per client without locking in to a single vendor’s house stack. The same agent template ports cleanly across deployments by swapping config.

At a glance

  • Pricing: Platform fee ~$0.05/voice-minute plus pass-through STT/LLM/TTS/telephony
  • Billing: Usage-based, monthly invoicing
  • Free trial: Free credits on signup, no card required
  • Telephony: Twilio, Telnyx, Vonage, or direct SIP
  • STT providers: Deepgram, Whisper, AssemblyAI, Gladia
  • TTS providers: ElevenLabs, Cartesia, PlayHT, Deepgram Aura
  • LLM providers: OpenAI, Anthropic, Google, Groq, open-weight via custom endpoints

Features deep-dive

Composable speech stack. Vapi’s defining product decision: every leg of the voice pipeline is swappable. You pick STT provider X, LLM provider Y, TTS provider Z; Vapi orchestrates the streaming. This means you can chase the latest model on each axis without re-platforming. The trade-off is that you own the tuning decisions — there is no “house default that works.”
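The swap-one-leg idea can be sketched as plain configuration. This is a minimal illustration, not Vapi's actual schema: the `StackConfig` class, its field names, and the provider strings are hypothetical and stand in for whatever the real agent definition looks like.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StackConfig:
    """Hypothetical per-agent stack description (not Vapi's real schema)."""
    stt: str        # speech-to-text provider
    llm: str        # language model provider
    tts: str        # text-to-speech provider
    telephony: str  # carrier or SIP trunk


# Baseline stack built from providers named in this review.
baseline = StackConfig(stt="deepgram", llm="anthropic", tts="elevenlabs", telephony="twilio")

# Swapping one leg leaves the other three untouched.
cheaper_tts = replace(baseline, tts="cartesia")

print(cheaper_tts)
```

Keeping the stack in one immutable value makes each swap an explicit, reviewable change, which matters when you own the tuning decisions.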

SDKs and APIs. First-party TypeScript and Python SDKs, REST and WebSocket APIs, webhook-driven event flow. The DX is closer to Stripe than to most voice-agent dashboards: you provision agents, place calls, and stream events all from code. The dashboard exists but it’s a thin wrapper over the API.

Squad calls. Multiple agents can hand off within a single live call — e.g., an intake agent qualifies the caller, then transfers to a specialist agent with the full conversation context. Handoffs are first-class with state passing; this is harder to build cleanly than it looks.
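To make "handoff with state passing" concrete, here is a small sketch of the pattern in isolation. It does not use Vapi's API: `CallState`, `hand_off`, and the agent names are hypothetical, and a real squad configuration would be declared on the platform rather than in local code.

```python
from dataclasses import dataclass, field


@dataclass
class CallState:
    """Hypothetical shared state that travels with the call across agents."""
    transcript: list[str] = field(default_factory=list)
    facts: dict[str, str] = field(default_factory=dict)
    active_agent: str = "intake"


def hand_off(state: CallState, to_agent: str, reason: str) -> CallState:
    """Transfer the live call to another agent while keeping the full context."""
    state.transcript.append(f"[handoff {state.active_agent} -> {to_agent}: {reason}]")
    state.active_agent = to_agent
    return state


# The intake agent gathers context before transferring.
state = CallState()
state.transcript.append("caller: I need help with a billing dispute")
state.facts["topic"] = "billing"

state = hand_off(state, to_agent="billing_specialist", reason="qualified as billing")

print(state.active_agent)
print(state.facts)
```

The point of the sketch is that the specialist agent starts with the transcript and extracted facts already in hand, so the caller does not have to repeat themselves.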

Server-side functions. Custom function calls during live calls go through your endpoints, not a no-code action library. You write the function, register it with the agent, and Vapi invokes it mid-call. Latency optimization is your responsibility, but in exchange you get full flexibility over what the function does.
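The write-register-invoke flow can be sketched as a small dispatcher. The request body shape used here (`name` plus `arguments`) and the reply shape are assumptions for illustration only, not Vapi's documented webhook format; `lookup_order` is a made-up example function.

```python
import json
from typing import Any, Callable

# Registry of functions the agent is allowed to call mid-conversation.
FUNCTIONS: dict[str, Callable[..., Any]] = {}


def register(name: str):
    """Decorator that exposes a Python function under a tool name."""
    def wrap(fn):
        FUNCTIONS[name] = fn
        return fn
    return wrap


@register("lookup_order")
def lookup_order(order_id: str) -> dict:
    # Placeholder logic; a real handler would query your own systems.
    return {"order_id": order_id, "status": "shipped"}


def handle_function_call(raw_body: str) -> str:
    """Handle one (hypothetical) function-call request and return a JSON reply."""
    payload = json.loads(raw_body)
    fn = FUNCTIONS.get(payload["name"])
    if fn is None:
        return json.dumps({"error": f"unknown function: {payload['name']}"})
    result = fn(**payload.get("arguments", {}))
    return json.dumps({"result": result})


reply = handle_function_call('{"name": "lookup_order", "arguments": {"order_id": "A123"}}')
print(reply)
```

Because the caller is waiting on the line while this runs, anything slow inside the handler (database queries, third-party APIs) shows up directly as dead air.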

Pricing analysis

The Vapi pricing model is the clearest in the voice category: a $0.05/minute platform fee, then explicit pass-through for STT, LLM tokens, TTS, and telephony. A 5-minute inbound call with Deepgram + Claude + ElevenLabs + Twilio lands at roughly $0.40-0.70 before telephony minutes: $0.25 of that is the platform fee (5 × $0.05), and the rest is STT, LLM, and TTS pass-through.
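The line-item model lends itself to a simple per-call estimate. In this sketch only the $0.05/minute platform fee comes from the review; the per-minute pass-through rates are illustrative placeholders, not vendor prices, and should be replaced with figures from your own invoices.

```python
PLATFORM_FEE_PER_MIN = 0.05  # platform fee as stated in this review


def estimate_call_cost(minutes: float, passthrough_per_min: dict[str, float]) -> dict[str, float]:
    """Break a call's cost into the platform fee and each pass-through line."""
    lines = {"platform": minutes * PLATFORM_FEE_PER_MIN}
    for leg, rate in passthrough_per_min.items():
        lines[leg] = minutes * rate
    lines["total"] = sum(lines.values())
    return {name: round(cost, 4) for name, cost in lines.items()}


# Placeholder per-minute rates (assumptions, not published prices).
example_rates = {"stt": 0.01, "llm": 0.03, "tts": 0.04}

breakdown = estimate_call_cost(5, example_rates)
for name, cost in breakdown.items():
    print(f"{name:>9}: ${cost:.2f}")
```

Telephony is left out here to match the "before telephony" framing; add it as another entry in the rates dictionary if you want an all-in number.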

For high-volume deployments, the line-item visibility lets you optimize aggressively. Switching from ElevenLabs ($0.18/1k chars) to Cartesia (~$0.05/1k chars) can cut TTS costs by 60% or more, with minimal quality difference for many voices. The platform encourages this optimization; competitors hide the math.
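The TTS comparison is plain arithmetic on the two quoted rates. The monthly character volume below is a hypothetical figure chosen for illustration; at exactly the quoted rates the reduction works out to about 72%, and real savings depend on plan tiers and on the Cartesia rate, which is approximate.

```python
def tts_cost(chars: int, rate_per_1k_chars: float) -> float:
    """Cost of synthesizing `chars` characters at a per-1,000-character rate."""
    return chars / 1000 * rate_per_1k_chars


def savings_fraction(old_rate: float, new_rate: float) -> float:
    """Fraction of spend saved by moving from old_rate to new_rate."""
    return (old_rate - new_rate) / old_rate


ELEVENLABS_RATE = 0.18  # $/1k chars, as quoted in this review
CARTESIA_RATE = 0.05    # $/1k chars, approximate, as quoted in this review

monthly_chars = 10_000_000  # hypothetical volume

before = tts_cost(monthly_chars, ELEVENLABS_RATE)
after = tts_cost(monthly_chars, CARTESIA_RATE)

print(f"before: ${before:,.0f}  after: ${after:,.0f}")
print(f"saved:  {savings_fraction(ELEVENLABS_RATE, CARTESIA_RATE):.0%}")
```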

Compared to Retell’s bundled pricing: Vapi typically lands lower at scale and higher on small volumes (the platform fee dominates short calls). Compared to Bland: Bland targets outbound dialer economics with volume discounts Vapi doesn’t match.

Strengths

Developer ergonomics. The SDKs are good, the API is clean, the docs are accurate, the changelog is current. Building a working voice agent on Vapi is faster than building one on Retell for engineers — though slower for non-engineers. The framework respects developer time.

The composability pays off when models improve. When a new TTS provider ships better voices, you swap one line of config; you don’t wait for the platform to integrate. The half-life of “best STT model” or “best TTS voice” is roughly six months in 2026, and Vapi’s architecture lets you keep pace.

Weaknesses

The dashboard is functional, not delightful. Ops teams used to no-code builders will find it spartan. Most production Vapi deployments don’t use the dashboard meaningfully — they configure agents in code — but if your team needs UI workflows, this is friction.

Composability creates configuration burden. Choosing the wrong STT model on a noisy phone line, or pairing a slow LLM with a fast TTS, produces calls that feel broken in non-obvious ways. The platform doesn’t enforce sensible defaults; you’re expected to test and tune. This is a strength for builders and a weakness for everyone else.
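One way to catch a mismatched pairing before it reaches callers is to keep a rough latency budget per conversational turn. Everything in this sketch is an assumption: the per-leg numbers are placeholders rather than measurements, the 800 ms budget is an arbitrary example threshold, and summing the legs is only a rough upper bound because streaming pipelines overlap them.

```python
def turn_latency_ms(legs: dict[str, int]) -> int:
    """Rough upper bound on response delay: sum of per-leg time-to-first-output."""
    return sum(legs.values())


def slowest_leg(legs: dict[str, int]) -> str:
    """Name of the leg contributing the most delay."""
    return max(legs, key=legs.get)


# Placeholder latencies in milliseconds (illustrative, not benchmarks).
stack = {"stt": 150, "llm": 900, "tts": 120, "network": 80}
BUDGET_MS = 800  # hypothetical target for a natural-feeling turn

total = turn_latency_ms(stack)
if total > BUDGET_MS:
    print(f"over budget by {total - BUDGET_MS} ms; tune '{slowest_leg(stack)}' first")
else:
    print(f"within budget ({total} ms)")
```

With these placeholder values the LLM leg dominates, which is the "slow LLM with a fast TTS" case: a faster voice does nothing for the caller until the model's first token arrives sooner.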

Verdict

Vapi is the right voice platform for engineering-led teams in 2026. The composability is a genuine architectural advantage, the SDKs are well-built, and the pricing is the most transparent in the category. The dashboard limitations and tuning burden are real but match the target audience. For ops-first teams, Retell remains the cleaner choice; for builder-first teams, Vapi wins. See FAQ below.

FAQ

  1. How does Vapi differ from Retell and Bland?

    Vapi is the most developer-oriented of the three. Retell has the most polished call-flow UI and best out-of-the-box latency. Bland targets outbound dialing scale. Vapi wins on composability and code-first ergonomics. The choice often follows team culture: builder-first → Vapi, ops-first → Retell, dialer-scale → Bland.

  2. Can I mix providers, e.g., Deepgram for STT, Anthropic for LLM, ElevenLabs for TTS?

    Yes. That composability is Vapi's defining product decision. You configure each leg of the stack independently and Vapi handles the streaming pipeline between them. Common in production: Deepgram Nova → Claude or GPT-4 → ElevenLabs or Cartesia.

  3. What does Vapi cost in practice?

    Vapi charges a platform fee per minute (~$0.05) on top of pass-through costs for STT, LLM tokens, TTS, and telephony. A typical 5-minute call lands at $0.40-1.20 all-in, telephony included, depending on stack choices. Forecasting is easier than with competitors because every cost line is visible.

  4. Is Vapi production-ready for regulated industries?

    Reasonably, yes. Vapi ships call recording, transcript storage, and webhook-based integration with your compliance tooling. HIPAA-compliant configurations are possible with the right vendor selections on the underlying stack. As always: legal review beats vendor marketing.

Stéphane Viaud-Murat

CEO, mi4.fr