Build voice applications that run in web browsers, mobile apps, and backend systems. This guide covers client-side voice interfaces and server-side call management with Vapi’s SDKs.
In this quickstart, you’ll learn to:
Developing locally? The Vapi CLI makes it easy to initialize projects and test webhooks:
Best for: User-facing applications, voice widgets, mobile apps
Best for: Backend automation, bulk operations, system integrations
Build browser-based voice assistants and widgets for real-time user interaction.
Build browser-based voice interfaces:
For UIs that need to render live captions or karaoke-style word highlighting as the assistant speaks, subscribe to the opt-in assistant.speechStarted message. Add it to your assistant’s clientMessages:
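As a minimal sketch of that opt-in, the snippet below appends the message type to an existing clientMessages list without duplicating it. The helper name `withSpeechStarted`, the `assistantConfig` object, and the existing `"transcript"` entry are illustrative assumptions; only the `assistant.speechStarted` name and the `clientMessages` field come from the text above.

```typescript
// Message type named in this guide.
const SPEECH_STARTED = "assistant.speechStarted";

// Hypothetical helper: returns a copy of clientMessages that includes
// the opt-in message type exactly once.
function withSpeechStarted(clientMessages: readonly string[] = []): string[] {
  return clientMessages.indexOf(SPEECH_STARTED) !== -1
    ? [...clientMessages]
    : [...clientMessages, SPEECH_STARTED];
}

// Illustrative assistant configuration fragment (other fields omitted).
// "transcript" stands in for whatever entries you already subscribe to.
const assistantConfig = {
  clientMessages: withSpeechStarted(["transcript"]),
};
```

Building the list from your current entries keeps existing subscriptions in place when you add the new one.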
Each event carries the full assistant turn text, the turn number, the source ("model", "force-say", or "custom-voice"), and optional timing data whose shape depends on your voice provider:
Cadence and granularity vary significantly by voice provider — pick the one that matches your UI requirements:
ElevenLabs (word-alignment) is the only provider that emits at true playback cadence with real per-word timestamps. Best for smooth karaoke-style highlighting with no client-side interpolation.
Minimax (word-progress) with subtitleType: "word" emits once per synthesis segment, near the end of that segment’s playback. The per-word timing.words[] array carries timestamps for the segment that just finished, which is useful for retroactive animation or forward extrapolation, but not for driving real-time highlighting during that segment. See the Minimax provider page for details.
Other providers emit text-only events (no timing). One event per TTS chunk; you can interpolate a word cursor at a flat rate (~3.5 words/sec) between events for an approximate cursor.
force-say events (your firstMessage, say actions) always emit as text-only, even on ElevenLabs and Minimax. On user barge-in, no further events fire for the interrupted turn; pair with the user-interrupted message to know what was actually spoken.
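The flat-rate interpolation mentioned above for events without timing data can be sketched as a pure function. This is an approximation under stated assumptions: the function names are hypothetical, words are split on whitespace, and the 3.5 words/sec default is the rough figure from the text rather than a measured rate.

```typescript
// Approximate speaking rate from the text above.
const WORDS_PER_SECOND = 3.5;

// Splits turn text into words on whitespace, dropping empty entries.
function splitWords(text: string): string[] {
  return text.trim().split(/\s+/).filter((w) => w.length > 0);
}

// Estimates which word is being spoken.
//   startIndex: word index at the moment the last event arrived
//   elapsedMs:  time since that event was received
//   totalWords: number of words in the turn
// Returns -1 for an empty turn; otherwise clamps to the last word.
function estimateWordIndex(
  startIndex: number,
  elapsedMs: number,
  totalWords: number,
  wordsPerSecond: number = WORDS_PER_SECOND
): number {
  if (totalWords <= 0) return -1;
  const advanced = Math.floor((Math.max(0, elapsedMs) / 1000) * wordsPerSecond);
  return Math.min(totalWords - 1, Math.max(0, startIndex) + advanced);
}
```

In a UI you would typically call `estimateWordIndex` from an animation frame or timer, reset `startIndex` and the clock whenever a new event arrives, and stop advancing when a user-interrupted message is received.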
For the full event schema and field reference, see Server events → Assistant Speech Started.
Create a voice widget for your website:
The fastest way to get started. Copy this snippet into your website:
Automate outbound calls and handle inbound call processing with server-side SDKs.
Install the TypeScript Server SDK:
Run automated call campaigns for sales, surveys, or notifications:
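Before placing calls, a campaign usually needs a cleaned, batched list of recipients. The sketch below covers only that preparation step and does not call the Vapi SDK. The function name `planCampaign`, the batch size, and the assumption that customer numbers should be in E.164 format are illustrative and not taken from this guide.

```typescript
// E.164: "+" followed by up to 15 digits, first digit non-zero.
const E164 = /^\+[1-9]\d{1,14}$/;

interface CampaignPlan {
  batches: string[][]; // valid, de-duplicated numbers grouped into batches
  rejected: string[];  // inputs that did not look like E.164 numbers
}

// Hypothetical helper: validates, de-duplicates, and chunks phone numbers.
function planCampaign(rawNumbers: string[], batchSize: number = 10): CampaignPlan {
  const size = Math.max(1, Math.floor(batchSize));
  const seen = new Set<string>();
  const valid: string[] = [];
  const rejected: string[] = [];

  for (const raw of rawNumbers) {
    const n = raw.trim();
    if (!E164.test(n)) {
      rejected.push(raw);
      continue;
    }
    if (seen.has(n)) continue; // skip repeated recipients
    seen.add(n);
    valid.push(n);
  }

  const batches: string[][] = [];
  for (let i = 0; i < valid.length; i += size) {
    batches.push(valid.slice(i, i + size));
  }
  return { batches, rejected };
}
```

Each batch can then be handed to whatever call-creation method your server SDK version exposes, with a pause between batches if you need to respect concurrency limits.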
Handle real-time events for both client and server applications:
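One way to keep event handling manageable on either side is a small dispatcher keyed on the message type. The sketch below is SDK-independent: `createDispatcher` is a hypothetical helper, and it assumes each incoming message is an object with a string `type` field. The two type names used in the example are the ones mentioned earlier in this guide.

```typescript
type Handler = (message: Record<string, unknown>) => void;

// Hypothetical helper: routes a message to the handler registered for its type.
// Returns true if a handler ran, false if the message was ignored.
function createDispatcher(handlers: Record<string, Handler>) {
  return function dispatch(message: unknown): boolean {
    if (typeof message !== "object" || message === null) return false;
    const record = message as Record<string, unknown>;
    const type = record["type"];
    if (typeof type !== "string") return false;
    const handler = Object.prototype.hasOwnProperty.call(handlers, type)
      ? handlers[type]
      : undefined;
    if (!handler) return false;
    handler(record);
    return true;
  };
}

// Example wiring: record which events were seen.
const log: string[] = [];
const dispatch = createDispatcher({
  "assistant.speechStarted": () => {
    log.push("speech-started");
  },
  "user-interrupted": () => {
    log.push("interrupted");
  },
});
```

Unknown or malformed messages are ignored rather than thrown on, so new message types do not break existing handlers.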
Now that you understand both client and server SDK capabilities:
Client SDKs:
Server SDKs:
Documentation: