Browser Agent SDK
Voice AI for the browser — from a single script tag to a fully custom React application.
The demo above requires a Deepgram account. Sign up free to try it, or keep reading to understand the architecture first.
Start from how much control you need, not which package to install.
“I want a voice agent on my site in five minutes.” Use the Widget. One install, one function call. Pick from six layouts — sidebar, floating, inline, button, embedded, or orb. No framework required.
“I’m building a React app and want pre-built components.” Use React UI Components. Conversation view, animated orb, mic/speaker buttons, text input, and a waveform visualizer — all styled through CSS custom properties that work alongside your existing design system.
“I’m building a React app but want full control over the UI.” Use React Hooks. Provider context and focused hooks for state, conversation history, microphone control, audio playback, and client-side function calling. You build every pixel; the hooks manage every connection.
“I’m using Vue, Svelte, Angular, or vanilla JS.” Use the JavaScript SDK. The core AgentSession class is framework-agnostic. Pair it with AgentMicrophone for capture and AgentPlayer for playback. Wire the events into whatever UI layer you prefer.
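To make "wire the events into whatever UI layer you prefer" concrete, here is a minimal, framework-free sketch of the pattern. It does not use the SDK: `TypedEmitter` is a hypothetical stand-in for a session object, and the payload shapes in `AgentEvents` are assumptions. Only the event names (Welcome, ConversationText, AgentAudioDone) come from this page.

```typescript
// Hypothetical payload shapes; only the event names come from the docs.
interface AgentEvents {
  Welcome: { requestId: string };
  ConversationText: { role: "user" | "assistant"; content: string };
  AgentAudioDone: Record<string, never>;
}

type Handler<T> = (payload: T) => void;

// Minimal typed emitter standing in for a session object (not the SDK class).
class TypedEmitter<E> {
  private handlers = new Map<keyof E, Set<Handler<unknown>>>();

  // Subscribe to a single event; returns an unsubscribe function.
  on<K extends keyof E>(event: K, handler: Handler<E[K]>): () => void {
    const set = this.handlers.get(event) ?? new Set<Handler<unknown>>();
    this.handlers.set(event, set);
    set.add(handler as Handler<unknown>);
    return () => {
      set.delete(handler as Handler<unknown>);
    };
  }

  // Deliver a payload to every handler registered for this event.
  emit<K extends keyof E>(event: K, payload: E[K]): void {
    this.handlers.get(event)?.forEach((h) => h(payload));
  }
}

interface Message {
  id: number;
  role: "user" | "assistant";
  content: string;
}

// Accumulates transcript events into a message array and hands a copy to
// whatever render function the UI layer provides (Vue, Svelte, plain DOM).
function collectConversation(
  session: TypedEmitter<AgentEvents>,
  render: (messages: Message[]) => void,
): () => void {
  const messages: Message[] = [];
  return session.on("ConversationText", ({ role, content }) => {
    messages.push({ id: messages.length + 1, role, content });
    render([...messages]);
  });
}
```

The returned unsubscribe function is what a component would call on unmount, so the UI layer never holds a stale listener.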
Four packages, each building on the one below it. Install only the layer you need; everything below it comes along as a dependency.
Every layer shares the same connection logic, audio pipeline, and event model. The difference is how much UI you want handled for you.
Each layer pulls in the layer below it as a dependency and re-exports the parts of that layer you need. Installing @deepgram/ui brings in @deepgram/react and @deepgram/agents automatically and re-exports the hooks, provider, and SDK types, so you can import everything you need from @deepgram/ui alone. The same pattern applies one layer down: @deepgram/react brings in @deepgram/agents and re-exports its types.
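The layering can be pictured as a small dependency graph. The sketch below models only the three packages named here and assumes each one lists the layer directly beneath it as its sole dependency; the real manifests may differ. `installedWith` is a hypothetical helper, not part of any package.

```typescript
// Assumed direct dependencies, one layer each (not read from real manifests).
const directDeps: Record<string, string[]> = {
  "@deepgram/agents": [],
  "@deepgram/react": ["@deepgram/agents"],
  "@deepgram/ui": ["@deepgram/react"],
};

// Returns the package plus everything it pulls in transitively,
// in the order the packages are first reached.
function installedWith(
  pkg: string,
  deps: Record<string, string[]> = directDeps,
): string[] {
  const seen = new Set<string>();
  const visit = (name: string): void => {
    if (seen.has(name)) return;
    seen.add(name);
    (deps[name] ?? []).forEach(visit);
  };
  visit(pkg);
  return Array.from(seen);
}
```

Under these assumptions, asking for `@deepgram/ui` yields all three packages, while asking for `@deepgram/agents` yields only itself.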
Each message, including Welcome, ConversationText, AgentAudioDone, FunctionCallRequest, and more, has a typed event. Subscribe to exactly what you need.
Tools registered with useAgentClientTool mount and unmount with the component. Dynamic tools are checked first, then the provider falls back to onFunctionCall.
The useAgentConversation hook accumulates transcript events into a structured message array with roles, content, and IDs.
The useAgentMode hook tracks whether the agent is idle, listening, or speaking, with a playback-aware delay.
CSS custom properties with light-dark() adaptive defaults. Works with Tailwind, CSS Modules, plain CSS: anything that can set a custom property.
Components use data-dg-* attribute selectors instead of class names. Your existing CSS framework cannot collide with component styles.
Six layouts: sidebar, floating, inline, button, embedded, orb. Each adapts to its container and responds to the host page’s color scheme.
The init() call returns a teardown function. Mount and unmount cleanly in single-page applications without leaking event listeners or audio contexts.
Style the widget through init() or use CSS custom properties from the host page. The widget inherits light-dark() behavior automatically.
Browser applications must never expose API keys to the client. Use a token factory: a function that returns a short-lived token from your server.
The browser WebSocket constructor does not support custom headers. The SDK works around this by passing the token as a Sec-WebSocket-Protocol value, the only handshake header that page scripts can set (through the constructor's protocols argument). This is handled internally; you just return a token string from your factory function.
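For readers curious about the mechanism, the sketch below shows how a token can travel as a subprotocol value. It is an illustration of the general technique, not the SDK's internals: the helper names are hypothetical and the "token" scheme label is an assumption. The part that is standard is the browser API itself, where `new WebSocket(url, protocols)` accepts a list of subprotocol strings and sends them as a comma-separated Sec-WebSocket-Protocol header.

```typescript
// Subprotocol values must consist of HTTP "token" characters only
// (no spaces, commas, or other separators).
const HTTP_TOKEN = /^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$/;

// Builds the second argument for `new WebSocket(url, protocols)`.
// The default "token" scheme label is an assumption for illustration.
function buildAuthProtocols(token: string, scheme = "token"): string[] {
  if (!HTTP_TOKEN.test(scheme) || !HTTP_TOKEN.test(token)) {
    throw new Error("Subprotocol values may only contain HTTP token characters");
  }
  if (scheme === token) {
    // The WebSocket constructor rejects duplicate subprotocol values.
    throw new Error("Scheme and token must differ");
  }
  return [scheme, token];
}

// What a server receives: one header whose value is a comma-separated list.
function parseProtocolHeader(headerValue: string): string[] {
  return headerValue
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}
```

In a browser this would be used as `new WebSocket(url, buildAuthProtocols(token))`. JWT-style tokens fit the character restriction because they contain only letters, digits, hyphens, underscores, and dots.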
The token factory is called before every connection and reconnection attempt. Tokens stay fresh even across network interruptions.
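A token factory along these lines might look like the following sketch. The endpoint path, the `{ token }` response shape, and the `connectWithRetry` helper are all assumptions made for illustration; the SDK performs its own connection handling. The point is the call pattern: the factory runs again before each attempt, so a retry never reuses a stale token.

```typescript
type TokenFactory = () => Promise<string>;

// Structural subset of fetch, so the sketch type-checks without DOM typings.
type FetchLike = (
  url: string,
  init: { method: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Returns a factory that asks your server for a short-lived token.
// The { token: string } response shape is an assumption.
function createTokenFactory(endpoint: string, fetchImpl: FetchLike): TokenFactory {
  return async () => {
    const res = await fetchImpl(endpoint, { method: "POST" });
    if (!res.ok) {
      throw new Error(`Token request failed with status ${res.status}`);
    }
    const body = (await res.json()) as { token?: unknown } | null;
    if (typeof body?.token !== "string") {
      throw new Error("Token response did not contain a token string");
    }
    return body.token;
  };
}

// Hypothetical reconnect loop: a fresh token is requested before every attempt.
// Real code would also wait between attempts (backoff is omitted here).
async function connectWithRetry<T>(
  getToken: TokenFactory,
  connect: (token: string) => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const token = await getToken();
      return await connect(token);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

In a browser, the second argument to `createTokenFactory` can be `(url, init) => fetch(url, init)`; injecting it keeps the factory easy to test without a network.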
The apiKey option exists for local development only. Never ship it in client-side code.
Configure the agent by referencing one you created in the Deepgram Console, or define the full configuration inline.
Agent IDs let you change behavior from the console without redeploying your application. Inline configuration gives you version control and the ability to construct prompts dynamically at runtime.
You can combine both approaches — reference an agent ID for base configuration and override specific settings inline. See the individual package guides for details.
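The three styles can be sketched as plain objects. Every field name below (`agentId`, `config`, `prompt`, `greeting`) is hypothetical, as is the rule that defined overrides win over base values; the actual option names and merge behavior are covered in the package guides.

```typescript
// Hypothetical option shapes, for illustration only.
interface InlineAgentConfig {
  prompt?: string;
  greeting?: string;
}

interface AgentOptions {
  agentId?: string; // assumed field: an agent managed in the Console
  config?: InlineAgentConfig; // assumed field: inline settings or overrides
}

// Inline configuration lets you build the prompt at runtime.
function buildPrompt(base: string, user: { name: string; plan: string }): string {
  return `${base}\nThe caller is ${user.name} on the ${user.plan} plan.`;
}

// Assumed override rule: a defined override replaces the base value,
// and an undefined override leaves the base value alone.
function applyOverrides(
  base: InlineAgentConfig,
  overrides: InlineAgentConfig,
): InlineAgentConfig {
  const merged: InlineAgentConfig = { ...base };
  if (overrides.prompt !== undefined) merged.prompt = overrides.prompt;
  if (overrides.greeting !== undefined) merged.greeting = overrides.greeting;
  return merged;
}

// 1. Reference only: behavior is edited in the Console, no redeploy needed.
const byId: AgentOptions = { agentId: "your-agent-id" };

// 2. Inline only: everything lives in version control.
const inline: AgentOptions = {
  config: {
    prompt: buildPrompt("You are a support agent.", { name: "Ada", plan: "Pro" }),
    greeting: "Hi, how can I help?",
  },
};

// 3. Combined: base configuration from the ID, one setting overridden inline.
const combined: AgentOptions = {
  agentId: "your-agent-id",
  config: { greeting: "Welcome back!" },
};
```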