Stagehand is an AI-driven browser automation framework built and maintained by Browserbase. It combines natural language instructions with traditional programmatic code for web automation. Unlike purely low-level tools (Selenium, Playwright, Puppeteer) that require brittle selectors and explicit DOM paths, and unlike purely AI-driven agents that can be unpredictable in production, Stagehand allows developers to choose when to use natural language versus deterministic code, and automatically converts exploratory AI workflows into cached, deterministic scripts.
Stagehand is a TypeScript/JavaScript framework built on Chrome DevTools Protocol (CDP) that provides AI-powered browser automation through the V3 orchestrator class. The framework's core innovation is its intelligent caching system that remembers AI-driven actions and replays them deterministically on subsequent runs, eliminating LLM inference costs and latency while maintaining self-healing capabilities when websites change.
The V3 class (packages/core/lib/v3/V3.ts) provides four primary operations:
- act(instruction): Execute browser actions using natural language (e.g., "click on the login button")
- extract(instruction, schema): Extract structured data with Zod schema validation
- observe(instruction): Discover interactive elements and available actions on the page
- agent(config).execute(instruction): Execute multi-step autonomous workflows with DOM-based, Hybrid, or Computer Use Agent (CUA) modes

The framework is distributed as a pnpm/Turborepo monorepo with four main packages:
| Package | npm name | Purpose | Key Components |
|---|---|---|---|
| packages/core | @browserbasehq/stagehand | Core automation library | V3 orchestrator, handlers, LLM provider, CDP connection management |
| packages/evals | @browserbasehq/stagehand-evals (private) | Evaluation and benchmarking | Braintrust integration, model comparison, quality gates |
| packages/server | @browserbasehq/stagehand-server (private) | REST API server | Fastify routes, OpenAPI spec generation, multi-region support |
| packages/docs | — | Documentation site | Mintlify-based docs, API reference |
See page 1.1 for a detailed breakdown of the monorepo layout.
Sources: README.md58-70 packages/core/package.json1-10 packages/server/package.json1-10 packages/evals/package.json1-10
| Feature | Description | Code Entry Point |
|---|---|---|
| Hybrid Control | Mix natural language and programmatic code in the same workflow | V3 methods + Page API (packages/core/lib/v3/browser/Page.ts) |
| Intelligent Caching | Cache AI-driven workflows so subsequent runs require zero LLM calls | ActCache (packages/core/lib/v3/cache/ActCache.ts), AgentCache (packages/core/lib/v3/cache/AgentCache.ts) |
| Self-Healing | Re-infer actions when cached selectors no longer match page state | ActCache.tryReplay(), ActHandler self-heal loop |
| Multi-Mode Agents | DOM (tool-based), Hybrid (DOM + coordinates), CUA (screenshot-based) | V3AgentHandler (packages/core/lib/v3/agent/V3AgentHandler.ts), V3CuaAgentHandler (packages/core/lib/v3/agent/cua/V3CuaAgentHandler.ts) |
| Multi-Provider LLM | Unified interface for OpenAI, Anthropic, Google, Cerebras, Groq, Azure, and any Vercel AI SDK provider | LLMProvider (packages/core/lib/v3/llm/LLMProvider.ts), AISdkClient (packages/core/lib/v3/llm/aisdk.ts) |
| CDP-Based Browser | Custom Chrome DevTools Protocol stack replaces Playwright internals for performance | V3Context (packages/core/lib/v3/browser/V3Context.ts), CdpConnection (packages/core/lib/v3/browser/cdp/CdpConnection.ts) |
| Local & Cloud | Connect to a local Chrome instance or a remote Browserbase session | env: "LOCAL" \| "BROWSERBASE" in V3Options |
Sources: README.md62-70 packages/core/lib/v3/V3.ts packages/core/lib/v3/cache/ActCache.ts packages/core/lib/v3/browser/V3Context.ts
Sources: README.md82-104 packages/core/README.md82-104
The following diagrams show how all major components relate and how requests flow through Stagehand.
Diagram: High-Level Component Map
Diagram: act() Request Flow (Cache Hit vs Cache Miss)
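The cache-hit and cache-miss paths of act() can be sketched roughly as follows. This is a minimal illustration, not the actual ActCache or ActHandler code: SketchActCache, ActDeps, infer, and perform are hypothetical names, and keying the cache by the instruction string alone is an assumption made for brevity.

```typescript
// Hypothetical types for illustration; none of these names come from the Stagehand codebase.
type CachedAction = { selector: string; method: string };

interface ActDeps {
  // Stands in for LLM inference: turns an instruction into a concrete action.
  infer(instruction: string): Promise<CachedAction>;
  // Stands in for executing the action; resolves to false when the selector no longer matches.
  perform(action: CachedAction): Promise<boolean>;
}

class SketchActCache {
  private readonly entries = new Map<string, CachedAction>();
  private readonly deps: ActDeps;
  llmCalls = 0;

  constructor(deps: ActDeps) {
    this.deps = deps;
  }

  async act(instruction: string): Promise<"replayed" | "inferred" | "healed"> {
    const cached = this.entries.get(instruction);
    if (cached && (await this.deps.perform(cached))) {
      return "replayed"; // cache hit: the action ran without any LLM call
    }
    // Cache miss, or the cached selector went stale: infer again and re-cache.
    this.llmCalls += 1;
    const fresh = await this.deps.infer(instruction);
    if (!(await this.deps.perform(fresh))) {
      throw new Error(`action failed for instruction: ${instruction}`);
    }
    this.entries.set(instruction, fresh);
    return cached ? "healed" : "inferred";
  }
}
```

In this sketch the first call for an instruction pays for one inference, later calls replay the stored action, and a stale selector triggers a single re-inference before the cache entry is overwritten.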
Key Components:
| Component | File | Role |
|---|---|---|
| V3 | packages/core/lib/v3/V3.ts | Main orchestrator; manages browser lifecycle and delegates to handlers |
| ActHandler | packages/core/lib/v3/handlers/ActHandler.ts | Executes natural language browser actions |
| ExtractHandler | packages/core/lib/v3/handlers/ExtractHandler.ts | Extracts structured data using Zod schemas |
| ObserveHandler | packages/core/lib/v3/handlers/ObserveHandler.ts | Discovers interactive elements |
| V3AgentHandler | packages/core/lib/v3/agent/V3AgentHandler.ts | DOM/Hybrid multi-step agent loop |
| V3CuaAgentHandler | packages/core/lib/v3/agent/cua/V3CuaAgentHandler.ts | Screenshot-based CUA agent loop |
| ActCache | packages/core/lib/v3/cache/ActCache.ts | Caches act() results for deterministic replay |
| AgentCache | packages/core/lib/v3/cache/AgentCache.ts | Caches agent step sequences |
| LLMProvider | packages/core/lib/v3/llm/LLMProvider.ts | Factory that resolves the correct LLMClient for a given model string |
| AISdkClient | packages/core/lib/v3/llm/aisdk.ts | Wraps Vercel AI SDK (generateObject, generateText) |
| V3Context | packages/core/lib/v3/browser/V3Context.ts | Multi-page browser context manager |
| Page | packages/core/lib/v3/browser/Page.ts | Per-page API: navigation, evaluation, screenshots, snapshots |
| Frame | packages/core/lib/v3/browser/Frame.ts | Per-frame accessibility tree and element traversal |
| CdpConnection | packages/core/lib/v3/browser/cdp/CdpConnection.ts | WebSocket multiplexer for Chrome DevTools Protocol |
Sources: packages/core/lib/v3/V3.ts packages/core/lib/v3/cache/ActCache.ts packages/core/lib/v3/handlers/ActHandler.ts packages/core/lib/v3/llm/LLMProvider.ts packages/core/lib/v3/browser/V3Context.ts packages/core/lib/v3/browser/cdp/CdpConnection.ts
The repository is a pnpm workspace with Turborepo coordinating parallel builds. See page 1.1 for the full layout.
Diagram: Package Dependency Graph
| Package | npm name | Published | Key external deps |
|---|---|---|---|
| packages/core | @browserbasehq/stagehand | ✅ yes | ai, @ai-sdk/*, @anthropic-ai/sdk, @google/genai, ws, devtools-protocol |
| packages/server | private | ❌ no | fastify, fastify-zod-openapi, @browserbasehq/stagehand |
| packages/evals | private | ❌ no | braintrust, @browserbasehq/stagehand |
| packages/docs | — | — | mintlify |
Sources: package.json1-51 packages/core/package.json60-110 packages/server/package.json23-40 packages/evals/package.json18-26
The V3 class (packages/core/lib/v3/V3.ts) exposes four primary operations. Each is handled by a dedicated class:
| Operation | V3 method | Handler class | Cached? | Description |
|---|---|---|---|---|
| act | V3.act(instruction, options?) | ActHandler (packages/core/lib/v3/handlers/ActHandler.ts) | ✅ ActCache | Executes a single browser action via natural language |
| extract | V3.extract(instruction, schema) | ExtractHandler (packages/core/lib/v3/handlers/ExtractHandler.ts) | ❌ | Extracts structured data; validates output against a Zod schema |
| observe | V3.observe(instruction?, options?) | ObserveHandler (packages/core/lib/v3/handlers/ObserveHandler.ts) | ❌ | Discovers interactive elements; returns selectors and suggested actions |
| agent | V3.agent(config).execute(instruction) | V3AgentHandler / V3CuaAgentHandler (packages/core/lib/v3/agent/) | ✅ AgentCache | Runs a multi-step autonomous workflow |
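To show how the four operations in the table above might combine in one workflow, here is a sketch written against a minimal structural interface. AutomationLike, SchemaLike, pullRequestSchema, and latestPullRequest are hypothetical names introduced for illustration; they are not the real Stagehand typings, so check V3.ts for the actual signatures. The hand-rolled schema stands in for a Zod schema only to keep the sketch dependency-free.

```typescript
// Hypothetical structural types mirroring the four operations; NOT the real Stagehand typings.
interface SchemaLike<T> {
  parse(input: unknown): T; // Zod schemas expose a parse method with this shape
}

interface AutomationLike {
  act(instruction: string): Promise<unknown>;
  extract<T>(instruction: string, schema: SchemaLike<T>): Promise<T>;
  observe(instruction?: string): Promise<unknown[]>;
  agent(config?: object): { execute(instruction: string): Promise<unknown> };
}

type PullRequest = { title: string; author: string };

// Dependency-free stand-in for a Zod object schema with two string fields.
const pullRequestSchema: SchemaLike<PullRequest> = {
  parse(input) {
    const record = input as Partial<PullRequest> | null;
    if (!record || typeof record.title !== "string" || typeof record.author !== "string") {
      throw new Error("expected { title: string, author: string }");
    }
    return { title: record.title, author: record.author };
  },
};

async function latestPullRequest(sh: AutomationLike): Promise<PullRequest> {
  // observe: check whether the page offers an obvious way forward.
  const candidates = await sh.observe("find the link to the pull requests tab");
  if (candidates.length > 0) {
    // act: a single natural-language action is enough.
    await sh.act("click on the pull requests tab");
  } else {
    // agent: fall back to a multi-step autonomous workflow.
    await sh.agent().execute("navigate to the list of pull requests");
  }
  // extract: pull structured data and validate it against the schema.
  return sh.extract("extract the title and author of the latest pull request", pullRequestSchema);
}
```

Writing the workflow against an interface rather than a concrete class also makes it easy to exercise with a stub, without launching a browser or calling a model.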
- V3.resolvePage() picks the target Page instance
- ActCache.tryReplay() or AgentCache.tryReplay() attempts deterministic replay
- Page.snapshot() to get a hybrid DOM + accessibility tree
- LLMProvider.getClient().createChatCompletion() runs model inference
- Page and CdpConnection
- V3.history for debugging and audit

The agent() primitive supports three AgentToolMode values:
| Mode | Handler | Mechanism |
|---|---|---|
| dom | V3AgentHandler | Tool calls: act, extract, observe, goto |
| hybrid | V3AgentHandler | DOM tools + coordinate-based click, type, scroll, drag |
| cua | V3CuaAgentHandler | Screenshot-based; uses provider CUA APIs (Google, Anthropic, OpenAI, Microsoft) |
See pages 4 (operations) and 5 (agent system) for full details.
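The mode table above can be restated as a small lookup. This is only a sketch of the mapping as documented here: the MODE_TABLE constant and handlerFor function are hypothetical, and the real dispatch logic inside V3 may be organized differently.

```typescript
// The three mode values and handler names come from the table above;
// the lookup structure itself is a hypothetical illustration.
type AgentToolMode = "dom" | "hybrid" | "cua";
type HandlerName = "V3AgentHandler" | "V3CuaAgentHandler";

type ModeInfo = {
  handler: HandlerName;
  // Tools named in the table; empty for cua, which relies on provider CUA APIs.
  tools: string[];
};

const DOM_TOOLS = ["act", "extract", "observe", "goto"];

const MODE_TABLE: Record<AgentToolMode, ModeInfo> = {
  dom: { handler: "V3AgentHandler", tools: DOM_TOOLS },
  hybrid: { handler: "V3AgentHandler", tools: [...DOM_TOOLS, "click", "type", "scroll", "drag"] },
  cua: { handler: "V3CuaAgentHandler", tools: [] },
};

function handlerFor(mode: AgentToolMode): HandlerName {
  return MODE_TABLE[mode].handler;
}
```

Typing the table as Record<AgentToolMode, ModeInfo> means the compiler flags a missing entry if a fourth mode is ever added to the union.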
Sources: packages/core/lib/v3/V3.ts packages/core/lib/v3/handlers/ActHandler.ts packages/core/lib/v3/handlers/ExtractHandler.ts packages/core/lib/v3/handlers/ObserveHandler.ts packages/core/lib/v3/agent/V3AgentHandler.ts packages/core/lib/v3/agent/cua/V3CuaAgentHandler.ts
LLMProvider (packages/core/lib/v3/llm/LLMProvider.ts) is a factory that maps a model string like "openai/gpt-4o" or "anthropic/claude-3-5-sonnet-latest" to a concrete LLMClient. The primary implementation is AISdkClient (packages/core/lib/v3/llm/aisdk.ts), which wraps the Vercel AI SDK's generateObject and generateText. API keys are loaded from environment variables using provider name prefixes (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY).
See page 7 for full LLM integration details including all supported providers and model configuration options.
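As a rough illustration of the model-string convention described above, the sketch below splits a "provider/model" string and derives an environment variable name. parseModelString and ModelRef are hypothetical, and the uppercase-prefix rule is an assumption generalized from the three example variables; the real LLMProvider may special-case individual providers.

```typescript
// Hypothetical helper; not part of the Stagehand API.
type ModelRef = { provider: string; model: string; apiKeyEnvVar: string };

function parseModelString(modelString: string): ModelRef {
  const slash = modelString.indexOf("/");
  // Require a non-empty provider before the slash and a non-empty model after it.
  if (slash <= 0 || slash === modelString.length - 1) {
    throw new Error(`expected "provider/model", got "${modelString}"`);
  }
  const provider = modelString.slice(0, slash);
  const model = modelString.slice(slash + 1);
  // Assumption: the key variable is the upper-cased provider name plus _API_KEY.
  return { provider, model, apiKeyEnvVar: `${provider.toUpperCase()}_API_KEY` };
}

const example = parseModelString("openai/gpt-4o");
console.log(example.provider, example.model, example.apiKeyEnvVar);
```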
Stagehand uses a custom CDP stack rather than a full Playwright dependency at runtime. CdpConnection (packages/core/lib/v3/browser/cdp/CdpConnection.ts) manages a WebSocket to Chrome. V3Context (packages/core/lib/v3/browser/V3Context.ts) tracks multiple pages, and Page + Frame wrap per-page and per-frame operations.
Browser engines (Playwright, Puppeteer, Patchright) are listed as optional peer dependencies and are only used for local browser launching, not for the automation layer itself.
See page 3.2 for the full CDP and browser connection details.
When env: "BROWSERBASE" is set in V3Options, Stagehand connects to a remote Browserbase session using @browserbasehq/sdk. Authentication requires BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID. Browserbase sessions support debugging URLs and persistent state.
See page 12.4 for Browserbase-specific features.
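A pre-flight check for the two environments might look like the sketch below. missingCredentials is a hypothetical helper, not a Stagehand function; it only encodes the requirement stated above that BROWSERBASE mode needs both variables, and it assumes LOCAL mode needs neither.

```typescript
// The env values match the V3Options field described above; the helper is hypothetical.
type StagehandEnv = "LOCAL" | "BROWSERBASE";

const BROWSERBASE_VARS = ["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"];

// Returns the names of required variables that are unset or empty.
function missingCredentials(
  env: StagehandEnv,
  vars: Record<string, string | undefined>,
): string[] {
  if (env === "LOCAL") {
    return []; // assumption: a local Chrome instance needs no Browserbase credentials
  }
  return BROWSERBASE_VARS.filter((name) => !vars[name]);
}
```

In a Node.js script this could be called as missingCredentials("BROWSERBASE", process.env) before constructing the orchestrator, so that a missing key fails fast with a clear message.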
Sources: packages/core/lib/v3/llm/LLMProvider.ts packages/core/lib/v3/llm/aisdk.ts packages/core/lib/v3/browser/cdp/CdpConnection.ts packages/core/lib/v3/browser/V3Context.ts packages/core/package.json86-108 .env.example1-12
The @browserbasehq/stagehand-evals package provides a Braintrust-integrated evaluation framework. Evaluations are organized into categories controlled by the EVAL_CATEGORIES environment variable: act, extract, observe, combination, agent, and experimental.
See page 9 for the full evaluation system details.
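The sketch below shows one way a category filter could be validated against the six categories listed above. It assumes EVAL_CATEGORIES holds a comma-separated list, which this page does not confirm; parseEvalCategories is a hypothetical helper and not part of the evals package.

```typescript
// Category names come from the list above; the comma-separated format is an assumption.
const KNOWN_CATEGORIES = [
  "act",
  "extract",
  "observe",
  "combination",
  "agent",
  "experimental",
] as const;

type EvalCategory = (typeof KNOWN_CATEGORIES)[number];

function isEvalCategory(name: string): name is EvalCategory {
  return (KNOWN_CATEGORIES as readonly string[]).includes(name);
}

// Splits the raw value, trims whitespace, and rejects unknown category names.
function parseEvalCategories(raw: string): EvalCategory[] {
  const requested = raw
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
  const unknown = requested.filter((name) => !isEvalCategory(name));
  if (unknown.length > 0) {
    throw new Error(`unknown eval categories: ${unknown.join(", ")}`);
  }
  return requested.filter(isEvalCategory);
}
```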
Sources: packages/evals/package.json1-33 .env.example9-12
The @browserbasehq/stagehand-server package (packages/server) wraps the core library in a Fastify REST API. Sessions are started with POST /v1/sessions/start, and operations (act, extract, observe, agentExecute) are called as POST /v1/sessions/:id/<operation>. Zod schemas serve as the source of truth for the OpenAPI spec, which is generated by packages/server/scripts/gen-openapi.ts.
See page 8 for the full server and API documentation.
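The route pattern described above can be captured in a pair of small builders. These helpers are hypothetical and cover only the method and path; request and response bodies are not described on this page, so they are left out rather than guessed.

```typescript
// Operation names and paths follow the routes listed above; the helpers are hypothetical.
type ServerOperation = "act" | "extract" | "observe" | "agentExecute";
type RequestSpec = { method: "POST"; path: string };

function startSessionRequest(): RequestSpec {
  return { method: "POST", path: "/v1/sessions/start" };
}

function operationRequest(sessionId: string, operation: ServerOperation): RequestSpec {
  // Encode the id so unusual characters cannot alter the path structure.
  return {
    method: "POST",
    path: `/v1/sessions/${encodeURIComponent(sessionId)}/${operation}`,
  };
}
```

A client would typically call startSessionRequest() first, read the session id from the response, and then issue operationRequest(id, "act") and similar calls against the same server.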
Sources: packages/server/package.json1-40 packages/server/src/server.ts
Stagehand is designed for production browser automation scenarios where traditional scripting is too brittle and pure AI agents are too unpredictable:
- extract() with Zod schemas
- act()
- agent()

The caching system enables these workflows to run deterministically after initial AI-guided execution, reducing cost and latency for repeated operations.
Sources: README.md58-70 packages/docs/v2/best-practices/contributing.mdx1-53