Introduction to Stagehand

Stagehand is an AI-driven browser automation framework built and maintained by Browserbase. It combines natural language instructions with traditional programmatic code for web automation. Unlike purely low-level tools (Selenium, Playwright, Puppeteer) that require brittle selectors and explicit DOM paths, and unlike purely AI-driven agents that can be unpredictable in production, Stagehand allows developers to choose when to use natural language versus deterministic code, and automatically converts exploratory AI workflows into cached, deterministic scripts.

What is Stagehand?

Stagehand is a TypeScript/JavaScript framework built on the Chrome DevTools Protocol (CDP) that provides AI-powered browser automation through the V3 orchestrator class. Its core feature is a caching system that records AI-driven actions and replays them deterministically on subsequent runs. A cache hit avoids LLM inference cost and latency, and when a website changes and a cached action no longer applies, Stagehand re-infers the action (self-healing).

The V3 class (packages/core/lib/v3/V3.ts) provides four primary operations:

- `act(instruction)`: Execute browser actions using natural language (e.g., "click on the login button")
- `extract(instruction, schema)`: Extract structured data with Zod schema validation
- `observe(instruction)`: Discover interactive elements and available actions on the page
- `agent(config).execute(instruction)`: Execute multi-step autonomous workflows with DOM-based, Hybrid, or Computer Use Agent (CUA) modes

The framework is distributed as a pnpm/Turborepo monorepo with four main packages:

| Package | npm name | Purpose | Key Components |
| --- | --- | --- | --- |
| packages/core | `@browserbasehq/stagehand` | Core automation library | `V3` orchestrator, handlers, LLM provider, CDP connection management |
| packages/evals | `@browserbasehq/stagehand-evals` (private) | Evaluation and benchmarking | Braintrust integration, model comparison, quality gates |
| packages/server | `@browserbasehq/stagehand-server` (private) | REST API server | Fastify routes, OpenAPI spec generation, multi-region support |
| packages/docs | n/a | Documentation site | Mintlify-based docs, API reference |

See page 1.1 for a detailed breakdown of the monorepo layout.

Sources: README.md:58-70, packages/core/package.json:1-10, packages/server/package.json:1-10, packages/evals/package.json:1-10

Key Features

| Feature | Description | Code Entry Point |
| --- | --- | --- |
| Hybrid Control | Mix natural language and programmatic code in the same workflow | `V3` methods + `Page` API (packages/core/lib/v3/browser/Page.ts) |
| Intelligent Caching | Cache AI-driven workflows so subsequent runs require zero LLM calls | `ActCache` (packages/core/lib/v3/cache/ActCache.ts), `AgentCache` (packages/core/lib/v3/cache/AgentCache.ts) |
| Self-Healing | Re-infer actions when cached selectors no longer match page state | `ActCache.tryReplay()`, `ActHandler` self-heal loop |
| Multi-Mode Agents | DOM (tool-based), Hybrid (DOM + coordinates), CUA (screenshot-based) | `V3AgentHandler` (packages/core/lib/v3/agent/V3AgentHandler.ts), `V3CuaAgentHandler` (packages/core/lib/v3/agent/cua/V3CuaAgentHandler.ts) |
| Multi-Provider LLM | Unified interface for OpenAI, Anthropic, Google, Cerebras, Groq, Azure, and any Vercel AI SDK provider | `LLMProvider` (packages/core/lib/v3/llm/LLMProvider.ts), `AISdkClient` (packages/core/lib/v3/llm/aisdk.ts) |
| CDP-Based Browser | Custom Chrome DevTools Protocol stack replaces Playwright internals for performance | `V3Context` (packages/core/lib/v3/browser/V3Context.ts), `CdpConnection` (packages/core/lib/v3/browser/cdp/CdpConnection.ts) |
| Local & Cloud | Connect to a local Chrome instance or a remote Browserbase session | `env: "LOCAL" \| "BROWSERBASE"` in `V3Options` |

Sources: README.md:62-70, packages/core/lib/v3/V3.ts, packages/core/lib/v3/cache/ActCache.ts, packages/core/lib/v3/browser/V3Context.ts

Example Usage

Sources: README.md:82-104, packages/core/README.md:82-104

Architecture Overview

The following diagrams show how all major components relate and how requests flow through Stagehand.

Diagram: High-Level Component Map

Diagram: act() Request Flow (Cache Hit vs Cache Miss)

Key Components:

| Component | File | Role |
| --- | --- | --- |
| `V3` | packages/core/lib/v3/V3.ts | Main orchestrator; manages browser lifecycle and delegates to handlers |
| `ActHandler` | packages/core/lib/v3/handlers/ActHandler.ts | Executes natural language browser actions |
| `ExtractHandler` | packages/core/lib/v3/handlers/ExtractHandler.ts | Extracts structured data using Zod schemas |
| `ObserveHandler` | packages/core/lib/v3/handlers/ObserveHandler.ts | Discovers interactive elements |
| `V3AgentHandler` | packages/core/lib/v3/agent/V3AgentHandler.ts | DOM/Hybrid multi-step agent loop |
| `V3CuaAgentHandler` | packages/core/lib/v3/agent/cua/V3CuaAgentHandler.ts | Screenshot-based CUA agent loop |
| `ActCache` | packages/core/lib/v3/cache/ActCache.ts | Caches `act()` results for deterministic replay |
| `AgentCache` | packages/core/lib/v3/cache/AgentCache.ts | Caches agent step sequences |
| `LLMProvider` | packages/core/lib/v3/llm/LLMProvider.ts | Factory that resolves the correct `LLMClient` for a given model string |
| `AISdkClient` | packages/core/lib/v3/llm/aisdk.ts | Wraps Vercel AI SDK (`generateObject`, `generateText`) |
| `V3Context` | packages/core/lib/v3/browser/V3Context.ts | Multi-page browser context manager |
| `Page` | packages/core/lib/v3/browser/Page.ts | Per-page API: navigation, evaluation, screenshots, snapshots |
| `Frame` | packages/core/lib/v3/browser/Frame.ts | Per-frame accessibility tree and element traversal |
| `CdpConnection` | packages/core/lib/v3/browser/cdp/CdpConnection.ts | WebSocket multiplexer for Chrome DevTools Protocol |

Sources: packages/core/lib/v3/V3.ts, packages/core/lib/v3/cache/ActCache.ts, packages/core/lib/v3/handlers/ActHandler.ts, packages/core/lib/v3/llm/LLMProvider.ts, packages/core/lib/v3/browser/V3Context.ts, packages/core/lib/v3/browser/cdp/CdpConnection.ts

Monorepo Structure

The repository is a pnpm workspace with Turborepo coordinating parallel builds. See page 1.1 for the full layout.

Diagram: Package Dependency Graph

| Package | npm name | Published | Key external deps |
| --- | --- | --- | --- |
| packages/core | `@browserbasehq/stagehand` | ✅ yes | `ai`, `@ai-sdk/*`, `@anthropic-ai/sdk`, `@google/genai`, `ws`, `devtools-protocol` |
| packages/server | `@browserbasehq/stagehand-server` (private) | ❌ no | `fastify`, `fastify-zod-openapi`, `@browserbasehq/stagehand` |
| packages/evals | `@browserbasehq/stagehand-evals` (private) | ❌ no | `braintrust`, `@browserbasehq/stagehand` |
| packages/docs | n/a | n/a | `mintlify` |

Sources: package.json:1-51, packages/core/package.json:60-110, packages/server/package.json:23-40, packages/evals/package.json:18-26

Core Operations

The V3 class (packages/core/lib/v3/V3.ts) exposes four primary operations. Each is handled by a dedicated class:

| Operation | `V3` method | Handler class | Cached? | Description |
| --- | --- | --- | --- | --- |
| act | `V3.act(instruction, options?)` | `ActHandler` (packages/core/lib/v3/handlers/ActHandler.ts) | ✅ `ActCache` | Executes a single browser action via natural language |
| extract | `V3.extract(instruction, schema)` | `ExtractHandler` (packages/core/lib/v3/handlers/ExtractHandler.ts) | ❌ | Extracts structured data; validates output against a Zod schema |
| observe | `V3.observe(instruction?, options?)` | `ObserveHandler` (packages/core/lib/v3/handlers/ObserveHandler.ts) | ❌ | Discovers interactive elements; returns selectors and suggested actions |
| agent | `V3.agent(config).execute(instruction)` | `V3AgentHandler` / `V3CuaAgentHandler` (packages/core/lib/v3/agent/) | ✅ `AgentCache` | Runs a multi-step autonomous workflow |

Common Execution Steps (per operation)

1. Page resolution: `V3.resolvePage()` picks the target `Page` instance
2. Cache check (act/agent only): `ActCache.tryReplay()` or `AgentCache.tryReplay()` attempts deterministic replay
3. Snapshot capture: the handler calls `Page.snapshot()` to get a hybrid DOM + accessibility tree
4. LLM inference (on cache miss): `LLMProvider.getClient().createChatCompletion()` runs model inference
5. Browser execution: the result is translated to CDP commands via `Page` and `CdpConnection`
6. Cache store (act/agent only): the result is saved for future deterministic replay
7. History append: an entry is added to `V3.history` for debugging and audit
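
The control flow of these steps can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not Stagehand's implementation: `ActPipelineSketch`, `PlannedAction`, the cache key format, and the stubbed inference function are all hypothetical, and the flow is synchronous only to keep the example short.

```typescript
// Hypothetical model of the steps above. Names are illustrative, not
// Stagehand's API; inference and browser execution are stubbed, and the
// flow is synchronous for brevity.

type PlannedAction = { selector: string; method: string };
type ActionSource = "cache" | "llm";
type HistoryEntry = { instruction: string; source: ActionSource; action: PlannedAction };
type InferFn = (instruction: string, snapshot: string) => PlannedAction;

class ActPipelineSketch {
  private readonly cache = new Map<string, PlannedAction>();
  private readonly infer: InferFn;
  readonly entries: HistoryEntry[] = [];
  llmCalls = 0;

  constructor(infer: InferFn) {
    this.infer = infer;
  }

  act(instruction: string, pageUrl: string, snapshot: string): PlannedAction {
    // Cache check. Keying on URL + instruction is an assumption of this sketch.
    const key = `${pageUrl}::${instruction}`;
    const cached = this.cache.get(key);
    let action: PlannedAction;
    let source: ActionSource;

    if (cached !== undefined) {
      action = cached;
      source = "cache";
    } else {
      // Cache miss: run the stubbed inference and store the result.
      this.llmCalls += 1;
      action = this.infer(instruction, snapshot);
      this.cache.set(key, action);
      source = "llm";
    }

    // Browser execution would translate `action` into CDP commands here.

    // History append.
    this.entries.push({ instruction, source, action });
    return action;
  }
}

const pipeline = new ActPipelineSketch(() => ({ selector: "#login", method: "click" }));
pipeline.act("click on the login button", "https://example.com", "<snapshot>");
pipeline.act("click on the login button", "https://example.com", "<snapshot>");
// The second call replays the cached action, so pipeline.llmCalls stays at 1.
```

The point of the sketch is the ordering: the cache is consulted before any inference, and every call appends a history entry regardless of where the action came from.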

Agent Modes

The agent() primitive supports three AgentToolMode values:

| Mode | Handler | Mechanism |
| --- | --- | --- |
| `dom` | `V3AgentHandler` | Tool calls: `act`, `extract`, `observe`, `goto` |
| `hybrid` | `V3AgentHandler` | DOM tools + coordinate-based click, type, scroll, drag |
| `cua` | `V3CuaAgentHandler` | Screenshot-based; uses provider CUA APIs (Google, Anthropic, OpenAI, Microsoft) |
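
The mode-to-handler mapping in the table can be written as a typed lookup. The mode and handler names are taken from this page; `HANDLER_BY_MODE` and `handlerForMode` are hypothetical helpers for illustration, not part of Stagehand's API.

```typescript
// Hypothetical dispatch table mirroring the Agent Modes table.
// Handler names are plain strings here; no Stagehand code is imported.

type AgentToolMode = "dom" | "hybrid" | "cua";
type HandlerName = "V3AgentHandler" | "V3CuaAgentHandler";

const HANDLER_BY_MODE: Record<AgentToolMode, HandlerName> = {
  dom: "V3AgentHandler",
  hybrid: "V3AgentHandler",
  cua: "V3CuaAgentHandler",
};

function handlerForMode(mode: AgentToolMode): HandlerName {
  return HANDLER_BY_MODE[mode];
}

const cuaHandler = handlerForMode("cua"); // "V3CuaAgentHandler"
```

Typing the table as `Record<AgentToolMode, HandlerName>` makes the compiler reject a mode that has no handler entry.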

See pages 4 (operations) and 5 (agent system) for full details.

Sources: packages/core/lib/v3/V3.ts, packages/core/lib/v3/handlers/ActHandler.ts, packages/core/lib/v3/handlers/ExtractHandler.ts, packages/core/lib/v3/handlers/ObserveHandler.ts, packages/core/lib/v3/agent/V3AgentHandler.ts, packages/core/lib/v3/agent/cua/V3CuaAgentHandler.ts

Key Integration Points

LLM Provider System

LLMProvider (packages/core/lib/v3/llm/LLMProvider.ts) is a factory that maps a model string like "openai/gpt-4o" or "anthropic/claude-3-5-sonnet-latest" to a concrete LLMClient. The primary implementation is AISdkClient (packages/core/lib/v3/llm/aisdk.ts), which wraps the Vercel AI SDK's generateObject and generateText. API keys are loaded from environment variables using provider name prefixes (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY).
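
To make the "provider/model" convention concrete, here is a minimal sketch of splitting a model string and deriving the matching API key variable name. `parseModelString` and `apiKeyEnvVar` are hypothetical helpers, and the uppercase-prefix rule is an assumption inferred from the examples above; the actual resolution logic in `LLMProvider` may differ.

```typescript
// Hypothetical helpers; not Stagehand's API.

type ModelRef = { provider: string; model: string };

// Split "provider/model" on the first slash only, so model ids that
// themselves contain slashes are kept whole.
function parseModelString(modelString: string): ModelRef {
  const slash = modelString.indexOf("/");
  if (slash <= 0 || slash === modelString.length - 1) {
    throw new Error(`Expected "provider/model", got "${modelString}"`);
  }
  return {
    provider: modelString.slice(0, slash),
    model: modelString.slice(slash + 1),
  };
}

// Assumed naming rule: "openai" -> "OPENAI_API_KEY".
function apiKeyEnvVar(provider: string): string {
  return `${provider.toUpperCase()}_API_KEY`;
}

const modelRef = parseModelString("anthropic/claude-3-5-sonnet-latest");
const keyName = apiKeyEnvVar(modelRef.provider); // "ANTHROPIC_API_KEY"
```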

See page 7 for full LLM integration details including all supported providers and model configuration options.

Browser Connection

Stagehand uses a custom CDP stack rather than a full Playwright dependency at runtime. CdpConnection (packages/core/lib/v3/browser/cdp/CdpConnection.ts) manages a WebSocket to Chrome. V3Context (packages/core/lib/v3/browser/V3Context.ts) tracks multiple pages, and Page + Frame wrap per-page and per-frame operations.
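
As an illustration of what multiplexing over one CDP WebSocket involves, the sketch below matches replies to commands by their `id`. Only the message shape (`id`, `method`, `params`, optional `sessionId`) follows the protocol; `CdpMultiplexerSketch` and its methods are hypothetical and are not the API of `CdpConnection`. The socket is replaced by an injected `transmit` function so the example runs without a browser.

```typescript
// Hypothetical sketch of id-based multiplexing over a single CDP socket.

type CdpCommand = { id: number; method: string; params: object; sessionId?: string };
type CdpReply = { id: number; result?: unknown; error?: { message: string } };
type ReplyHandler = (reply: CdpReply) => void;

class CdpMultiplexerSketch {
  private nextId = 1;
  private readonly pending = new Map<number, ReplyHandler>();
  private readonly transmit: (raw: string) => void;

  // `transmit` stands in for the WebSocket's send function.
  constructor(transmit: (raw: string) => void) {
    this.transmit = transmit;
  }

  // Send a command and remember which handler is waiting for its id.
  send(method: string, params: object, onReply: ReplyHandler, sessionId?: string): number {
    const id = this.nextId;
    this.nextId += 1;
    this.pending.set(id, onReply);
    const command: CdpCommand =
      sessionId === undefined ? { id, method, params } : { id, method, params, sessionId };
    this.transmit(JSON.stringify(command));
    return id;
  }

  // Route an incoming message: replies carry an id, events do not.
  receive(raw: string): void {
    const message = JSON.parse(raw) as Partial<CdpReply>;
    if (typeof message.id !== "number") return;
    const handler = this.pending.get(message.id);
    if (handler === undefined) return;
    this.pending.delete(message.id);
    handler({ ...message, id: message.id });
  }
}

const outbound: string[] = [];
const replies: CdpReply[] = [];
const mux = new CdpMultiplexerSketch((raw) => {
  outbound.push(raw);
});
const commandId = mux.send("Page.navigate", { url: "https://example.com" }, (reply) => {
  replies.push(reply);
});
// Simulate Chrome answering; the reply is matched to its handler by id.
mux.receive(JSON.stringify({ id: commandId, result: { frameId: "frame-1" } }));
```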

Browser automation libraries (Playwright, Puppeteer, Patchright) are listed as optional peer dependencies and are only used for local browser launching, not for the automation layer itself.

See page 3.2 for the full CDP and browser connection details.

Browserbase Integration

When env: "BROWSERBASE" is set in V3Options, Stagehand connects to a remote Browserbase session using @browserbasehq/sdk. Authentication requires BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID. Browserbase sessions support debugging URLs and persistent state.
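
A pre-flight check for the two required variables might look like the sketch below. `missingBrowserbaseVars` is a hypothetical helper written for this page, not something Stagehand exports, and the environment is passed in as a plain object rather than read from the process so the example stays self-contained.

```typescript
// Hypothetical pre-flight check; not part of Stagehand's API.

type StagehandEnv = "LOCAL" | "BROWSERBASE";
type EnvVars = Record<string, string | undefined>;

const BROWSERBASE_VARS = ["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"];

// Return the names of required variables that are unset or empty.
function missingBrowserbaseVars(target: StagehandEnv, env: EnvVars): string[] {
  if (target === "LOCAL") return [];
  return BROWSERBASE_VARS.filter((name) => !env[name]);
}

const missing = missingBrowserbaseVars("BROWSERBASE", { BROWSERBASE_API_KEY: "bb_key" });
// missing is ["BROWSERBASE_PROJECT_ID"]
```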

See page 12.4 for Browserbase-specific features.

Sources: packages/core/lib/v3/llm/LLMProvider.ts, packages/core/lib/v3/llm/aisdk.ts, packages/core/lib/v3/browser/cdp/CdpConnection.ts, packages/core/lib/v3/browser/V3Context.ts, packages/core/package.json:86-108, .env.example:1-12

Evaluation and Quality Assurance

The @browserbasehq/stagehand-evals package provides a Braintrust-integrated evaluation framework. Evaluations are organized into categories controlled by the EVAL_CATEGORIES environment variable: act, extract, observe, combination, agent, and experimental.

See page 9 for the full evaluation system details.

Sources: packages/evals/package.json:1-33, .env.example:9-12

REST API and HTTP Server

The @browserbasehq/stagehand-server package (packages/server) wraps the core library in a Fastify REST API. Sessions are started with POST /v1/sessions/start, and operations (act, extract, observe, agentExecute) are called as POST /v1/sessions/:id/<operation>. Zod schemas serve as the source of truth for the OpenAPI spec, which is generated by packages/server/scripts/gen-openapi.ts.
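
The route pattern can be captured in a small path builder. The paths and operation names come from the paragraph above; `operationPath`, `START_SESSION_PATH`, and the session id `abc123` are hypothetical, and request and response bodies are deliberately left out because their shape is not described here.

```typescript
// Hypothetical client-side helper for building the documented routes.

type SessionOperation = "act" | "extract" | "observe" | "agentExecute";

const START_SESSION_PATH = "/v1/sessions/start";

// Build the POST path for an operation on an existing session.
function operationPath(sessionId: string, operation: SessionOperation): string {
  return `/v1/sessions/${encodeURIComponent(sessionId)}/${operation}`;
}

const actPath = operationPath("abc123", "act"); // "/v1/sessions/abc123/act"
```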

See page 8 for the full server and API documentation.

Sources: packages/server/package.json:1-40, packages/server/src/server.ts

Use Cases

Stagehand is designed for production browser automation scenarios where traditional scripting is too brittle and pure AI agents are too unpredictable:

- Web scraping with schema validation: Extract structured data using `extract()` with Zod schemas
- E2E testing with natural language: Write tests that adapt to UI changes using `act()`
- RPA workflows: Automate business processes across web applications using `agent()`
- Data monitoring: Observe and extract information from dynamic web pages
- Form filling: Populate web forms using natural language instructions

The caching system enables these workflows to run deterministically after initial AI-guided execution, reducing cost and latency for repeated operations.

Sources: README.md:58-70, packages/docs/v2/best-practices/contributing.mdx:1-53
