Commit 074efab

Actor Use (#3170)
<!-- This is an auto-generated description by cubic. -->

## Summary by cubic

Adds a new Actor API (Page, Element, Mouse) for low‑level, Playwright‑like automation on CDP, integrated with BrowserSession. Includes LLM-assisted element finding and content extraction, plus CDP-based tab management.

- **New Features**
  - New actor classes: Page, Element, Mouse with robust click/fill/drag/select, mouse move/scroll, JS evaluate, screenshots.
  - BrowserSession integration: new_page, get_pages, get_current_page, close_page, cookies; Browser alias preserved.
  - LLM APIs on Page: get_element_by_prompt, must_get_element_by_prompt, extract_content(Pydantic).
  - Playgrounds and example custom function using the browser in tools; removed legacy DOM playground; minor agent cleanup.
- **Documentation**
  - New Actor docs: basics, examples, all-parameters; linked in navigation.
  - Updated browser and tools docs to reference Actor usage.

<!-- End of auto-generated description by cubic. -->
2 parents 949bb3a + e1c4181 commit 074efab

18 files changed, +2729 -135 lines changed
browser_use/actor/README.md

Lines changed: 244 additions & 0 deletions
@@ -0,0 +1,244 @@
# Browser Actor

Browser Actor is a web automation library built on CDP (Chrome DevTools Protocol) that provides low-level browser automation capabilities within the browser-use ecosystem.

## Usage

### Integrated with Browser (Recommended)
```python
from browser_use import Browser  # Alias for BrowserSession

# Create and start browser session
browser = Browser()
await browser.start()

# Create new tabs and navigate
page = await browser.new_page("https://example.com")
pages = await browser.get_pages()
current_page = await browser.get_current_page()
```

### Direct Page Access (Advanced)
```python
from browser_use.actor import Page, Element, Mouse

# Create page with existing browser session
page = Page(browser_session, target_id, session_id)
```

## Basic Operations

```python
# Tab Management
page = await browser.new_page()  # Create blank tab
page = await browser.new_page("https://example.com")  # Create tab with URL
pages = await browser.get_pages()  # Get all existing tabs
await browser.close_page(page)  # Close specific tab

# Navigation
await page.goto("https://example.com")
await page.go_back()
await page.go_forward()
await page.reload()
```

## Element Operations

```python
# Find elements by CSS selector
elements = await page.get_elements_by_css_selector("input[type='text']")
buttons = await page.get_elements_by_css_selector("button.submit")

# Get element by backend node ID
element = await page.get_element(backend_node_id=12345)

# AI-powered element finding (requires LLM)
element = await page.get_element_by_prompt("search button", llm=your_llm)
element = await page.must_get_element_by_prompt("login form", llm=your_llm)
```

> **Note**: `get_elements_by_css_selector` returns immediately without waiting for visibility.
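
Because the lookup returns immediately, a page that renders late can yield an empty list. One way to cope is to poll until the selector matches or a deadline passes. The sketch below is only an illustration: `wait_for_elements`, its `find` callable, and the `timeout` and `interval` parameters are hypothetical names that are not part of browser-use. In practice `find` would wrap a call such as `page.get_elements_by_css_selector(...)`.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, TypeVar

T = TypeVar("T")


async def wait_for_elements(
    find: Callable[[], Awaitable[Sequence[T]]],
    timeout: float = 5.0,
    interval: float = 0.1,
) -> Sequence[T]:
    """Poll `find` until it returns a non-empty sequence or `timeout` elapses.

    Hypothetical helper, not part of browser-use.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        found = await find()
        if found:
            return found
        if loop.time() >= deadline:
            raise TimeoutError(f"no elements found within {timeout} seconds")
        # Yield to the event loop before trying again.
        await asyncio.sleep(interval)
```

A call might look like `buttons = await wait_for_elements(lambda: page.get_elements_by_css_selector("button.submit"))`. Note that this only waits for a match to exist, not for the element to be visible.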

## Element Interactions

```python
# Element actions
await element.click(button='left', click_count=1, modifiers=['Control'])
await element.fill("Hello World")  # Clears first, then types
await element.hover()
await element.focus()
await element.check()  # Toggle checkbox/radio
await element.select_option(["option1", "option2"])  # For dropdown/select
await element.drag_to(target_element)  # Drag and drop

# Element properties
value = await element.get_attribute("value")
box = await element.get_bounding_box()  # Returns BoundingBox or None
info = await element.get_basic_info()  # Comprehensive element info
screenshot_b64 = await element.screenshot(format='jpeg')
```

## Mouse Operations

```python
# Mouse operations
mouse = await page.mouse
await mouse.click(x=100, y=200, button='left', click_count=1)
await mouse.move(x=300, y=400, steps=1)
await mouse.down(button='left')  # Press button
await mouse.up(button='left')  # Release button
await mouse.scroll(x=0, y=100, delta_x=0, delta_y=-500)  # Scroll at coordinates
```

## Page Operations

```python
# JavaScript evaluation
result = await page.evaluate('() => document.title')  # Must use arrow function format
result = await page.evaluate('(x, y) => x + y', 10, 20)  # With arguments

# Keyboard input
await page.press("Control+A")  # Key combinations supported
await page.press("Escape")  # Single keys

# Page controls
await page.set_viewport_size(width=1920, height=1080)
page_screenshot = await page.screenshot()  # JPEG by default
page_png = await page.screenshot(format="png", quality=90)

# Page information
url = await page.get_url()
title = await page.get_title()
```

## AI-Powered Features

```python
# Content extraction using LLM
from pydantic import BaseModel

class ProductInfo(BaseModel):
    name: str
    price: float
    description: str

# Extract structured data from current page
products = await page.extract_content(
    "Find all products with their names, prices and descriptions",
    ProductInfo,
    llm=your_llm
)
```

## Core Classes

- **BrowserSession** (aliased as **Browser**): Main browser session manager with tab operations
- **Page**: Represents a single browser tab or iframe for page-level operations
- **Element**: Individual DOM element for interactions and property access
- **Mouse**: Mouse operations within a page (click, move, scroll)

## API Reference

### BrowserSession Methods (Tab Management)
- `start()` - Initialize and start the browser session
- `stop()` - Stop the browser session (keeps browser alive)
- `kill()` - Kill the browser process and reset all state
- `new_page(url=None)` → `Page` - Create blank tab or navigate to URL
- `get_pages()` → `list[Page]` - Get all available pages
- `get_current_page()` → `Page | None` - Get the currently focused page
- `close_page(page: Page | str)` - Close page by object or ID
- Session management and CDP client operations

### Page Methods (Page Operations)
- `get_elements_by_css_selector(selector: str)` → `list[Element]` - Find elements by CSS selector
- `get_element(backend_node_id: int)` → `Element` - Get element by backend node ID
- `get_element_by_prompt(prompt: str, llm)` → `Element | None` - AI-powered element finding
- `must_get_element_by_prompt(prompt: str, llm)` → `Element` - AI element finding (raises if not found)
- `extract_content(prompt: str, structured_output: type[T], llm)` → `T` - Extract structured data using LLM
- `goto(url: str)` - Navigate this page to URL
- `go_back()`, `go_forward()` - Navigate history (with error handling)
- `reload()` - Reload the current page
- `evaluate(page_function: str, *args)` → `str` - Execute JavaScript (MUST use `(...args) =>` format)
- `press(key: str)` - Press key on page (supports "Control+A" format)
- `set_viewport_size(width: int, height: int)` - Set viewport dimensions
- `screenshot(format='jpeg', quality=None)` → `str` - Take page screenshot, return base64
- `get_url()` → `str`, `get_title()` → `str` - Get page information
- `mouse` → `Mouse` - Get mouse interface for this page

### Element Methods (DOM Interactions)
- `click(button='left', click_count=1, modifiers=None)` - Click element with advanced fallbacks
- `fill(text: str, clear_existing=True)` - Fill input with text (clears first by default)
- `hover()` - Hover over element
- `focus()` - Focus the element
- `check()` - Toggle checkbox/radio button (clicks to change state)
- `select_option(values: str | list[str])` - Select dropdown options
- `drag_to(target_element: Element | Position, source_position=None, target_position=None)` - Drag to target element
- `get_attribute(name: str)` → `str | None` - Get attribute value
- `get_bounding_box()` → `BoundingBox | None` - Get element position/size
- `screenshot(format='jpeg', quality=None)` → `str` - Take element screenshot, return base64
- `get_basic_info()` → `ElementInfo` - Get comprehensive element information

### Mouse Methods (Coordinate-Based Operations)
- `click(x: int, y: int, button='left', click_count=1)` - Click at coordinates
- `move(x: int, y: int, steps=1)` - Move to coordinates
- `down(button='left', click_count=1)`, `up(button='left', click_count=1)` - Press/release button
- `scroll(x=0, y=0, delta_x=None, delta_y=None)` - Scroll page at coordinates

## Type Definitions

### Position
```python
class Position(TypedDict):
    x: float
    y: float
```

### BoundingBox
```python
class BoundingBox(TypedDict):
    x: float
    y: float
    width: float
    height: float
```
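
A common use of these two types together is turning an element's box into a point for coordinate-based mouse calls. The following is a minimal sketch, not library code: `center_of` is a hypothetical helper, the two `TypedDict` classes are redeclared only to keep the block self-contained, and it assumes `x` and `y` denote the box's top-left corner.

```python
from typing import Optional, TypedDict


class Position(TypedDict):
    x: float
    y: float


class BoundingBox(TypedDict):
    x: float
    y: float
    width: float
    height: float


def center_of(box: Optional[BoundingBox]) -> Optional[Position]:
    """Return the center point of a bounding box, or None if there is no box.

    Hypothetical helper; assumes (x, y) is the top-left corner.
    """
    if box is None:
        return None
    return {
        "x": box["x"] + box["width"] / 2,
        "y": box["y"] + box["height"] / 2,
    }
```

Since `Mouse.click` is documented with `int` coordinates, the result would likely need rounding before use, for example `round(center["x"])`.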

### ElementInfo
```python
class ElementInfo(TypedDict):
    backendNodeId: int  # CDP backend node ID
    nodeId: int | None  # CDP node ID
    nodeName: str  # HTML tag name (e.g., "DIV", "INPUT")
    nodeType: int  # DOM node type
    nodeValue: str | None  # Text content for text nodes
    attributes: dict[str, str]  # HTML attributes
    boundingBox: BoundingBox | None  # Element position and size
    error: str | None  # Error message if info retrieval failed
```
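
Because `ElementInfo` carries an `error` field, callers may want to check it before relying on the other fields. The sketch below shows one possible way to do that. `describe` is a hypothetical helper, not part of browser-use, and the types are redeclared with `Optional` purely so the block runs on its own.

```python
from typing import Optional, TypedDict


class BoundingBox(TypedDict):
    x: float
    y: float
    width: float
    height: float


class ElementInfo(TypedDict):
    backendNodeId: int
    nodeId: Optional[int]
    nodeName: str
    nodeType: int
    nodeValue: Optional[str]
    attributes: dict[str, str]
    boundingBox: Optional[BoundingBox]
    error: Optional[str]


def describe(info: ElementInfo) -> str:
    """Render an ElementInfo as a short tag-like string (hypothetical helper)."""
    # Surface retrieval failures instead of formatting partial data.
    if info["error"] is not None:
        return f"<unavailable: {info['error']}>"
    # Sort attributes so the output is stable across calls.
    attrs = "".join(f' {k}="{v}"' for k, v in sorted(info["attributes"].items()))
    return f"<{info['nodeName'].lower()}{attrs}>"
```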

## Important Usage Notes

**This is browser-use actor, NOT Playwright or Selenium.** Only use the methods documented above.

### Critical JavaScript Rules
- `page.evaluate()` MUST use `(...args) => {}` arrow function format
- Always returns string (objects are JSON-stringified automatically)
- Use single quotes around the function: `page.evaluate('() => document.title')`
- For complex selectors in JS: `'() => document.querySelector("input[name=\\"email\\"]")'`
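
Since `page.evaluate()` is documented as always returning a string, with objects JSON-stringified, the caller has to decode structured results. A minimal sketch of that step follows. `decode_evaluate_result` is a hypothetical helper name, and the fallback to the raw string is an assumption about how plain values such as a page title come back.

```python
import json
from typing import Any


def decode_evaluate_result(raw: str) -> Any:
    """Try to parse a string returned by page.evaluate() as JSON.

    Falls back to the raw string when it is not valid JSON
    (for example, a plain document title). Hypothetical helper.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw
```

One caveat with this approach: a plain string that happens to be valid JSON, such as a title of `123`, would be decoded as a number, so it fits best where the script is known to return an object or array.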

### Method Restrictions
- `get_elements_by_css_selector()` returns immediately (no automatic waiting)
- For dropdowns: use `element.select_option()`, NOT `element.fill()`
- Form submission: click submit button or use `page.press("Enter")`
- No methods like: `element.submit()`, `element.dispatch_event()`, `element.get_property()`

### Error Prevention
- Always verify page state changes with `page.get_url()`, `page.get_title()`
- Use `element.get_attribute()` to check element properties
- Validate CSS selectors before use
- Handle navigation timing with appropriate `asyncio.sleep()` calls
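
The first and last points can be combined: rather than sleeping for a fixed time after a click, poll the URL until it differs from the previous one. This is a sketch under stated assumptions, not the library's approach. `url_changed` and its parameters are hypothetical, and `get_url` stands in for a bound `page.get_url`.

```python
import asyncio
from typing import Awaitable, Callable


async def url_changed(
    get_url: Callable[[], Awaitable[str]],
    previous_url: str,
    timeout: float = 5.0,
    interval: float = 0.1,
) -> bool:
    """Return True once the URL differs from `previous_url`, False on timeout.

    Hypothetical helper, not part of browser-use.
    """

    async def poll() -> None:
        # Keep sleeping while the URL is unchanged.
        while await get_url() == previous_url:
            await asyncio.sleep(interval)

    try:
        await asyncio.wait_for(poll(), timeout)
    except asyncio.TimeoutError:
        return False
    return True
```

A typical call would record `before = await page.get_url()`, perform the action, then check `await url_changed(page.get_url, before)`.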

### AI Features
- `get_element_by_prompt()` and `extract_content()` require an LLM instance
- These methods use DOM analysis and structured output parsing
- Best for complex page understanding and data extraction tasks

browser_use/actor/__init__.py

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
"""CDP-Use High-Level Library

A Playwright-like library built on top of CDP (Chrome DevTools Protocol).
"""

from .element import Element
from .mouse import Mouse
from .page import Page

__all__ = ['Page', 'Element', 'Mouse']
