| 1 | +# Browser Actor |
| 2 | + |
| 3 | +Browser Actor is a low-level browser automation library for the browser-use ecosystem, built on the Chrome DevTools Protocol (CDP). |
| 4 | + |
| 5 | +## Usage |
| 6 | + |
| 7 | +### Integrated with Browser (Recommended) |
| 8 | +```python |
| 9 | +from browser_use import Browser # Alias for BrowserSession |
| 10 | + |
| 11 | +# Create and start browser session |
| 12 | +browser = Browser() |
| 13 | +await browser.start() |
| 14 | + |
| 15 | +# Create new tabs and navigate |
| 16 | +page = await browser.new_page("https://example.com") |
| 17 | +pages = await browser.get_pages() |
| 18 | +current_page = await browser.get_current_page() |
| 19 | +``` |
| 20 | + |
| 21 | +### Direct Page Access (Advanced) |
| 22 | +```python |
| 23 | +from browser_use.actor import Page, Element, Mouse |
| 24 | + |
| 25 | +# Create page with existing browser session |
| 26 | +page = Page(browser_session, target_id, session_id) |
| 27 | +``` |
| 28 | + |
| 29 | +## Basic Operations |
| 30 | + |
| 31 | +```python |
| 32 | +# Tab Management |
| 33 | +page = await browser.new_page() # Create blank tab |
| 34 | +page = await browser.new_page("https://example.com") # Create tab with URL |
| 35 | +pages = await browser.get_pages() # Get all existing tabs |
| 36 | +await browser.close_page(page) # Close specific tab |
| 37 | + |
| 38 | +# Navigation |
| 39 | +await page.goto("https://example.com") |
| 40 | +await page.go_back() |
| 41 | +await page.go_forward() |
| 42 | +await page.reload() |
| 43 | +``` |
| 44 | + |
| 45 | +## Element Operations |
| 46 | + |
| 47 | +```python |
| 48 | +# Find elements by CSS selector |
| 49 | +elements = await page.get_elements_by_css_selector("input[type='text']") |
| 50 | +buttons = await page.get_elements_by_css_selector("button.submit") |
| 51 | + |
| 52 | +# Get element by backend node ID |
| 53 | +element = await page.get_element(backend_node_id=12345) |
| 54 | + |
| 55 | +# AI-powered element finding (requires LLM) |
| 56 | +element = await page.get_element_by_prompt("search button", llm=your_llm) |
| 57 | +element = await page.must_get_element_by_prompt("login form", llm=your_llm) |
| 58 | +``` |
| 59 | + |
| 60 | +> **Note**: `get_elements_by_css_selector` returns immediately without waiting for visibility. |
| 61 | + |
| 62 | +## Element Interactions |
| 63 | + |
| 64 | +```python |
| 65 | +# Element actions |
| 66 | +await element.click(button='left', click_count=1, modifiers=['Control']) |
| 67 | +await element.fill("Hello World") # Clears first, then types |
| 68 | +await element.hover() |
| 69 | +await element.focus() |
| 70 | +await element.check() # Toggle checkbox/radio |
| 71 | +await element.select_option(["option1", "option2"]) # For dropdown/select |
| 72 | +await element.drag_to(target_element) # Drag and drop |
| 73 | + |
| 74 | +# Element properties |
| 75 | +value = await element.get_attribute("value") |
| 76 | +box = await element.get_bounding_box() # Returns BoundingBox or None |
| 77 | +info = await element.get_basic_info() # Comprehensive element info |
| 78 | +screenshot_b64 = await element.screenshot(format='jpeg') |
| 79 | +``` |
| 80 | + |
| 81 | +## Mouse Operations |
| 82 | + |
| 83 | +```python |
| 84 | +# Mouse operations |
| 85 | +mouse = await page.mouse |
| 86 | +await mouse.click(x=100, y=200, button='left', click_count=1) |
| 87 | +await mouse.move(x=300, y=400, steps=1) |
| 88 | +await mouse.down(button='left') # Press button |
| 89 | +await mouse.up(button='left') # Release button |
| 90 | +await mouse.scroll(x=0, y=100, delta_x=0, delta_y=-500) # Scroll at coordinates |
| 91 | +``` |
| 92 | + |
| 93 | +## Page Operations |
| 94 | + |
| 95 | +```python |
| 96 | +# JavaScript evaluation |
| 97 | +result = await page.evaluate('() => document.title') # Must use arrow function format |
| 98 | +result = await page.evaluate('(x, y) => x + y', 10, 20) # With arguments |
| 99 | + |
| 100 | +# Keyboard input |
| 101 | +await page.press("Control+A") # Key combinations supported |
| 102 | +await page.press("Escape") # Single keys |
| 103 | + |
| 104 | +# Page controls |
| 105 | +await page.set_viewport_size(width=1920, height=1080) |
| 106 | +page_screenshot = await page.screenshot() # JPEG by default |
| 107 | +page_png = await page.screenshot(format="png", quality=90) # Note: CDP applies quality to JPEG only |
| 108 | + |
| 109 | +# Page information |
| 110 | +url = await page.get_url() |
| 111 | +title = await page.get_title() |
| 112 | +``` |
| 113 | + |
| 114 | +## AI-Powered Features |
| 115 | + |
| 116 | +```python |
| 117 | +# Content extraction using LLM |
| 118 | +from pydantic import BaseModel |
| 119 | + |
| 120 | +class ProductInfo(BaseModel): |
| 121 | + name: str |
| 122 | + price: float |
| 123 | + description: str |
| 124 | + |
| 125 | +# Extract structured data from current page |
| 126 | +products = await page.extract_content( |
| 127 | + "Find all products with their names, prices and descriptions", |
| 128 | + ProductInfo, |
| 129 | + llm=your_llm |
| 130 | +) |
| 131 | +``` |
| 132 | + |
| 133 | +## Core Classes |
| 134 | + |
| 135 | +- **BrowserSession** (aliased as **Browser**): Main browser session manager with tab operations |
| 136 | +- **Page**: Represents a single browser tab or iframe for page-level operations |
| 137 | +- **Element**: Individual DOM element for interactions and property access |
| 138 | +- **Mouse**: Mouse operations within a page (click, move, scroll) |
| 139 | + |
| 140 | +## API Reference |
| 141 | + |
| 142 | +### BrowserSession Methods (Tab Management) |
| 143 | +- `start()` - Initialize and start the browser session |
| 144 | +- `stop()` - Stop the browser session (keeps browser alive) |
| 145 | +- `kill()` - Kill the browser process and reset all state |
| 146 | +- `new_page(url=None)` → `Page` - Create blank tab or navigate to URL |
| 147 | +- `get_pages()` → `list[Page]` - Get all available pages |
| 148 | +- `get_current_page()` → `Page | None` - Get the currently focused page |
| 149 | +- `close_page(page: Page | str)` - Close page by object or ID |
| 150 | +- Session management and CDP client operations |
| 151 | + |
| 152 | +### Page Methods (Page Operations) |
| 153 | +- `get_elements_by_css_selector(selector: str)` → `list[Element]` - Find elements by CSS selector |
| 154 | +- `get_element(backend_node_id: int)` → `Element` - Get element by backend node ID |
| 155 | +- `get_element_by_prompt(prompt: str, llm)` → `Element | None` - AI-powered element finding |
| 156 | +- `must_get_element_by_prompt(prompt: str, llm)` → `Element` - AI element finding (raises if not found) |
| 157 | +- `extract_content(prompt: str, structured_output: type[T], llm)` → `T` - Extract structured data using LLM |
| 158 | +- `goto(url: str)` - Navigate this page to URL |
| 159 | +- `go_back()`, `go_forward()` - Navigate history (with error handling) |
| 160 | +- `reload()` - Reload the current page |
| 161 | +- `evaluate(page_function: str, *args)` → `str` - Execute JavaScript (must use the `(...args) => ...` arrow function format) |
| 162 | +- `press(key: str)` - Press key on page (supports "Control+A" format) |
| 163 | +- `set_viewport_size(width: int, height: int)` - Set viewport dimensions |
| 164 | +- `screenshot(format='jpeg', quality=None)` → `str` - Take page screenshot, return base64 |
| 165 | +- `get_url()` → `str`, `get_title()` → `str` - Get page information |
| 166 | +- `mouse` → `Mouse` - Get mouse interface for this page |
| 167 | + |
| 168 | +### Element Methods (DOM Interactions) |
| 169 | +- `click(button='left', click_count=1, modifiers=None)` - Click element with advanced fallbacks |
| 170 | +- `fill(text: str, clear_existing=True)` - Fill input with text (clears first by default) |
| 171 | +- `hover()` - Hover over element |
| 172 | +- `focus()` - Focus the element |
| 173 | +- `check()` - Toggle checkbox/radio button (clicks to change state) |
| 174 | +- `select_option(values: str | list[str])` - Select dropdown options |
| 175 | +- `drag_to(target_element: Element | Position, source_position=None, target_position=None)` - Drag to target element or position |
| 176 | +- `get_attribute(name: str)` → `str | None` - Get attribute value |
| 177 | +- `get_bounding_box()` → `BoundingBox | None` - Get element position/size |
| 178 | +- `screenshot(format='jpeg', quality=None)` → `str` - Take element screenshot, return base64 |
| 179 | +- `get_basic_info()` → `ElementInfo` - Get comprehensive element information |
| 180 | + |
| 181 | + |
| 182 | +### Mouse Methods (Coordinate-Based Operations) |
| 183 | +- `click(x: int, y: int, button='left', click_count=1)` - Click at coordinates |
| 184 | +- `move(x: int, y: int, steps=1)` - Move to coordinates |
| 185 | +- `down(button='left', click_count=1)`, `up(button='left', click_count=1)` - Press/release button |
| 186 | +- `scroll(x=0, y=0, delta_x=None, delta_y=None)` - Scroll page at coordinates |
| 187 | + |
| 188 | +## Type Definitions |
| 189 | + |
| 190 | +### Position |
| 191 | +```python |
| 192 | +class Position(TypedDict): |
| 193 | + x: float |
| 194 | + y: float |
| 195 | +``` |
| 196 | + |
| 197 | +### BoundingBox |
| 198 | +```python |
| 199 | +class BoundingBox(TypedDict): |
| 200 | + x: float |
| 201 | + y: float |
| 202 | + width: float |
| 203 | + height: float |
| 204 | +``` |
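As a rough sketch of how these types fit together, the helper below computes the center of a `BoundingBox` as a `Position`, for example to feed the coordinate-based `mouse.click()`. The name `center_of` is hypothetical and not part of browser-use; the TypedDicts are redeclared locally only so the snippet runs on its own.

```python
from typing import TypedDict


class Position(TypedDict):
    x: float
    y: float


class BoundingBox(TypedDict):
    x: float
    y: float
    width: float
    height: float


def center_of(box: BoundingBox) -> Position:
    """Return the center point of a bounding box (hypothetical helper)."""
    return {
        "x": box["x"] + box["width"] / 2,
        "y": box["y"] + box["height"] / 2,
    }
```

With a real element, this might be used after `box = await element.get_bounding_box()`: if `box` is not `None`, pass `int(center["x"])` and `int(center["y"])` to `mouse.click()`, since the documented signature takes integer coordinates.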
| 205 | + |
| 206 | +### ElementInfo |
| 207 | +```python |
| 208 | +class ElementInfo(TypedDict): |
| 209 | + backendNodeId: int # CDP backend node ID |
| 210 | + nodeId: int | None # CDP node ID |
| 211 | + nodeName: str # HTML tag name (e.g., "DIV", "INPUT") |
| 212 | + nodeType: int # DOM node type |
| 213 | + nodeValue: str | None # Text content for text nodes |
| 214 | + attributes: dict[str, str] # HTML attributes |
| 215 | + boundingBox: BoundingBox | None # Element position and size |
| 216 | + error: str | None # Error message if info retrieval failed |
| 217 | +``` |
| 218 | + |
| 219 | +## Important Usage Notes |
| 220 | + |
| 221 | +**This is browser-use actor, NOT Playwright or Selenium.** Only use the methods documented above. |
| 222 | + |
| 223 | +### Critical JavaScript Rules |
| 224 | +- `page.evaluate()` MUST use `(...args) => {}` arrow function format |
| 225 | +- Always returns a string (objects are JSON-stringified automatically) |
| 226 | +- Use single quotes around the function: `page.evaluate('() => document.title')` |
| 227 | +- For complex selectors in JS: `'() => document.querySelector("input[name=\\"email\\"]")'` |
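The quoting and return-value rules above can be handled with two small helpers. This is a minimal sketch, not part of browser-use: `selector_exists_js` and `parse_evaluate_result` are hypothetical names, and the parsing step assumes non-string results come back JSON-encoded, as the rules above state.

```python
import json


def selector_exists_js(selector: str) -> str:
    """Build an arrow-function string that tests whether `selector` matches."""
    # json.dumps emits a double-quoted literal with inner quotes escaped,
    # which is also a valid JavaScript string literal.
    return f"() => document.querySelector({json.dumps(selector)}) !== null"


def parse_evaluate_result(raw: str):
    """Decode the string returned by evaluate; fall back to the raw text."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Plain strings such as a page title are not valid JSON.
        return raw
```

A call might then look like `found = parse_evaluate_result(await page.evaluate(selector_exists_js('input[name="email"]')))`. One caveat of this design: a plain string that happens to be valid JSON (for example `"123"`) is decoded too, so skip the parsing step when a raw string is expected.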
| 228 | + |
| 229 | +### Method Restrictions |
| 230 | +- `get_elements_by_css_selector()` returns immediately (no automatic waiting) |
| 231 | +- For dropdowns: use `element.select_option()`, NOT `element.fill()` |
| 232 | +- Form submission: click submit button or use `page.press("Enter")` |
| 233 | +- There are no methods such as `element.submit()`, `element.dispatch_event()`, or `element.get_property()` |
| 234 | + |
| 235 | +### Error Prevention |
| 236 | +- Always verify page state changes with `page.get_url()` or `page.get_title()` |
| 237 | +- Use `element.get_attribute()` to check element properties |
| 238 | +- Validate CSS selectors before use |
| 239 | +- Handle navigation timing with appropriate `asyncio.sleep()` calls |
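Because element lookups return immediately, one way to apply the timing advice above is to poll with short `asyncio.sleep()` pauses instead of a single fixed delay. The sketch below is an assumption-laden illustration: `wait_until` is a hypothetical helper, not a browser-use API, and its `timeout` and `interval` defaults are arbitrary.

```python
import asyncio
import time
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def wait_until(
    probe: Callable[[], Awaitable[T]],
    timeout: float = 5.0,
    interval: float = 0.1,
) -> T:
    """Call `probe` repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = await probe()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Yield to the event loop before probing again.
        await asyncio.sleep(interval)
```

For example, `buttons = await wait_until(lambda: page.get_elements_by_css_selector("button.submit"))` would retry until the list is non-empty, since an empty list is falsy.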
| 240 | + |
| 241 | +### AI Features |
| 242 | +- `get_element_by_prompt()` and `extract_content()` require an LLM instance |
| 243 | +- These methods use DOM analysis and structured output parsing |
| 244 | +- Best for complex page understanding and data extraction tasks |