
browserbase playwright implementation #52


Draft: wants to merge 28 commits into main

@alexdphan (Member) commented Apr 30, 2025

Browserbase Playwright MCP Implementation

Current Status

This project implements a server for browser interactions, utilizing Playwright connected to Browserbase's cloud browser service.

The server can communicate in two ways:

  1. Model Context Protocol (MCP) over stdio: Communicates over standard input/output.
  2. Server-Sent Events (SSE): Communicates over HTTP at a configurable host and port.
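
As a rough illustration of the two modes, the sketch below picks a transport from the optional `server` settings. The `Transport` type, the `chooseTransport` and `describeTransport` helpers, and the rule that a configured port selects SSE are all assumptions made for this example; the PR does not state how the server actually decides.

```typescript
// Hypothetical shapes for illustration; the real Config type lives in config.d.ts.
interface ServerOptions {
  port?: number;
  host?: string;
}

type Transport =
  | { kind: "stdio" }
  | { kind: "sse"; host: string; port: number };

// Assumption: a configured port selects SSE; otherwise the server falls back to stdio.
function chooseTransport(server?: ServerOptions): Transport {
  if (server?.port === undefined) {
    return { kind: "stdio" };
  }
  // "localhost" matches the documented default for host.
  return { kind: "sse", host: server.host ?? "localhost", port: server.port };
}

// Human-readable summary, e.g. for a startup log line.
function describeTransport(t: Transport): string {
  return t.kind === "stdio"
    ? "MCP over stdio"
    : `MCP over SSE at http://${t.host}:${t.port}`;
}
```

Calling `chooseTransport()` with no arguments yields the stdio variant, while `chooseTransport({ port: 8931 })` yields an SSE variant bound to `localhost`.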

Implemented Functions

(Legend: [x] = Implemented, [~] = Partially Implemented / Needs Update, [ ] = Not Implemented)

The following standard MCP browser functions (or their equivalents) have been implemented:

  • browser_snapshot → Implemented as browserbase_snapshot (Provides accessibility tree potentially containing ref for element references)
  • browser_click → Implemented as browserbase_click (Uses ref argument from snapshot)
  • browser_type → Implemented as browserbase_type (Uses ref argument from snapshot)
  • browser_drag → Defined as browserbase_drag (Uses ref argument, server implementation potentially incomplete)
  • browser_hover → Defined as browserbase_hover (Uses ref argument, server implementation potentially incomplete)
  • browser_select_option → Implemented as browserbase_select_option (Uses ref argument)
  • browser_take_screenshot → Implemented as browserbase_take_screenshot (Supports full page/viewport, optional element screenshot via ref)
  • browser_navigate → Implemented as browserbase_navigate
  • browser_navigate_back → Implemented as browserbase_navigate_backward
  • browser_navigate_forward → Implemented as browserbase_navigate_forward
  • browser_press_key → Implemented as browserbase_press_key (Does not use ref)
  • browser_close → Implemented as browserbase_close
  • browser_wait → Implemented as browserbase_wait
  • browser_resize → Implemented as browserbase_resize
  • browser_screen_capture → Implemented as browserbase_screen_capture
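
The mapping above is not a pure prefix swap (`browser_navigate_back` becomes `browserbase_navigate_backward`), so a client that speaks the standard names may want an explicit lookup table. The following is a minimal sketch of such a table; `STANDARD_TO_BROWSERBASE` and `resolveToolName` are hypothetical names and are not part of this PR.

```typescript
// Standard tool name -> Browserbase tool name, copied from the list above.
const STANDARD_TO_BROWSERBASE: ReadonlyMap<string, string> = new Map([
  ["browser_snapshot", "browserbase_snapshot"],
  ["browser_click", "browserbase_click"],
  ["browser_type", "browserbase_type"],
  ["browser_drag", "browserbase_drag"],
  ["browser_hover", "browserbase_hover"],
  ["browser_select_option", "browserbase_select_option"],
  ["browser_take_screenshot", "browserbase_take_screenshot"],
  ["browser_navigate", "browserbase_navigate"],
  ["browser_navigate_back", "browserbase_navigate_backward"],
  ["browser_navigate_forward", "browserbase_navigate_forward"],
  ["browser_press_key", "browserbase_press_key"],
  ["browser_close", "browserbase_close"],
  ["browser_wait", "browserbase_wait"],
  ["browser_resize", "browserbase_resize"],
  ["browser_screen_capture", "browserbase_screen_capture"],
]);

// Returns the Browserbase equivalent, or undefined for tools not yet implemented.
function resolveToolName(standardName: string): string | undefined {
  return STANDARD_TO_BROWSERBASE.get(standardName);
}
```

A name from the missing-functions list, such as `browser_tab_new`, resolves to `undefined`, which gives callers a single place to detect unsupported tools.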

Unique Browserbase Functions

The following functions are specific to this Browserbase implementation:

  • browserbase_session_create: Manages Browserbase session creation/reuse.
  • browserbase_get_text: Extracts text content from page or element (using selector - does not use ref).
  • browserbase_context_create: Creates a persistent Browserbase context.
  • browserbase_context_delete: Deletes a Browserbase context.
  • browserbase_cookies_add: Adds cookies to the session.
  • browserbase_cookies_get: Retrieves cookies from the session.

Missing Functions

These aren't necessary, but the following standard MCP functions remain to be implemented:

  • browser_screen_move_mouse
  • browser_screen_click
  • browser_screen_drag
  • browser_screen_type
  • browser_tab_new
  • browser_tab_select
  • browser_tab_list
  • browser_tab_close
  • browser_console_messages
  • browser_file_upload
  • browser_pdf_save
  • browser_install
  • browser_handle_dialog

Implementation Notes

  • Uses Playwright connected via CDP to Browserbase cloud sessions.
  • Includes basic session management (creation, validation, reuse, recreation on disconnects) for a default session and named sessions.
  • Uses page.accessibility.snapshot() for browserbase_snapshot to generate ref values.
  • Interaction tools (browserbase_click, browserbase_type, browserbase_drag, browserbase_hover, browserbase_select_option) are intended to primarily rely on the ref argument from the client, using a page.locator(`ref=${ref}`) strategy.
    • browserbase_click, browserbase_type, and browserbase_select_option are implemented using ref.
    • browserbase_drag and browserbase_hover have definitions using ref, but the server-side implementation is likely incomplete or missing.
  • browserbase_press_key does not use ref.
  • browserbase_get_text uses a CSS selector, not ref.
  • Error handling and console logging (console.error) are implemented.
  • Graceful shutdown is handled for SIGINT/SIGTERM.
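
To make the ref flow in these notes more concrete, here is a minimal sketch of how ref values could be attached to an accessibility tree and then turned into the selector string mentioned above. The `AxNode` shape is a simplified stand-in for a snapshot node, and the sequential `e1`, `e2`, ... naming in depth-first order is an assumption; the PR does not specify the actual ref format. The sketch only builds the selector string and makes no claim about how Playwright resolves it.

```typescript
// Simplified stand-in for a node returned by an accessibility snapshot.
interface AxNode {
  role: string;
  name?: string;
  children?: AxNode[];
}

// Same tree, with a ref attached to every node.
interface RefNode {
  ref: string;
  role: string;
  name: string;
  children: RefNode[];
}

// Assumption: refs are sequential ids assigned in depth-first, pre-order traversal.
function assignRefs(root: AxNode): RefNode {
  let counter = 0;
  const visit = (node: AxNode): RefNode => {
    counter += 1;
    const ref = `e${counter}`; // parent gets its ref before its children
    const children = (node.children ?? []).map((child) => visit(child));
    return { ref, role: node.role, name: node.name ?? "", children };
  };
  return visit(root);
}

// Builds the selector string from the notes above.
function refSelector(ref: string): string {
  return `ref=${ref}`;
}
```

With a root node that has two children, the root receives `e1` and the children receive `e2` and `e3`; a click tool would then pass `refSelector("e2")` to `page.locator`. Whether those refs stay valid between a snapshot and a later interaction is exactly what the testing item under Future Work is meant to verify.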

Configuration

The server behavior can be configured through various means (e.g., environment variables, configuration files). The available options correspond to the Config type defined in config.d.ts:

  • browserbaseApiKey (string): Your Browserbase API Key.
  • browserbaseProjectId (string): Your Browserbase Project ID.
  • proxies (boolean, optional, default: false): Whether to enable Browserbase proxies. See [Browserbase Proxies Documentation](https://docs.browserbase.com/features/proxies).
  • context (string, optional): A Browserbase Context ID to reuse for sessions.
  • server (object, optional): Configuration for running in server mode (MCP over stdio or SSE).
    • port (number, optional): The port to listen on for SSE or MCP transport.
    • host (string, optional, default: localhost): The host interface to bind the server to (e.g., 0.0.0.0 for all interfaces).
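
As an illustration of the options above, the sketch below restates them as a TypeScript interface and fills it from environment variables. The field names come from the list; the environment variable names (`BROWSERBASE_API_KEY`, `BROWSERBASE_PROJECT_ID`, `BROWSERBASE_PROXIES`, `BROWSERBASE_CONTEXT_ID`, `MCP_PORT`, `MCP_HOST`) and the `loadConfig` helper are assumptions for this example and may not match what the server actually reads.

```typescript
// Mirrors the documented options; the authoritative type is in config.d.ts.
interface Config {
  browserbaseApiKey: string;
  browserbaseProjectId: string;
  proxies?: boolean;
  context?: string;
  server?: { port?: number; host?: string };
}

type Env = Record<string, string | undefined>;

// Hypothetical loader: every variable name below is an assumption.
function loadConfig(env: Env): Config {
  const apiKey = env["BROWSERBASE_API_KEY"];
  const projectId = env["BROWSERBASE_PROJECT_ID"];
  if (!apiKey || !projectId) {
    throw new Error("Browserbase API key and project ID are required");
  }

  const config: Config = {
    browserbaseApiKey: apiKey,
    browserbaseProjectId: projectId,
    proxies: env["BROWSERBASE_PROXIES"] === "true", // documented default: false
  };

  const contextId = env["BROWSERBASE_CONTEXT_ID"];
  if (contextId) {
    config.context = contextId;
  }

  const portText = env["MCP_PORT"];
  if (portText !== undefined) {
    const port = Number(portText);
    if (!Number.isInteger(port) || port <= 0 || port > 65535) {
      throw new Error(`Invalid port: ${portText}`);
    }
    config.server = { port, host: env["MCP_HOST"] ?? "localhost" };
  }
  return config;
}
```

In a Node.js entry point this would typically be called as `loadConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test.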

Future Work

  • Implement the missing standard MCP functions listed above.
  • Implement the server-side ref-based logic for browserbase_drag and browserbase_hover.
  • Verify and Test: Thoroughly test the ref locator strategy across different websites to ensure the ref values generated by snapshot are reliably usable by the interaction tools. Refine the snapshot generation or locator strategy if needed.
  • Implement element-specific screenshots in browserbase_take_screenshot using the ref and verify functionality.
  • Refine session management and error recovery logic.
  • Implement better logging (e.g., using log levels).
  • Add automated tests.
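
For the logging item, one possible shape is a small leveled logger that keeps writing to stderr via `console.error`, as the current implementation does. Staying on stderr matters in stdio mode because stdout carries the MCP messages. The `createLogger` helper, the level names, and the line format are all hypothetical and only sketch the idea.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

// Higher number means more severe.
const LEVEL_ORDER: Record<LogLevel, number> = {
  debug: 10,
  info: 20,
  warn: 30,
  error: 40,
};

// Hypothetical helper: drops messages below minLevel and sends the rest to a sink.
// The default sink is console.error, so stdout stays free for protocol traffic.
function createLogger(
  minLevel: LogLevel,
  sink: (line: string) => void = (line) => console.error(line),
) {
  const log = (level: LogLevel, message: string): void => {
    if (LEVEL_ORDER[level] < LEVEL_ORDER[minLevel]) {
      return;
    }
    sink(`[${level.toUpperCase()}] ${message}`);
  };
  return {
    debug: (message: string) => log("debug", message),
    info: (message: string) => log("info", message),
    warn: (message: string) => log("warn", message),
    error: (message: string) => log("error", message),
  };
}
```

A logger created with `createLogger("warn")` ignores `debug` and `info` calls and emits only warnings and errors. The injectable sink is a design choice that lets the automated tests capture output without touching the real stderr.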

@alexdphan changed the title from "playwright file" to "browserbase playwright implementation" Apr 30, 2025
@@ -1,110 +1,48 @@
-# Browserbase MCP Server
+# Playwright Browserbase MCP Server
Review comment (Collaborator):

I am a bit confused here - didn't Paul say yesterday we want to combine Stagehand and Browserbase MCPs into one?

Reply (Member, Author):

Yes, we can ignore this readme for now.

Ideally, we get the foundations of the Playwright MCP in place first, then enhance this MCP with Browserbase / Stagehand abilities (flexible web automations, recordings, etc.).

* Implement true `ref`-based interaction logic for click, type, drag, hover, select_option.
* Implement element-specific screenshots using `ref`.
* Add more standard Playwright MCP tools (tabs, navigation, etc.).
* Add tests.
Review comment (Collaborator):

TODO - we need to test within Cursor


3 participants