browserbase playwright implementation #52
@@ -1,110 +1,48 @@
-# Browserbase MCP Server
+# Playwright Browserbase MCP Server
I am a bit confused here - didn't Paul say yesterday we want to combine Stagehand and Browserbase MCPs into one?
Yes, we can ignore this readme for now.
Ideally we get the foundations of the Playwright MCP in place, then enhance this current MCP with Browserbase / Stagehand abilities (flexible web automations, recordings, etc.).
* Implement true `ref`-based interaction logic for click, type, drag, hover, select_option.
* Implement element-specific screenshots using `ref`.
* Add more standard Playwright MCP tools (tabs, navigation, etc.).
* Add tests.
TODO - we need to test within Cursor
…ntext Fm/stg 365 add cookies and context
[feature]: Add SSE support
…okies-handling consistent return action
Browserbase Playwright MCP Implementation
Current Status
This project implements a server for browser interactions, utilizing Playwright connected to Browserbase's cloud browser service.
The server can communicate in two ways: MCP over stdio or over SSE.
Implemented Functions
(Legend: `[x]` = Implemented, `[~]` = Partially Implemented / Needs Update, `[ ]` = Not Implemented)

The following standard MCP browser functions (or their equivalents) have been implemented:
* `browser_snapshot` → Implemented as `browserbase_snapshot` (Provides accessibility tree potentially containing `ref` for element references)
* `browser_click` → Implemented as `browserbase_click` (Uses `ref` argument from snapshot)
* `browser_type` → Implemented as `browserbase_type` (Uses `ref` argument from snapshot)
* `browser_drag` → Defined as `browserbase_drag` (Uses `ref` argument, server implementation potentially incomplete)
* `browser_hover` → Defined as `browserbase_hover` (Uses `ref` argument, server implementation potentially incomplete)
* `browser_select_option` → Implemented as `browserbase_select_option` (Uses `ref` argument)
* `browser_take_screenshot` → Implemented as `browserbase_take_screenshot` (Supports full page/viewport, optional element screenshot via `ref`)
* `browser_navigate` → Implemented as `browserbase_navigate`
* `browser_navigate_back` → Implemented as `browserbase_navigate_backward`
* `browser_navigate_forward` → Implemented as `browserbase_navigate_forward`
* `browser_press_key` → Implemented as `browserbase_press_key` (Does not use `ref`)
* `browser_close` → Implemented as `browserbase_close`
* `browser_wait` → Implemented as `browserbase_wait`
* `browser_resize` → Implemented as `browserbase_resize`
* `browser_screen_capture` → Implemented as `browserbase_screen_capture`
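Several of the tools above take a `ref` that the snapshot hands out. As a rough sketch of that round trip, the code below walks an accessibility-style tree, assigns a `ref` to each node, and builds the `ref=${ref}` selector string that an interaction tool would pass to `page.locator(...)`. The `AxNode` shape, the `e1`, `e2`, ... numbering scheme, and the helper names `assignRefs` and `refSelector` are assumptions made for illustration; the PR does not show how refs are actually generated.

```typescript
// Hypothetical node shape; a real accessibility snapshot carries more fields.
interface AxNode {
  role: string;
  name?: string;
  children?: AxNode[];
  ref?: string; // assigned by assignRefs below, not part of the raw snapshot
}

// Walk the tree depth-first and give every node a sequential ref ("e1", "e2", ...).
// The numbering scheme is an assumption, not the server's actual format.
function assignRefs(root: AxNode): Map<string, AxNode> {
  const byRef = new Map<string, AxNode>();
  let counter = 0;
  const visit = (node: AxNode): void => {
    counter += 1;
    node.ref = `e${counter}`;
    byRef.set(node.ref, node);
    for (const child of node.children ?? []) {
      visit(child);
    }
  };
  visit(root);
  return byRef;
}

// Build the selector string for a ref, rejecting anything that is not a
// plain identifier so a client cannot smuggle extra selector syntax in.
function refSelector(ref: string): string {
  if (!/^[A-Za-z0-9_-]+$/.test(ref)) {
    throw new Error(`Invalid ref: ${ref}`);
  }
  return `ref=${ref}`;
}
```

A tool handler would then look up the client-supplied `ref` in the map from the most recent snapshot before acting, so stale or unknown refs can be reported instead of silently failing.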
Unique Browserbase Functions
The following functions are specific to this Browserbase implementation:
* `browserbase_session_create`: Manages Browserbase session creation/reuse.
* `browserbase_get_text`: Extracts text content from page or element (using `selector` - does not use `ref`).
* `browserbase_context_create`: Creates a persistent Browserbase context.
* `browserbase_context_delete`: Deletes a Browserbase context.
* `browserbase_cookies_add`: Adds cookies to the session.
* `browserbase_cookies_get`: Retrieves cookies from the session.

Missing Functions
These aren't necessary, but the following standard MCP functions remain to be implemented:
browser_screen_move_mouse
browser_screen_click
browser_screen_drag
browser_screen_type
browser_tab_new
browser_tab_select
browser_tab_list
browser_tab_close
browser_console_messages
browser_file_upload
browser_pdf_save
browser_install
browser_handle_dialog
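
The implemented and missing lists can be captured as a small lookup, which is handy when a client speaks the standard `browser_*` names. This is only a sketch: the tool names are copied from the lists in this description, while `ToolEntry`, `toolMap`, `missingTools`, `resolveToolName`, and `isMissing` are hypothetical names, not part of the server.

```typescript
// Hypothetical entry type: the browserbase_* name plus whether the
// description above calls the server-side implementation complete.
interface ToolEntry {
  name: string;
  complete: boolean;
}

// Standard MCP browser tool name -> this server's equivalent.
const toolMap = new Map<string, ToolEntry>([
  ["browser_snapshot", { name: "browserbase_snapshot", complete: true }],
  ["browser_click", { name: "browserbase_click", complete: true }],
  ["browser_type", { name: "browserbase_type", complete: true }],
  ["browser_drag", { name: "browserbase_drag", complete: false }],
  ["browser_hover", { name: "browserbase_hover", complete: false }],
  ["browser_select_option", { name: "browserbase_select_option", complete: true }],
  ["browser_take_screenshot", { name: "browserbase_take_screenshot", complete: true }],
  ["browser_navigate", { name: "browserbase_navigate", complete: true }],
  ["browser_navigate_back", { name: "browserbase_navigate_backward", complete: true }],
  ["browser_navigate_forward", { name: "browserbase_navigate_forward", complete: true }],
  ["browser_press_key", { name: "browserbase_press_key", complete: true }],
  ["browser_close", { name: "browserbase_close", complete: true }],
  ["browser_wait", { name: "browserbase_wait", complete: true }],
  ["browser_resize", { name: "browserbase_resize", complete: true }],
  ["browser_screen_capture", { name: "browserbase_screen_capture", complete: true }],
]);

// Standard tools listed above as not yet implemented.
const missingTools = new Set<string>([
  "browser_screen_move_mouse",
  "browser_screen_click",
  "browser_screen_drag",
  "browser_screen_type",
  "browser_tab_new",
  "browser_tab_select",
  "browser_tab_list",
  "browser_tab_close",
  "browser_console_messages",
  "browser_file_upload",
  "browser_pdf_save",
  "browser_install",
  "browser_handle_dialog",
]);

// Returns the Browserbase equivalent, or undefined when there is none.
function resolveToolName(standardName: string): ToolEntry | undefined {
  return toolMap.get(standardName);
}

// True when the standard tool is on the "remain to be implemented" list.
function isMissing(standardName: string): boolean {
  return missingTools.has(standardName);
}
```

Keeping the `complete` flag next to the name makes the `browserbase_drag` and `browserbase_hover` caveat visible to callers rather than leaving it in prose only.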
Implementation Notes
* Uses `page.accessibility.snapshot()` for `browserbase_snapshot` to generate `ref` values.
* The interaction tools (`browserbase_click`, `browserbase_type`, `browserbase_drag`, `browserbase_hover`, `browserbase_select_option`) are intended to primarily rely on the `ref` argument from the client, using a ``page.locator(`ref=${ref}`)`` strategy.
* `browserbase_click`, `browserbase_type`, and `browserbase_select_option` are implemented using `ref`.
* `browserbase_drag` and `browserbase_hover` have definitions using `ref`, but the server-side implementation is likely incomplete or missing.
* `browserbase_press_key` does not use `ref`.
* `browserbase_get_text` uses a CSS `selector`, not `ref`.
* … (`console.error`) are implemented.

Configuration
The server behavior can be configured through various means (e.g., environment variables, configuration files). The available options correspond to the `Config` type defined in `config.d.ts`:

* `browserbaseApiKey` (string): Your Browserbase API Key.
* `browserbaseProjectId` (string): Your Browserbase Project ID.
* `proxies` (boolean, optional, default: `false`): Whether to enable Browserbase proxies. See [Browserbase Proxies Documentation](https://docs.browserbase.com/features/proxies).
* `context` (string, optional): A Browserbase Context ID to reuse for sessions.
* `server` (object, optional): Configuration for running in server mode (MCP over stdio or SSE).
  * `port` (number, optional): The port to listen on for SSE or MCP transport.
  * `host` (string, optional, default: `localhost`): The host interface to bind the server to (e.g., `0.0.0.0` for all interfaces).

Future Work
* Implement `ref`-based logic for `browserbase_drag` and `browserbase_hover`.
* Test the `ref` locator strategy across different websites to ensure the `ref` values generated by `snapshot` are reliably usable by the interaction tools. Refine the snapshot generation or locator strategy if needed.
* Implement element screenshots in `browserbase_take_screenshot` using the `ref` and verify functionality.
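
For reference, the options listed under Configuration could be typed and defaulted roughly as follows. This is a sketch based only on the option list in this description, not the actual contents of `config.d.ts`: treating the API key and project ID as required, and the helper name `applyDefaults`, are assumptions.

```typescript
// Server-mode options, as described under Configuration.
interface ServerConfig {
  port?: number;
  host?: string; // default: "localhost"
}

// Sketch of the Config shape; field names come from the option list,
// required-vs-optional for the two credentials is an assumption.
interface Config {
  browserbaseApiKey: string;
  browserbaseProjectId: string;
  proxies?: boolean; // default: false
  context?: string;
  server?: ServerConfig;
}

// Hypothetical helper: validate credentials and fill in documented defaults.
function applyDefaults(config: Config): Config {
  if (!config.browserbaseApiKey || !config.browserbaseProjectId) {
    throw new Error("browserbaseApiKey and browserbaseProjectId are required");
  }
  const resolved: Config = { ...config, proxies: config.proxies ?? false };
  if (config.server) {
    resolved.server = {
      ...config.server,
      host: config.server.host ?? "localhost",
    };
  }
  return resolved;
}
```

For example, passing `{ browserbaseApiKey, browserbaseProjectId, server: { port: 8931 } }` (the port number is arbitrary) yields `proxies: false` and `server.host: "localhost"`, while a config without `server` is left without one.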