This is a playground for you to test, explore, and get inspired by the power of Browserbase and OpenAI's Computer Use Agent. This is free and always will be! It's not a product, just a demo playground.
This project uses TypeScript and requires Node.js. We recommend using Node.js version 14.x or later.
First, install the dependencies for this repository:
npm install

Next, copy the example environment variables:

cp .env.example .env.local

You'll need to set up your API keys:
- Get your OpenAI API key from OpenAI's dashboard
- Get your Browserbase API key and project ID from Browserbase
- Clone this repository:
  git clone https://github.com/browserbase/cua-browser.git
  cd cua-browser
- Install dependencies:
  npm install
- Create a .env.local file with your API keys (OPENAI_ORG is optional). You can get your API keys from OpenAI and Browserbase:
  OPENAI_API_KEY=your_openai_api_key
  OPENAI_ORG=your_openai_org_id
  BROWSERBASE_API_KEY=your_browserbase_api_key
  BROWSERBASE_PROJECT_ID=your_browserbase_project_id
- Start the development server:
  npm run dev
Open http://localhost:3000 with your browser to see CUA Browser in action. You can interact with the CUA Browser by typing natural language commands in the input field and observing the browser's actions in response.
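If the server starts but the agent fails to connect, a missing key is a likely cause. The following is a minimal sketch of a check for the variables listed above. The helper name `findMissingEnvVars` is hypothetical and not part of this repository, and the sketch assumes the variables have already been loaded into the environment object you pass in.

```typescript
// Variables the setup steps ask for. OPENAI_ORG is optional, so it is not checked.
const REQUIRED_ENV_VARS = [
  "OPENAI_API_KEY",
  "BROWSERBASE_API_KEY",
  "BROWSERBASE_PROJECT_ID",
] as const;

// Hypothetical helper (an assumption, not part of this repository):
// returns the names of required variables that are unset or blank.
function findMissingEnvVars(
  env: Record<string, string | undefined>
): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js script you could call `findMissingEnvVars(process.env)` and print the returned names before starting the agent.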
Here's a basic example of how to use the Browserbase Computer Use Agent:
import { Agent } from './app/api/agent/agent';
import { BrowserbaseBrowser } from './app/api/agent/browserbase';
async function main() {
  // Initialize the browser
  const browser = new BrowserbaseBrowser(1024, 768);
  await browser.connect();
  // Initialize the agent
  const agent = new Agent(
    "computer-use-preview",
    browser,
    (message) => {
      console.log(`Safety check: ${message}`);
      return true; // Acknowledge all safety checks
    }
  );
  // Prepare the input for the agent
  const inputItems = [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Go to google.com and search for 'Browserbase'"
        }
      ]
    }
  ];
  // Get the action from the agent
  const { output, responseId } = await agent.getAction(inputItems, undefined);
  // Take the action
  const results = await agent.takeAction(output);
  // Print the results
  console.log("Action results:", results);
  // Store the response ID for potential future use
  agent.lastResponseId = responseId;
  // Disconnect the browser
  await browser.disconnect();
}
main().catch(console.error);

This example demonstrates how to:
- Initialize the BrowserbaseBrowser with specific dimensions.
- Create an Agent instance with the appropriate model and browser.
- Prepare input items for the agent.
- Get an action from the agent using the getAction method.
- Execute the action using the takeAction method.
- Handle the results of the action.
- Store the response ID for potential future interactions.
Note that this example uses the getAction and takeAction methods separately, which allows for more granular control over the agent's behavior. You can expand on this basic example to create more complex interactions with the browser based on your specific use case.
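One way to build on the separate getAction and takeAction calls is to alternate them in a loop. The sketch below is illustrative only: the `StepAgent` interface and the `runUntilDone` function are hypothetical names, the interface is a guess at the minimal shape the example above relies on, and the stop condition (takeAction returning an empty list) is an assumption rather than documented behavior of the Agent class.

```typescript
// Minimal shape this sketch assumes. The real Agent class may differ.
interface StepAgent<Item> {
  getAction(
    input: Item[],
    previousResponseId: string | undefined
  ): Promise<{ output: Item[]; responseId: string }>;
  takeAction(output: Item[]): Promise<Item[]>;
}

// Hypothetical driver: alternates getAction and takeAction until takeAction
// returns nothing to send back, or until maxSteps is reached.
// Returns the number of getAction calls that were made.
async function runUntilDone<Item>(
  agent: StepAgent<Item>,
  initialInput: Item[],
  maxSteps = 10
): Promise<number> {
  let input = initialInput;
  let previousResponseId: string | undefined = undefined;
  let steps = 0;

  while (steps < maxSteps) {
    // Ask the agent what to do next, passing the previous response ID along.
    const { output, responseId } = await agent.getAction(input, previousResponseId);
    previousResponseId = responseId;
    steps += 1;

    // Execute the action and collect whatever should be sent back.
    const results = await agent.takeAction(output);
    if (results.length === 0) {
      break; // Assumed stop condition: nothing left to report.
    }
    input = results; // Feed the results into the next getAction call.
  }
  return steps;
}
```

The `maxSteps` cap is a design choice for the sketch: it keeps a misbehaving session from looping indefinitely while you experiment.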
- agent.ts: The main Agent class that handles interactions with the OpenAI API
- base_playwright.ts: Base class for Playwright-based browser automation
- browserbase.ts: Implementation of the Browserbase browser
- utils.ts: Utility functions for API calls and image handling