steel-dev/leaderboard

Browser Agent Leaderboard

This repository presents the current standings of various web agents evaluated on the WebVoyager benchmark (paper). The WebVoyager benchmark comprises 643 tasks across 15 popular websites, assessing agents' abilities to perform diverse web navigation and interaction tasks.


Steel.dev: Open-source Browser API for AI Agents & Apps. Steel is an open-source browser API purpose-built for AI agents.

Leaderboard

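| Rank | Agent | Organization | WebVoyager Score | Source | Open Source | New | SOTA |
|------|-------|--------------|------------------|--------|-------------|-----|------|
| 1 | Surfer 2 | H Company | 97.1% | Source | No | Yes | Yes |
| 2 | Magnitude | Magnitude | 93.9% | Source | Yes | Yes | |
| 3 | AIME Browser-Use | Aime | 92.34% | Source | No | Yes | |
| 4 | Surfer-H + Holo1 | H Company | 92.2% | Source | No | Yes | |
| 5 | Browserable | Browserable | 90.4% | Source | Yes | Yes | |
| 6 | Browser Use | Browser Use | 89.1% | Source | Yes | Yes | |
| 7 | Operator | OpenAI | 87% | Source | No | Yes | |
| 8 | Skyvern 2.0 | Skyvern | 85.85% | Source | Yes | Yes | |
| 9 | Project Mariner | Google | 83.5% | Source | No | | |
| 10 | Notte | Notte | 73.1% | Source | Yes | | |
| 11 | Agent-E | Emergence AI | 73.1% | Source | No | | |
| 12 | WebSight | Academic Research | 68% | Source | No | | |
| 13 | Runner H 0.1 | H Company | 67% | Source | No | | |
| 14 | WebVoyager | Academic Research | 59.1% | Source | Yes | | |
| 15 | WILBUR | Academic Research | 53% | Source | No | | |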

Notes:

  • Open Source: Indicates whether the agent's source code is publicly available.
  • New: Denotes recently introduced agents.
  • SOTA: Signifies agents that have achieved state-of-the-art performance.

Contributing

We encourage contributions to keep this leaderboard up to date. If you have information about new agents or updated scores, please submit a pull request or open an issue.
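Before opening a pull request that adds or updates a row, it can help to confirm that ranks still run consecutively and that scores never increase as you move down the table. The snippet below is a minimal sketch of such a check, not part of this repository: the `Entry` type and the `check_order` helper are hypothetical names introduced here for illustration, and the sample rows are copied by hand from the leaderboard.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One leaderboard row (hypothetical structure, for illustration only)."""
    rank: int
    agent: str
    score: float  # WebVoyager score, in percent


def check_order(entries: list[Entry]) -> list[str]:
    """Return human-readable problems; an empty list means the rows are consistent."""
    problems = []
    for prev, cur in zip(entries, entries[1:]):
        # Ranks should increase by exactly one from row to row.
        if cur.rank != prev.rank + 1:
            problems.append(
                f"rank jumps from {prev.rank} to {cur.rank} at {cur.agent}"
            )
        # A lower-ranked row should never have a higher score (ties are allowed).
        if cur.score > prev.score:
            problems.append(
                f"{cur.agent} ({cur.score}%) outscores "
                f"{prev.agent} ({prev.score}%) but ranks lower"
            )
    return problems


# A few rows copied from the leaderboard.
rows = [
    Entry(1, "Surfer 2", 97.1),
    Entry(2, "Magnitude", 93.9),
    Entry(3, "AIME Browser-Use", 92.34),
    Entry(4, "Surfer-H + Holo1", 92.2),
]

for problem in check_order(rows):
    print(problem)
```

Ties are deliberately accepted, since the table already lists two agents at 73.1% on adjacent ranks.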

License

This project is licensed under the MIT License.