AI-Agentic-Tools-Benchmarking
Popular repositories
- ToolBench Public Forked from OpenBMB/ToolBench
[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
Python
- WebVoyager Public Forked from MinorJerry/WebVoyager
Code for "WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models"
Python
- OS-Copilot Public Forked from OS-Copilot/OS-Copilot
A self-improving embodied conversational agent seamlessly integrated into the operating system to automate our daily tasks.
Python
- MobileAgent Public Forked from X-PLUG/MobileAgent
Mobile-Agent: The Powerful GUI Agent Family
Python
- webarena Public Forked from web-arena-x/webarena
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
Python
Repositories
- webarena Public Forked from web-arena-x/webarena
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
- cuga-agent Public Forked from cuga-project/cuga-agent
CUGA is an open-source generalist agent for the enterprise, supporting complex task execution on web and APIs, OpenAPI/MCP integrations, composable architecture, reasoning modes, and policy-aware features.
- appworld Public Forked from StonyBrookNLP/appworld
🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and Interactive Coding Agents (ACL'24 Best Resource Paper).
- tau2-bench Public Forked from sierra-research/tau2-bench
τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
- open-interpreter Public Forked from openinterpreter/open-interpreter
A natural language interface for computers
- MaAS Public Forked from bingreeky/MaAS
[ICML'25 Oral] Multi-agent Architecture Search via Agentic Supernet
- ToolBench Public Forked from OpenBMB/ToolBench
[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.