Stars
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
xuetf / RecSys-Challenge-2024-2nd-Solution
Forked from BlackPearl-Lab/RecSys-Challenge-2024-2nd-Solution
Fully Open Framework for Democratized Multimodal Training
R-HORIZON: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
All-in-One Sandbox for AI Agents that combines Browser, Shell, File, MCP and VSCode Server in a single Docker container.
ScaleCUA is a family of open-source computer-use agents that operate across platforms (Windows, macOS, Ubuntu, Android).
The official codebase of AgentNetTool in OpenCUA. Website: https://opencua.xlang.ai/
OpenCUA: Open Foundations for Computer-Use Agents
[ACL 2025] Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
Community maintained hardware plugin for vLLM on Ascend
A curated collection of resources, tools, and frameworks for developing GUI Agents.
UI-Venus is a native UI agent designed to perform precise GUI element grounding and effective navigation using only screenshots as input.
Solve Visual Understanding with Reinforced VLMs
[AAAI 2026] GUI-G²: Gaussian Reward Modeling for GUI Grounding
Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework.
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Renderer for the harmony response format to be used with gpt-oss
Unleashing the Power of Reinforcement Learning for Math and Code Reasoners
RM-R1: Unleashing the Reasoning Potential of Reward Models
Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent