- Shanghai, China
Stars
MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering
A lightweight, powerful framework for multi-agent workflows
A cross-platform GUI automation Python module for human beings. Used to programmatically control the mouse & keyboard.
🌐 Make websites accessible for AI agents. Automate tasks online with ease.
📚 Benchmark your browser agent on ~2.5k READ and ACTION based tasks
The official GitHub repository for the paper "GA: A Comprehensive Survey on LLM-based GUI Agent"
Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
Qianfan-VL: Domain-Enhanced Universal Vision-Language Models
Code for "WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models"
Mobile-Agent: The Powerful GUI Agent Family
AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with web UI
Prompt, run, edit, and deploy full-stack web applications (bolt.new). Help Center: https://support.bolt.new/ · Community Support: https://discord.com/invite/stackblitz
This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."
The model, data and code for the visual GUI Agent SeeClick
The ScreenQA dataset was introduced in the paper "ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots". It contains ~86K question-answer pairs collected by human annotators for ~35K…
This repository hosts a collection of datasets for training and evaluating CUA / GUI agents.
UI-Venus is a native UI agent designed to perform precise GUI element grounding and effective navigation using only screenshots as input.
[ICCV 2025] GUIOdyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUIOdyssey consists of 8,834 episodes from 6 mobile devices, spanning 6 types of cross-app…
Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models
Reference PyTorch implementation and models for DINOv3
Automated generation of planar geometry olympiad problems