Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 2,727 150 Updated Oct 9, 2025

tanghaom / AppEvalPilot

Python 76 11 Updated Oct 13, 2025

baidubce / Qianfan-VL

Qianfan-VL: Domain-Enhanced Universal Vision-Language Models

163 11 Updated Sep 22, 2025

MinorJerry / WebVoyager

Code for "WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models"

Python 936 104 Updated Mar 4, 2024

X-PLUG / MobileAgent

Mobile-Agent: The Powerful GUI Agent Family

Python 6,104 608 Updated Oct 17, 2025

ddupont808 / GPT-4V-Act

AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with web UI

JavaScript 1,055 101 Updated Dec 9, 2024

stackblitz / bolt.new

Prompt, run, edit, and deploy full-stack web applications with bolt.new. Help Center: https://support.bolt.new/ | Community Support: https://discord.com/invite/stackblitz

TypeScript 15,853 14,221 Updated Dec 17, 2024

mnluzimu / WebGen-Bench

Python 27 2 Updated Aug 31, 2025

MathFoundationRL / Book-Mathematical-Foundation-of-Reinforcement-Learning

This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."

MATLAB 12,486 1,192 Updated Oct 12, 2025

njucckevin / SeeClick

The model, data and code for the visual GUI Agent SeeClick

HTML 433 22 Updated Jul 13, 2025

google-research-datasets / screen_qa

ScreenQA dataset was introduced in the "ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots" paper. It contains ~86K question-answer pairs collected by human annotators for ~35K…

Python 129 9 Updated Feb 7, 2025

Khang-9966 / Computer-Browser-Phone-Use-Agent-Datasets

This repository hosts a collection of datasets for training and evaluating CUA / GUI agents.

72 5 Updated Jul 27, 2025

inclusionAI / UI-Venus

UI-Venus is a native UI agent designed to perform precise GUI element grounding and effective navigation using only screenshots as input.

Python 492 36 Updated Aug 25, 2025

OpenGVLab / GUI-Odyssey

[ICCV 2025] GUIOdyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUIOdyssey consists of 8,834 episodes from 6 mobile devices, spanning 6 types of cross-app…

Python 130 8 Updated Aug 4, 2025

We-Math / We-Math2.0

The code and data of We-Math 2.0.

Python 159 8 Updated Aug 30, 2025

google-deepmind / alphageometry

Python 4,673 549 Updated Jun 19, 2025

dle666 / R-CoT

Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models

Python 180 7 Updated Nov 4, 2024

facebookresearch / dinov3

Reference PyTorch implementation and models for DINOv3

Jupyter Notebook 7,895 518 Updated Oct 22, 2025

PatrikBak / GeoGen

Automated generation of planar geometry olympiad problems

C# 98 21 Updated Aug 11, 2023


Haoran Wang LightersWang


Stars

deepseek-ai / DeepSeek-OCR

openai / mle-bench

karpathy / nanochat

openai / openai-agents-python

mnluzimu / WebGen-Agent

Tencent-Hunyuan / ArtifactsBenchmark

asweigart / pyautogui

browser-use / browser-use

Halluminate / WebBench

longzhaohuang / GUI-Agent-Survey

TianyiPeng / WebProber

QwenLM / Qwen3-Omni