Stars
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …
Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent in a hierarchical manner across multiple platforms, includi…
[CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
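An online image annotation tool (rectangles, polygons, with more being added) that can be used for training deep learning instance segmentation models such as Mask R-CNN.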
Personalize Segment Anything Model (SAM) with 1 shot in 10 seconds
The Screen Annotation dataset consists of pairs of mobile screenshots and their annotations. The annotations are in text format, and describe the UI elements present on the screen: their type, loca…
The dataset includes screen summaries that describe the functionality of Android app screenshots. It is used for training and evaluating the screen2words models (our paper accepted by UIST'21 will …
Mobile App Tasks with Iterative Feedback (MoTIF): Addressing Task Feasibility in Interactive Visual Environments
Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).
Tool Learning for Big Models, Open-Source Solutions of ChatGPT-Plugins
The official code for "ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases"
Paper collection on building and evaluating language model agents via executable language grounding
[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
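Llama Chinese community: aggregates the latest Llama learning resources in real time and aims to build the best open-source ecosystem for Chinese Llama large models; fully open source and available for commercial use.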
Making large AI models cheaper, faster and more accessible
Comprehensive evaluation of Chinese versions of the open-source Llama2 models, based on the SuperCLUE OPEN benchmark | Llama2 Chinese evaluation with SuperCLUE
OpenXRLab Multi-view Motion Capture Toolbox and Benchmark
[ECCV 2022] Official implementation of Faster VoxelPose: Real-time 3D Human Pose Estimation by Orthographic Projection
Official implementation of "VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment"
Crowdsourcing pipeline and website for OpenSurfaces [SIG '13] and Intrinsic Images in the Wild [SIG '14]
Stable Diffusion web UI