Stars
verl: Volcano Engine Reinforcement Learning for LLMs
Official repository of the paper "MotionEdit: Benchmarking and Learning Motion-Centric Image Editing"
The official repo of the paper "MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly"
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution
Code for Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation (EVOL-RL).
VeriWeb: Verifiable Long-Chain Web Benchmark for Agentic Information-Seeking
The official repo for "Parallel-R1: Towards Parallel Thinking via Reinforcement Learning"
A version of verl to support diverse tool use
Reinforcement Learning of Vision Language Models with Self Visual Perception Reward
[COLM 2024] OpenAgents: An Open Platform for Language Agents in the Wild
Code for R-Zero: Self-Evolving Reasoning LLM from Zero Data (https://www.arxiv.org/pdf/2508.05004)
Pioneering Automated GUI Interaction with Native Agents
Building a comprehensive and handy list of papers for GUI agents
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
A cross-platform GUI automation Python module for human beings. Used to programmatically control the mouse & keyboard.
The repository for the paper titled "Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks"
Benchmarking Chat Assistants on Long-Term Interactive Memory (ICLR 2025)
Code and implementations for the ACL 2025 paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhiheng Xi et al.
A curated list of awesome instruction tuning datasets, models, papers and repositories.
Code for the ACL 2024 paper "PLUG: Leveraging Pivot Language in Cross-Lingual Instruction Tuning"
[ICLR 2025] DSBench: How Far are Data Science Agents from Becoming Data Science Experts?
Enhancing AI Software Engineering with Repository-level Code Graph
DocBench: A Benchmark for Evaluating LLM-based Document Reading Systems
Examples and guides for using the OpenAI API
A high-throughput and memory-efficient inference and serving engine for LLMs
Code for "WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models"