
Osilly/EasyR1


EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework

This project is a clean fork of the original veRL project, extended to support vision language models. We thank all the authors for providing such a high-performance RL training framework.

EasyR1 is efficient and scalable thanks to the design of HybridEngine and the latest release of vLLM's SPMD mode.

Features

  • Supported models

    • Llama3/Qwen2/Qwen2.5 language models
    • Qwen2/Qwen2.5-VL vision language models
    • DeepSeek-R1 distill models
  • Supported algorithms

    • GRPO
    • Remax
    • Other RL algorithms (coming soon)
  • Supported datasets

  • Supported tricks

    • Padding-free training
    • Resuming from checkpoint
    • Wandb & SwanLab tracking

Requirements

Software Requirements

  • Python 3.9+
  • transformers>=4.49.0
  • flash-attn>=2.4.3
  • vllm>=0.7.3
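
To confirm that an existing environment meets these minimums before launching training, something like the following sketch could be used. It is not part of EasyR1; the helper names (`parse_version`, `meets_minimum`, `check_requirements`) are hypothetical, and the version parser is deliberately simplistic (it ignores pre-release and local suffixes), so the `packaging` library would be a more robust choice if it is available.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Tuple

# Minimum versions copied from the list above.
MINIMUM_VERSIONS: Dict[str, str] = {
    "transformers": "4.49.0",
    "flash-attn": "2.4.3",
    "vllm": "0.7.3",
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.5.1+cu121' into (2, 5, 1); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, required: str) -> bool:
    """Compare two dotted versions, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements() -> Dict[str, str]:
    """Return a per-package status: 'ok', 'missing', or 'too old (...)'."""
    report = {"python": "ok" if sys.version_info >= (3, 9) else "too old"}
    for name, required in MINIMUM_VERSIONS.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, required):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {required})"
    return report


if __name__ == "__main__":
    for package, status in check_requirements().items():
        print(f"{package}: {status}")
```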

We provide a Dockerfile to easily build environments.

We recommend using the pre-built Docker image for EasyR1.

docker pull hiyouga/verl:ngc-th2.5.1-cu120-vllm0.7.4-hotfix

Hardware Requirements

* estimated

Method                   Bits   1.5B     3B       7B
GRPO Full Fine-Tuning    AMP    2*24GB   4*40GB   8*40GB

Note

At least 2 GPUs are needed to run EasyR1.

We are working hard to reduce VRAM usage in RL training. LoRA support will be integrated in upcoming updates.

Tutorial: Run Qwen2.5-VL GRPO on Geometry3K Dataset in Just 3 Steps

Installation

git clone https://github.com/hiyouga/EasyR1.git
cd EasyR1
pip install -e .

GRPO Training

bash examples/run_qwen2_5_vl_7b_geo.sh

Merge Checkpoint in Hugging Face Format

python3 scripts/model_merger.py --local_dir path_to_your_last_actor_checkpoint

Tip

If you encounter issues with connecting to Hugging Face, consider using export HF_ENDPOINT=https://hf-mirror.com.

If you want to use the SwanLab logger, consider using bash examples/run_qwen2_5_vl_7b_geo_swanlab.sh.
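
If the environment variable from the tip above needs to be set from a Python entry point rather than the shell, a minimal sketch could look like this. It assumes the line runs before any Hugging Face library is imported, since the endpoint is read when those libraries are first loaded.

```python
import os

# Assumption: this runs before importing transformers or huggingface_hub,
# so the mirror endpoint is already in place when they read it.
# setdefault keeps any value that was already exported in the shell.
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")
```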

Custom Dataset

Please refer to the example datasets to prepare your own dataset.

Tip

EasyR1 already supports multi-image datasets.

How to Understand GRPO in EasyR1

  • To learn about the GRPO algorithm, you can refer to Hugging Face's blog.
  • Unlike TRL's GRPO trainer, our trainer supports mini-batch updates, as described in the original PPO paper.
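
The two ideas above can be illustrated with a small, framework-free sketch. It is not EasyR1's implementation: the function names are hypothetical, rewards are plain floats, and the use of the population standard deviation and the `eps` value are assumptions. The first function computes GRPO-style group-relative advantages by normalizing the rewards of all responses sampled for one prompt; the second splits a rollout batch into mini-batches, each of which would receive its own optimizer step.

```python
from statistics import mean, pstdev
from typing import Iterator, List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of responses to the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]


def iter_mini_batches(samples: Sequence, mini_batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive mini-batches; the last one may be smaller."""
    for start in range(0, len(samples), mini_batch_size):
        yield samples[start:start + mini_batch_size]


if __name__ == "__main__":
    # Four sampled responses for one prompt: two correct (1.0), two wrong (0.0).
    advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
    for batch in iter_mini_batches(advantages, mini_batch_size=2):
        print(batch)  # in a real trainer, one policy-gradient step per batch
```

Responses that score above their group's mean get a positive advantage and those below get a negative one, so no separate value model is needed to form a baseline.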

Other Baselines

We also implemented the following two baselines from the R1-V project.

  • CLEVR-70k-Counting: Train the Qwen2.5-VL-3B-Instruct model on counting problems.
  • GeoQA-8k: Train the Qwen2.5-VL-3B-Instruct model on GeoQA problems.

Awesome Work using EasyR1

  • MMR1: Advancing the Frontiers of Multimodal Reasoning (repo).
  • Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models (paper, repo).

TODO

  • Support PPO, Reinforce++ and RLOO for VLMs.
  • Support ulysses parallelism for VLMs.
  • Support more VLM architectures.

Note

We will not provide scripts for supervised fine-tuning and inference in this project. If you have such requirements, we recommend using LLaMA-Factory.

Known bugs

These features are temporarily disabled. We plan to fix them one by one in future updates.

  • Vision language models are not compatible with ulysses parallelism yet.

Discussion Group

👋 Join our WeChat group.

Citation

Core contributors: Yaowei Zheng, Junting Lu, Shenzhi Wang, Zhangchi Feng, Dongdong Kuang and Yuwen Xiong

We also thank Guangming Sheng and Chi Zhang for helpful discussions.

@misc{zheng2025easyr1,
  title        = {EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework},
  author       = {Yaowei Zheng and Junting Lu and Shenzhi Wang and Zhangchi Feng and Dongdong Kuang and Yuwen Xiong},
  howpublished = {\url{https://github.com/hiyouga/EasyR1}},
  year         = {2025}
}

We also recommend citing the original work.

@article{sheng2024hybridflow,
  title   = {HybridFlow: A Flexible and Efficient RLHF Framework},
  author  = {Guangming Sheng and Chi Zhang and Zilingfeng Ye and Xibin Wu and Wang Zhang and Ru Zhang and Yanghua Peng and Haibin Lin and Chuan Wu},
  year    = {2024},
  journal = {arXiv preprint arXiv: 2409.19256}
}
