linjh1118/EasyR1

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework

This project is a clean fork of the original veRL project that adds support for vision language models. We thank all the authors for providing such a high-performance RL training framework.

EasyR1 is efficient and scalable thanks to the design of HybridEngine and the latest release of vLLM's SPMD mode.

Features

  • Supported models

    • Llama3/Qwen2/Qwen2.5/Qwen3 language models
    • Qwen2/Qwen2.5-VL vision language models
    • DeepSeek-R1 distill models
  • Supported algorithms

    • GRPO
    • DAPO
    • Reinforce++
    • ReMax
    • RLOO
  • Supported datasets

  • Supported tricks

    • Padding-free training
    • Resuming from checkpoint
    • Wandb & SwanLab & Mlflow & Tensorboard tracking

Requirements

Software Requirements

  • Python 3.9+
  • transformers>=4.51.0
  • flash-attn>=2.4.3
  • vllm>=0.8.3

We provide a Dockerfile to easily build environments.

We recommend using the pre-built docker image in EasyR1.

docker pull hiyouga/verl:ngc-th2.7.0-cu12.6-vllm0.9.1

Hardware Requirements

* estimated

Method                  Bits   1.5B     3B       7B       32B       72B
GRPO Full Fine-Tuning   AMP    2*24GB   4*40GB   8*40GB   16*80GB   32*80GB
GRPO Full Fine-Tuning   BF16   1*24GB   1*40GB   4*40GB   8*80GB    16*80GB

Note

Use worker.actor.fsdp.torch_dtype=bf16 and worker.actor.optim.strategy=adamw_bf16 to enable bf16 training.

We are working hard to reduce VRAM usage in RL training. LoRA support will be integrated in upcoming updates.

Tutorial: Run Qwen2.5-VL GRPO on Geometry3K Dataset in Just 3 Steps

Installation

git clone https://github.com/hiyouga/EasyR1.git
cd EasyR1
pip install -e .

GRPO Training

bash examples/qwen2_5_vl_7b_geo3k_grpo.sh

Merge Checkpoint in Hugging Face Format

python3 scripts/model_merger.py --local_dir checkpoints/easy_r1/exp_name/global_step_1/actor

Tip

If you encounter issues with connecting to Hugging Face, consider using export HF_ENDPOINT=https://hf-mirror.com.

If you want to use SwanLab logger, consider using bash examples/qwen2_5_vl_7b_geo3k_swanlab.sh.
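The mirror tip above can also be applied from Python instead of the shell. This is a minimal sketch: the helper name `use_hf_mirror` is hypothetical and not part of EasyR1. It assumes the variable is set before `huggingface_hub` or `transformers` is imported, since those libraries typically read `HF_ENDPOINT` at import time.

```python
import os


def use_hf_mirror(endpoint="https://hf-mirror.com"):
    """Set HF_ENDPOINT unless the user has already exported it (hypothetical helper)."""
    # setdefault keeps an existing value, so an explicit `export HF_ENDPOINT=...` wins.
    return os.environ.setdefault("HF_ENDPOINT", endpoint)


# Call this before importing huggingface_hub or transformers.
active_endpoint = use_hf_mirror()
```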

Custom Dataset

Please refer to the example datasets to prepare your own dataset.

How to Understand GRPO in EasyR1

[Figure: GRPO in EasyR1]
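As a rough illustration of the core idea behind GRPO, the sketch below computes group-relative advantages: several responses are sampled for the same prompt, and each response's reward is normalized against the mean and standard deviation of its group. This is not EasyR1's implementation. The function name, the `eps` constant, and the use of the population standard deviation are assumptions made for the example, and real implementations differ in these details.

```python
import statistics


def grpo_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Illustrative only: uses the population standard deviation and a small
    eps to avoid division by zero when all rewards are equal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled responses to one prompt, scored by a 0/1 correctness reward.
group_rewards = [1.0, 0.0, 0.0, 1.0]
advantages = grpo_advantages(group_rewards)
```

Responses that score above the group mean receive a positive advantage and those below receive a negative one, so no separate value model is needed to form the baseline.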

How to Run 70B+ Model in Multi-node Environment

  1. Start the Ray head node.
ray start --head --port=6379 --dashboard-host=0.0.0.0
  2. Start the Ray worker node and connect to the head node.
ray start --address=<head_node_ip>:6379
  3. Check the Ray resource pool.
ray status
  4. Run training script on the Ray head node only.
bash examples/qwen2_5_vl_7b_geo3k_grpo.sh

See veRL's official documentation for more details about multi-node training and the Ray debugger.
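Before launching training on the head node, it can help to confirm programmatically that every worker has joined the cluster. The sketch below is one way to do that with Ray's Python API, complementing `ray status`. The helper names and the required GPU count are hypothetical, and it assumes it is run on a machine where `ray start` has already been executed.

```python
def has_enough_gpus(resources, required_gpus):
    """Return True if the resource dict reports at least `required_gpus` GPUs."""
    return resources.get("GPU", 0) >= required_gpus


def check_cluster(required_gpus):
    """Connect to the running Ray cluster and check its total GPU count."""
    import ray  # imported lazily so the pure helper above works without Ray

    ray.init(address="auto")  # attach to the cluster started with `ray start`
    try:
        return has_enough_gpus(ray.cluster_resources(), required_gpus)
    finally:
        ray.shutdown()
```

For example, `check_cluster(16)` on the head node returns `False` until enough worker nodes have connected to provide 16 GPUs in total.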

Other Baselines

We also reproduced the following two baselines of the R1-V project.

  • CLEVR-70k-Counting: Train the Qwen2.5-VL-3B-Instruct model on counting problems.
  • GeoQA-8k: Train the Qwen2.5-VL-3B-Instruct model on GeoQA problems.

Performance Baselines

See baselines.md.

Awesome Work using EasyR1

  • MMR1: Advancing the Frontiers of Multimodal Reasoning. [code]
  • Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models. [code] [arxiv]
  • Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement. [code] [arxiv]
  • MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse. [code] [arxiv]
  • Temporal-R1: Envolving Temporal Reasoning Capability into LMMs via Temporal Consistent Reward. [code]
  • NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation. [code] [arxiv]
  • GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents. [code] [arxiv]
  • R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning. [code]
  • VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning. [code] [arxiv]
  • MM-UPT: Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO. [code] [arxiv]
  • RL-with-Cold-Start: Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start. [code] [arxiv]
  • ViGoRL: Grounded Reinforcement Learning for Visual Reasoning. [code] [arxiv]
  • Revisual-R1: Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning. [code] [arxiv]
  • SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward. [code] [arxiv]
  • Vision-Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning. [code] [arxiv]

TODO

  • Support LoRA (high priority).
  • Support ulysses parallelism for VLMs (medium priority).
  • Support more VLM architectures.

Note

We will not provide scripts for supervised fine-tuning and inference in this project. If you have such requirements, we recommend using LLaMA-Factory.

Known bugs

These features are temporarily disabled. We plan to fix them one by one in future updates.

  • Vision language models are not compatible with ulysses parallelism yet.

Discussion Group

👋 Join our WeChat group.

FAQs

ValueError: Image features and image tokens do not match: tokens: 8192, features 9800

Increase the data.max_prompt_length or reduce the data.max_pixels.
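To see why these two settings interact, the sketch below estimates how many prompt tokens one image can occupy. It assumes the Qwen2-VL family convention that each 28x28 pixel region (a 14-pixel patch with 2x2 merging) becomes one image token. The helper names are hypothetical, and the estimate ignores special tokens added around each image.

```python
PIXELS_PER_TOKEN = 28 * 28  # assumption: one image token per 28x28 pixel region


def max_image_tokens(max_pixels):
    """Upper bound on image tokens for an image resized to at most `max_pixels`."""
    return max_pixels // PIXELS_PER_TOKEN


def fits_prompt(max_pixels, max_prompt_length, num_images=1, text_tokens=0):
    """Check whether the image tokens plus the text tokens fit in the prompt budget."""
    needed = num_images * max_image_tokens(max_pixels) + text_tokens
    return needed <= max_prompt_length


# An image producing 9800 features cannot fit into an 8192-token prompt.
print(max_image_tokens(9800 * PIXELS_PER_TOKEN), fits_prompt(9800 * PIXELS_PER_TOKEN, 8192))
```

Under this assumption, the error above corresponds to an image that needs 9800 tokens while the prompt is capped at 8192, so either the cap must rise or the image must shrink.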

RuntimeError: CUDA Error: out of memory at /workspace/csrc/cumem_allocator.cpp:62

Reduce the worker.rollout.gpu_memory_utilization and enable worker.actor.offload.offload_params.

RuntimeError: 0 active drivers ([]). There should only be one.

Uninstall deepspeed from the current python environment.

Citation

Core contributors: Yaowei Zheng, Junting Lu, Shenzhi Wang, Zhangchi Feng, Dongdong Kuang and Yuwen Xiong

We also thank Guangming Sheng and Chi Zhang for helpful discussions.

@misc{zheng2025easyr1,
  title        = {EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework},
  author       = {Yaowei Zheng and Junting Lu and Shenzhi Wang and Zhangchi Feng and Dongdong Kuang and Yuwen Xiong},
  howpublished = {\url{https://github.com/hiyouga/EasyR1}},
  year         = {2025}
}

We also recommend citing the original work.

@article{sheng2024hybridflow,
  title   = {HybridFlow: A Flexible and Efficient RLHF Framework},
  author  = {Guangming Sheng and Chi Zhang and Zilingfeng Ye and Xibin Wu and Wang Zhang and Ru Zhang and Yanghua Peng and Haibin Lin and Chuan Wu},
  year    = {2024},
  journal = {arXiv preprint arXiv: 2409.19256}
}
