[NeurIPS 2025] Flow x RL. Official implementation of "ReinFlow: Fine-tuning Flow Policy with Online Reinforcement Learning". Fully open-sourced. Supports VLAs, e.g., pi0.


ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning

πŸ’ Paper accepted at NeurIPS 2025

Tonghe Zhang$^1$, Chao Yu$^{2,3}$, Sichang Su$^4$, Yu Wang$^2$

$^1$ Carnegie Mellon University $^2$ Tsinghua University $^3$ Beijing Zhongguancun Academy $^4$ University of Texas at Austin


[Figure: ReinFlow architecture diagram]

[Figure: Shortcut Flow Can Shortcut Transport]


Installation | Quick Start | Implementation Details | Add Dataset/Environment
Debug & Known Issues | License | Acknowledgement | Citation

This is the official implementation of "ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning".

If you like our work, we'll be happy if you give us a star ⭐!

🔥 ReinFlow can now scale to fine-tune 3B VLA models like $\pi_0$ via massively parallel RL.

🚀 About ReinFlow

ReinFlow is a flexible policy gradient framework for fine-tuning flow matching policies at any denoising step.

How does it work?
👉 First, train flow policies using imitation learning (behavior cloning).
👉 Then, fine-tune them with online reinforcement learning using ReinFlow!

🧩 Supports:

  • ✅ 1-Rectified Flow
  • ✅ Shortcut Models
  • ✅ Any other policy defined by ODEs (in principle)
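Concretely, an ODE-defined flow policy generates an action by Euler-integrating a learned velocity field over a few denoising steps. The toy NumPy sketch below illustrates that sampling loop only; every name here (`velocity_field`, `denoise`, the random linear "network" `W`) is an illustrative placeholder, not this repo's API:

```python
import numpy as np

def velocity_field(obs, a_t, t, W):
    """Toy stand-in for the learned velocity network v(s, a_t, t).
    In practice this is a neural net; here W is just a random matrix."""
    x = np.concatenate([obs, a_t, [t]])
    return np.tanh(W @ x)

def denoise(obs, W, num_steps=4, act_dim=2, rng=None):
    """Sample an action by Euler-integrating da/dt = v(s, a, t) from t=0 to t=1."""
    rng = rng or np.random.default_rng(0)
    a = rng.standard_normal(act_dim)   # a_0 ~ N(0, I)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        a = a + dt * velocity_field(obs, a, t, W)
    return a

obs_dim, act_dim = 3, 2
rng = np.random.default_rng(0)
W = rng.standard_normal((act_dim, obs_dim + act_dim + 1))
action = denoise(np.ones(obs_dim), W, num_steps=4, act_dim=act_dim)
print(action.shape)  # (2,)
```

With `num_steps=1` this collapses to a single Euler step, which is the few-step regime ReinFlow targets.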

📈 Empirical Results: ReinFlow achieves strong performance across a variety of robotic tasks:

  • 🦡 Legged locomotion (OpenAI Gym)
  • ✋ State-based manipulation (Franka Kitchen)
  • 👀 Visual manipulation (Robomimic)

🧠 Key Innovation: ReinFlow trains a noise injection network end-to-end:

  • ✅ Makes policy probabilities tractable, even with very few denoising steps (e.g., 4, 2, or 1)
  • ✅ Robust to discretization and Monte Carlo approximation errors
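To see why injected noise makes policy probabilities tractable, consider this toy NumPy sketch: perturbing each Euler step with Gaussian noise of learned scale turns the K-step denoising chain into a Markov process, so the sample's log-likelihood is an exact sum of per-step Gaussian log-densities. The names below (`noisy_denoise_with_logprob`, the constant `sigma_fn`) are illustrative assumptions, not the paper's actual noise-network architecture:

```python
import numpy as np

def gaussian_logpdf(x, mean, std):
    """Diagonal-Gaussian log density, summed over action dimensions."""
    return np.sum(-0.5 * ((x - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2 * np.pi))

def noisy_denoise_with_logprob(obs, v_fn, sigma_fn, num_steps=4, act_dim=2, rng=None):
    """K-step denoising with injected noise; returns (action, exact log-prob).

    Each step samples a_{k+1} ~ N(a_k + dt * v(s, a_k, t_k), sigma(s, a_k, t_k)^2),
    so the chain's log-probability is a plain sum of Gaussian terms --
    no ODE likelihood or divergence estimate needed.
    """
    rng = rng or np.random.default_rng(0)
    a = rng.standard_normal(act_dim)   # a_0 ~ N(0, I)
    dt, logp = 1.0 / num_steps, 0.0
    for k in range(num_steps):
        t = k * dt
        mean = a + dt * v_fn(obs, a, t)
        std = sigma_fn(obs, a, t)
        a = mean + std * rng.standard_normal(act_dim)
        logp += gaussian_logpdf(a, mean, std)
    return a, logp

# toy velocity and noise "networks"
v_fn = lambda obs, a, t: -a                        # pulls actions toward zero
sigma_fn = lambda obs, a, t: np.full_like(a, 0.1)  # constant learned-scale stand-in
action, logp = noisy_denoise_with_logprob(np.zeros(3), v_fn, sigma_fn, num_steps=2)
```

The resulting per-sample log-probability is exactly the quantity a policy-gradient objective needs, even in the 1- or 2-step regime.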

Learn more on our 🔗 project website or check out the arXiv paper.

📒 News

  • [2025/10/16] We scaled up ReinFlow to fine-tune large VLAs like $\pi_0$. Code and hyperparameters for the LIBERO environment are released at RLinf.
  • [2025/09/18] Paper accepted at NeurIPS 2025.
  • [2025/08/18] All training metrics (losses, rewards, etc.) released on WandB to help you reproduce our results.
  • [2025/07/30] Fixed the rendering bug in Robomimic. Now supports rendering at 1080p resolution.
  • [2025/07/29] Added a tutorial to the docs on how to record videos during evaluation.
  • [2025/06/14] Updated the webpage with a detailed explanation of the algorithm design.
  • [2025/05/28] Paper is posted on arXiv!

🚀 Installation

Please follow the steps in installation/reinflow-setup.md.

🚀 Quick Start: Reproduce Our Results

To fully reproduce our experiments, please refer to ReproduceExps.md.

To download our training data and reproduce the plots in the paper, please refer to ReproduceFigs.md.

🚀 Implementation Details

Please refer to Implement.md for descriptions of key hyperparameters of FQL, DPPO, and ReinFlow.

🚀 Adding Your Own Dataset or Environment

Please refer to Custom.md.

🚀 Debug Aid and Known Issues

Please refer to KnownIssues.md to see how to resolve errors you encounter.

⭐ Todo

  • Release Pi0 fine-tuning results
  • Support fine-tuning Mean Flow with online RL
  • Release videos
  • Release WandB metrics
  • Release docs
  • Release checkpoints
  • Release codebase

License

This repository is released under the MIT license. See LICENSE. If you use our code, we would appreciate it if you include the license at the top of your scripts.

Acknowledgement

This repository was developed from multiple open-source projects. For the full list of references, please refer to Acknowledgement.md.

Cite our work

```bibtex
@misc{zhang2025reinflowfinetuningflowmatching,
    title={ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning},
    author={Tonghe Zhang and Chao Yu and Sichang Su and Yu Wang},
    year={2025},
    eprint={2505.22094},
    archivePrefix={arXiv},
    primaryClass={cs.RO},
    url={https://arxiv.org/abs/2505.22094},
}
```

Star History

[Figure: Star History Chart]
