We are the ByteDance Seed team.
We are delighted to introduce FlowRL, a new approach to online reinforcement learning that combines a flow-based policy representation with Wasserstein-2-regularized optimization, yielding a promising framework for uniting generative policies with reinforcement learning.
- [2025/06/10] 🔥 We release the PyTorch version of the code.
- [2025/09/18] 🎉 Our paper has been accepted to NeurIPS 2025.
FlowRL is an actor-critic framework that represents the policy with a flow-based generative model and optimizes it under Wasserstein-2 (W2) regularization. By implicitly constraining the current policy toward the optimal behavior policy via the W2 distance, FlowRL achieves superior performance on challenging benchmarks such as DMControl (Dog and Humanoid domains) and HumanoidBench.
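For intuition only, here is a minimal sketch of what "flow-based policy representation" means in practice: the policy turns Gaussian noise into an action by integrating a state-conditioned velocity field. All names, network sizes, and the Euler step count below are our assumptions, not FlowRL's actual implementation, and the critic and the W2-regularized update are omitted.

```python
import torch
import torch.nn as nn


class FlowPolicy(nn.Module):
    """Toy flow-based policy: an action is produced by Euler-integrating a
    state-conditioned velocity field from Gaussian noise to the action space.
    The architecture, step count, and names here are illustrative assumptions."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256, steps: int = 10):
        super().__init__()
        self.action_dim = action_dim
        self.steps = steps
        self.velocity = nn.Sequential(
            nn.Linear(state_dim + action_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Start from Gaussian noise and follow the learned velocity field over t in [0, 1].
        a = torch.randn(state.shape[0], self.action_dim, device=state.device)
        dt = 1.0 / self.steps
        for i in range(self.steps):
            t = torch.full((state.shape[0], 1), i * dt, device=state.device)
            a = a + dt * self.velocity(torch.cat([state, a, t], dim=-1))
        return torch.tanh(a)  # squash to a bounded action range


# Example: sample actions for a batch of 4 states (17-D observations, 6-D actions).
policy = FlowPolicy(state_dim=17, action_dim=6)
actions = policy(torch.randn(4, 17))  # shape: (4, 6)
```

In FlowRL itself, such a generator is trained inside an actor-critic loop, with the W2 distance implicitly constraining the current policy toward the optimal behavior policy; see the paper for the exact objective.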
- Setup Conda Environment: Create an environment with

  ```bash
  conda create -n flowrl python=3.11
  ```
- Clone this Repository:

  ```bash
  git clone https://github.com/bytedance/FlowRL.git
  cd FlowRL
  ```

- Install FlowRL Dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Training Examples:

  - Run a single training instance (see the note after this list for another example):

    ```bash
    python3 main.py --domain dog --task run
    ```
  - Run parallel training:

    ```bash
    bash scripts/train_parallel.sh
    ```
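Assuming `main.py` follows DMControl's usual domain/task naming (an assumption on our part; check the script's argument parser for the values it actually accepts), other environments should launch the same way, for example `python3 main.py --domain humanoid --task walk`.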
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
- TODO: Release the JAX version of the source code.
If you find FlowRL useful for your research and applications, please consider giving us a star ⭐ or citing us with:
```bibtex
@article{lv2025flow,
  title={Flow-Based Policy for Online Reinforcement Learning},
  author={Lv, Lei and Li, Yunfei and Luo, Yu and Sun, Fuchun and Kong, Tao and Xu, Jiafeng and Ma, Xiao},
  journal={arXiv preprint arXiv:2506.12811},
  year={2025}
}
```

About ByteDance Seed Team
Founded in 2023, the ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research group and to make significant contributions to the advancement of science and society.