Easy Reinforcement Learning for Diffusion and Flow-Matching Models
- [2026-02-01] Support for multiple attention backends! You can now optimize memory and speed by setting the `attn_backend` parameter in your config:

  ```yaml
  model:
    attn_backend: "flash"  # Options: "native", "xformers", "flash_hub", "_flash_3_hub", "_flash_3_varlen_hub"
  ```

  This experimental feature leverages diffusers's `transformer.set_attention_backend`. Check the official diffusers documentation for all available options. We recommend installing the `kernels` package (`pip install kernels`) and using `flash_hub`, `flash_varlen_hub`, `_flash_3_hub`, or `_flash_3_varlen_hub` to avoid the complexity and potential incompatibility of installing Flash-Attention directly.
- [2026-01-17] We have added the latest FLUX.2-Klein series! Follow the commands to start:

  ```shell
  # Clone the repo with submodule `diffusers`
  git clone --recursive https://github.com/X-GenGroup/Flow-Factory.git
  cd Flow-Factory

  # Fetch the source code of `diffusers==0.37.0.dev`
  git submodule update --init --recursive

  # Install `diffusers==0.37.0.dev`
  cd diffusers
  pip install -e .

  # Install Flow-Factory
  cd ..
  pip install -e .
  ```

| Task | Model | Model Size | Model Type |
|---|---|---|---|
| Text-to-Image | FLUX.1-dev | 13B | flux1 |
| | Z-Image-Turbo | 12B | z-image |
| | Qwen-Image | 20B | qwen-image |
| | Qwen-Image-2512 | 20B | qwen-image |
| Image-to-Image | FLUX.1-Kontext-dev | 13B | flux1-kontext |
| Image(s)-to-Image | Qwen-Image-Edit-2509 | 20B | qwen-image-edit-plus |
| | Qwen-Image-Edit-2511 | 20B | qwen-image-edit-plus |
| Text-to-Image & Image(s)-to-Image | FLUX.2-dev | 30B | flux2 |
| | FLUX.2-klein-4B | 4B | flux2-klein |
| | FLUX.2-klein-9B | 9B | flux2-klein |
| | FLUX.2-klein-base-4B | 4B | flux2-klein |
| | FLUX.2-klein-base-9B | 9B | flux2-klein |
| Text-to-Video | Wan2.1-T2V-1.3B | 1.3B | wan2_t2v |
| | Wan2.1-T2V-14B | 14B | wan2_t2v |
| | Wan2.2-TI2V-5B | 5B | wan2_t2v |
| | Wan2.2-T2V-A14B | A14B | wan2_t2v |
| Image-to-Video | Wan2.1-I2V-14B-480P | 14B | wan2_i2v |
| | Wan2.1-I2V-14B-720P | 14B | wan2_i2v |
| | Wan2.2-TI2V-5B | 5B | wan2_i2v |
| | Wan2.2-I2V-A14B | A14B | wan2_i2v |
To support new models, see Guidance/New Model.
| Algorithm | trainer_type |
|---|---|
| GRPO | grpo |
| GRPO-Guard | grpo-guard |
| DiffusionNFT | nft |
| AWM | awm |
See Algorithm Guidance for more information.
Model and algorithm are fully decoupled in Flow-Factory, enabling all listed model × algorithm combinations to work out of the box. The configurations under `examples/` have been verified to yield measurable performance gains. For unlisted combinations, find the closest (task, algorithm) config and swap in the desired model or algorithm parameters.
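As a hypothetical illustration of such a swap (the key layout below is an assumption based on the tables above, so treat the configs under `examples/` as the authoritative schema), switching a verified text-to-image config to a different algorithm might only require changing the trainer field:

```yaml
model:
  model_type: "flux1"        # any value from the Model Type column above
train:
  trainer_type: "grpo-guard" # any value from the trainer_type column above
```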
```shell
git clone https://github.com/Jayce-Ping/Flow-Factory.git
cd Flow-Factory
pip install -e .
```

Optional dependencies, such as deepspeed, are also available. Install them with:

```shell
pip install -e .[deepspeed]
```

To use Weights & Biases or SwanLab to log experiment results, install the extra dependencies via `pip install -e .[wandb]` or `pip install -e .[swanlab]`.
After installation, set the corresponding arguments in the config file:

```yaml
run_name: null              # Run name (auto: {model_type}_{finetune_type}_{trainer_type}_{timestamp})
project: "Flow-Factory"     # Project name for logging
logging_backend: "wandb"    # Options: wandb, swanlab, tensorboard, none
```

These trackers allow you to visualize both training samples and metric curves online.
Start training with the following simple command:
```shell
ff-train examples/grpo/lora/flux1.yaml
```

We provide a set of guidance documents to help you understand the framework and extend it. For a comprehensive understanding of the framework's design and motivation, refer to our technical report.
| Document | Description |
|---|---|
| Workflow | End-to-end training pipeline: the overall stages from data preprocessing to policy optimization |
| Algorithms | Supported RL algorithms (GRPO, GRPO-Guard, DiffusionNFT, AWM) and their configurations |
| Rewards | Reward model system: built-in models, custom rewards, and remote reward servers |
| New Model | How to add support for a new Diffusion/Flow-Matching model |
The unified dataset structure is:

```
dataset
├── train.txt / train.jsonl
├── test.txt / test.jsonl (optional)
├── images (optional)
│   ├── image1.png
│   └── ...
└── videos (optional)
    ├── video1.mp4
    └── ...
```
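As a small sketch (the helper below is our own, not part of Flow-Factory), the layout above can be checked with a few lines of standard-library Python:

```python
from pathlib import Path


def validate_dataset(root: str) -> list:
    """Check a dataset directory against the layout described above.

    Returns a list of human-readable problems (empty list = layout looks fine).
    Only train.txt / train.jsonl is strictly required; test files, images/,
    and videos/ are all optional.
    """
    base = Path(root)
    problems = []
    if not any((base / f).is_file() for f in ("train.txt", "train.jsonl")):
        problems.append("missing train.txt or train.jsonl")
    for sub in ("images", "videos"):
        p = base / sub
        if p.exists() and not p.is_dir():
            problems.append(f"{sub} exists but is not a directory")
    return problems
```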
For text-to-image and text-to-video tasks, the only required input is the prompt in plain text format. Use train.txt and test.txt (optional) with the following format:

```
A hill in a sunset.
An astronaut riding a horse on Mars.
```

Example: dataset/pickscore

Each line represents a single text prompt. Alternatively, you can use train.jsonl and test.jsonl in the following format:

```json
{"prompt": "A hill in a sunset."}
{"prompt": "An astronaut riding a horse on Mars."}
```

Example: dataset/t2is
`negative_prompt` is also supported:

```json
{"prompt": "A hill in a sunset.", "negative_prompt": "low quality, blurry, distorted, poorly drawn"}
{"prompt": "An astronaut riding a horse on Mars.", "negative_prompt": "low quality, blurry, distorted, poorly drawn"}
```

Example: dataset/t2is_neg
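For instance (a standalone sketch, not Flow-Factory code), such a .jsonl prompt file can be written and read back with nothing but the standard library:

```python
import json


def write_jsonl(path, records):
    """Write one JSON object per line, as in train.jsonl / test.jsonl."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Parse a .jsonl prompt file back into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```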
For tasks involving conditioning images, use train.jsonl and test.jsonl in the following format:

```json
{"prompt": "A hill in a sunset.", "image": "path/to/image1.png"}
{"prompt": "An astronaut riding a horse on Mars.", "image": "path/to/image2.png"}
```

Example: dataset/sharegpt4o_image_mini
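Since single-condition records use a singular key (`image`, and analogously `video`) while multi-condition records use the plural form (`images`, `videos`), a data loader typically normalizes both spellings to a list of paths. A minimal sketch (this helper is our own, not part of Flow-Factory):

```python
def condition_paths(record, key="image"):
    """Return conditioning paths from a .jsonl record as a list, whether
    the record uses the singular key (e.g. "image") or its plural form
    (e.g. "images"). Returns [] if the record has no conditioning input."""
    if key in record:
        return [record[key]]
    plural = key + "s"
    if plural in record:
        return list(record[plural])
    return []
```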
The default root directory for images is `dataset_dir/images`, and for videos it is `dataset_dir/videos`. You can override these locations by setting the `image_dir` and `video_dir` variables in the config file:

```yaml
data:
  dataset_dir: "path/to/dataset"
  image_dir: "path/to/image_dir"  # (defaults to "{dataset_dir}/images")
  video_dir: "path/to/video_dir"  # (defaults to "{dataset_dir}/videos")
```

For models like FLUX.2-dev and Qwen-Image-Edit-2511 that accept multiple images as conditions, use the `images` key with a list of image paths:

```json
{"prompt": "A hill in a sunset.", "images": ["path/to/condition_image_1_1.png", "path/to/condition_image_1_2.png"]}
{"prompt": "An astronaut riding a horse on Mars.", "images": ["path/to/condition_image_2_1.png", "path/to/condition_image_2_2.png"]}
```

Conditioning videos follow the same pattern with the `video` and `videos` keys:

```json
{"prompt": "A hill in a sunset.", "video": "path/to/video1.mp4"}
{"prompt": "An astronaut riding a horse on Mars.", "videos": ["path/to/video2.mp4", "path/to/video3.mp4"]}
```

Flow-Factory provides a flexible reward model system that supports both built-in and custom reward models for reinforcement learning.
Flow-Factory supports the following types of reward models:
- Pointwise Reward: Computes an independent score for each sample (e.g., aesthetic quality, text-image alignment).
- Pairwise Reward: Computes rewards from pairwise comparisons within a group; this is a special case of the groupwise reward below.
- Groupwise Reward: Computes rewards that require all samples in a group (e.g., ranking-based scores or pairwise comparisons).
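The distinction can be illustrated with a small sketch (the function signatures are our own simplification, not Flow-Factory's actual reward API): a pointwise reward maps each sample to a score independently, while a groupwise reward needs the whole group at once, e.g. to convert raw scores into within-group ranks:

```python
def pointwise_reward(scores):
    """Pointwise: each sample's reward depends only on that sample."""
    return list(scores)


def groupwise_rank_reward(scores):
    """Groupwise: reward each sample by its rank within the group,
    normalized to [0, 1] (best sample in the group gets 1.0)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # indices, worst first
    rewards = [0.0] * n
    for rank, i in enumerate(order):
        rewards[i] = rank / (n - 1) if n > 1 else 1.0
    return rewards
```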
The following reward models are pre-registered and ready to use:
| Name | Type | Description | Reference |
|---|---|---|---|
| PickScore | Pointwise | CLIP-based aesthetic scoring model | PickScore |
| PickScore_Rank | Groupwise | Ranking-based reward using PickScore | PickScore |
| CLIP | Pointwise | Image-text cosine similarity | CLIP |
Simply specify the reward model name in your config file:
```yaml
rewards:
  name: "aesthetic"          # Alias for this reward model
  reward_model: "PickScore"  # Reward model type or a path like 'my_package.rewards.CustomReward'
  batch_size: 16
  device: "cuda"
  dtype: bfloat16
```

Refer to the Rewards Guidance for more information about advanced usage, such as creating a custom reward model.
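Note that `reward_model` accepts either a registered name or a dotted path such as `my_package.rewards.CustomReward`. A dotted-path lookup of that kind can be sketched with `importlib` (this illustrates the general mechanism; it is not Flow-Factory's actual loader):

```python
import importlib


def resolve_reward(spec, registry):
    """Resolve `spec` to a reward object: first try the built-in registry
    (e.g. "PickScore"), then fall back to importing a dotted path such as
    "my_package.rewards.CustomReward"."""
    if spec in registry:
        return registry[spec]
    module_name, _, attr = spec.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)
```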
This repository is built on diffusers, accelerate, and peft. We thank them for their contributions to the community!
If you find Flow-Factory useful in your research, please consider citing our paper:
```bibtex
@article{ping2026flowfactory,
  title={Flow-Factory: A Unified Framework for Reinforcement Learning in Flow-Matching Models},
  author={Bowen Ping and Chengyou Jia and Minnan Luo and Hangwei Qian and Ivor Tsang},
  journal={arXiv preprint arXiv:2602.12529},
  year={2026},
  url={https://arxiv.org/abs/2602.12529},
}
```