yejy53/RealGen

RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards

Paper PDF

📰 News

  • [2025.12.02] 🔥 We have released RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards. Check out the Paper.

fig1

🏆 Contributions

  • What did we do? We propose RealGen, a text-to-image generator capable of producing highly convincing photorealistic images. It uses detector-reward-guided GRPO post-training to evade detection by AIGC detectors, which reduces artifacts and improves image realism and detail.
  • 📐 How to evaluate performance? We introduce RealBench, a new benchmark for evaluating photorealism that achieves human-free automated scoring through Detector-Scoring and Arena-Scoring.
  • 🔧 How effective was it? RealGen significantly outperforms both general image models (like GPT-Image-1, Qwen-Image) and specialized realistic models (like FLUX-Krea) in realism, details, and aesthetics on the T2I task.

fig2

🤝 Concurrent Work

We are pleased to find that the strategy of utilizing AIGC detectors as reward signals has been independently explored by other excellent concurrent works. We acknowledge and recommend checking out:

  • LongCat-Image: They innovatively incorporate an AIGC detection model as a reward during the RL phase, utilizing adversarial signals to guide the model toward generating images with the texture and fidelity of the real physical world.
  • Z-Image: In their RLHF pipeline, they design a comprehensive reward model where AI-Content Detection perception serves as a critical dimension, alongside instruction-following capability and aesthetic quality.

It is exciting to see the community converging on this effective paradigm to bridge the gap between generated and real distributions.

✨ Comparison

fig1

🚀 Quick Start

  • Our proposed detection-for-generation framework is compatible with all diffusion-model-based GRPO paradigms, such as Dance GRPO and Flow GRPO.
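As a rough illustration of what these GRPO paradigms share, the sketch below computes group-relative advantages from detector scores: several images are sampled for the same prompt, and each reward is normalized against the mean and standard deviation of its group. The scores are made-up values, and details such as the use of the population standard deviation and the `eps` term are assumptions that vary between implementations; this is not the repository's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples drawn for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical detector scores in [0, 1]; higher means "looks more like a real photo".
detector_scores = [0.2, 0.4, 0.6, 0.8]
advantages = group_relative_advantages(detector_scores)
```

Samples that the detector rates as more realistic than their group average receive a positive advantage, and the others a negative one, so the policy update favors the more convincing images.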

1. Environment Setup

The diffusion model training framework is based on Flow GRPO. For environment configuration, refer to Flow GRPO.

cd /RealGen/flow_grpo
conda create -n flow_grpo python=3.10.16
conda activate flow_grpo
pip install -e .

2. Model Download

Please download the required models in advance.

  • T2I Models:
    • FLUX: black-forest-labs/FLUX.1-dev
    • SD: stabilityai/stable-diffusion-3.5-large
    • Other diffusion models
  • Reward Models:
    • Detection Model: Forensic-Chat and OmniAID, or other fake-image detection models
    • Alignment Model: LongCLIP, CLIP, or other alignment models

3. Reward Preparation

The steps above cover only the installation of the core repository. Because different reward models often depend on conflicting library versions, merging them into a single Conda environment can cause compatibility issues. To avoid this, create a separate Conda virtual environment and install the corresponding dependencies following the instructions in Reward Server.

cd /RealGen/flow_grpo/reward-server
conda create -n reward_server python=3.10.16
conda activate reward_server
pip install -e .

We trained task-specific detectors, built on existing fake-image detection models, to serve as reward models. During GRPO training we found that reward hacking occurs easily: existing detection models tend to give high scores to noisy or blurry images. For this reason, we retrained OmniAID to make it suitable for our task:

  • Semantic Detector: Forensic-Chat, a generalizable and interpretable detector optimized from Qwen2.5-VL-7B. It assesses authenticity by analyzing image content (e.g., smooth greasy skin, artifacts in faces/hands, unnatural background blur).
  • Feature Detector: OmniAID achieves stable and accurate detection by being pre-trained on large-scale real and synthetic datasets. Feature-level artifacts are primarily associated with frequency artifacts and abnormal noise patterns.
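One simple way to turn the two detector scores and an alignment score into a single training reward is a weighted average, sketched below. The score names, weights, and values are hypothetical and chosen only for illustration; the combination actually used in the repository may differ, so check the training configuration before relying on this.

```python
def combine_rewards(scores, weights):
    """Weighted average of per-model scores for one image."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * scores[name] for name in weights) / total_weight


# Hypothetical weights and scores (all scores assumed to lie in [0, 1]).
weights = {"forensic_chat": 1.0, "omniaid": 1.0, "longclip": 0.5}
scores = {"forensic_chat": 0.9, "omniaid": 0.7, "longclip": 0.6}
reward = combine_rewards(scores, weights)
```

Keeping an alignment term in the mix is one way to discourage images that fool the detectors while drifting away from the prompt.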

An 8-GPU H200 training node was employed for this study, with seven GPUs allocated for the GRPO training process and one GPU reserved for hosting the reward server. Reference code for running the service:

CUDA_VISIBLE_DEVICES=7 nohup gunicorn --workers 1 --bind 127.0.0.1:18085 "app_forensic_chat:create_app()" > reward_forensic_chat.log 2>&1 &
CUDA_VISIBLE_DEVICES=7 nohup gunicorn --workers 1 --bind 127.0.0.1:18087 "app_omniaid:create_app()" > reward_omniaid.log 2>&1 &
CUDA_VISIBLE_DEVICES=7 nohup gunicorn --workers 1 --bind 127.0.0.1:18089 "app_longclip:create_app()" > reward_longclip.log 2>&1 &
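Because the services are started in the background with `nohup`, it can help to confirm they are accepting connections before launching training. The sketch below only checks that the three ports from the commands above are open; it does not call any reward endpoint, since the request format is not described here. The service names used as dictionary keys are labels chosen for this example.

```python
import socket

# Ports taken from the gunicorn commands above; the keys are illustrative labels.
REWARD_PORTS = {"forensic_chat": 18085, "omniaid": 18087, "longclip": 18089}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_reward_servers(host="127.0.0.1", ports=REWARD_PORTS):
    """Map each service label to whether its port accepts connections."""
    return {name: is_listening(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, up in check_reward_servers().items():
        print(f"{name}: {'up' if up else 'not reachable'}")
```

Large models can take a while to load, so a port that is not reachable right after startup may simply need more time; the corresponding `reward_*.log` file shows the actual status.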

4. Start Training GRPO

Model parameter settings are located in /RealGen/flow_grpo/config, while the main files and training settings are in /RealGen/flow_grpo/scripts. Notably, we have also updated GRPO-Guard to improve the capability of generating high-quality images. Below is a reference for running a selected model:

cd /RealGen/flow_grpo
conda activate flow_grpo
bash scripts/single_node/fast_grpo_flux_guard.sh

Additionally, if there are no environmental conflicts and GPU memory is sufficient, the reward function does not need to be deployed as a separate service. It can be modified directly in /RealGen/flow_grpo/flow_grpo/rewards.py. You may also refer to Flow GRPO.
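A minimal sketch of what an in-process reward might look like is shown below. It assumes a reward function that receives a batch of images and prompts and returns one score per image plus a dictionary of extra information; this signature is an assumption, so compare it with the existing functions in `rewards.py` before adapting it. `dummy_detector` is a stand-in for a real detector's inference call.

```python
def make_detector_reward(detector):
    """Wrap a detector callable into a batch reward function.

    `detector` is a hypothetical callable that maps one image to the
    probability (0..1) that the image is AI-generated.
    """

    def reward_fn(images, prompts, metadata=None):
        # Reward "realness": 1 minus the probability of being flagged as fake.
        scores = [1.0 - float(detector(image)) for image in images]
        return scores, {}

    return reward_fn


# Stand-in detector for illustration; replace with a real model's inference call.
def dummy_detector(image):
    return 0.25


reward_fn = make_detector_reward(dummy_detector)
scores, info = reward_fn(["img_a", "img_b"], ["a cat", "a dog"])
```

Loading the detector in the training process removes the HTTP round trip but places the detector's memory and dependency requirements on the training environment.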

The dataset is located in /RealGen/flow_grpo/dataset/realgen. The training set contains short prompts and their rewritten long captions covering multiple topics, such as people, animals, and architecture.

5. Evaluation

The inference and evaluation code is located in /RealGen/eval.

🤗 Acknowledgement

This repo is based on Flow GRPO. We thank the authors for their valuable contributions to the AIGC community.

📕 BibTeX

@article{ye2025realgen,
  title={RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards},
  author={Ye, Junyan and Zhu, Leqi and Guo, Yuncheng and Jiang, Dongzhi and Huang, Zilong and Zhang, Yifan and Yan, Zhiyuan and Fu, Haohuan and He, Conghui and Li, Weijia},
  journal={arXiv preprint arXiv:2512.00473},
  year={2025}
}
