Official repository for the paper "Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation".
- [2026.02.28] We release the training code! 🔥
- [2026.02.21] AR3D-R1 has been accepted to the CVPR 2026 main conference! 🔥
- [2025.12.15] AR3D-R1 is the #3 paper of the day in HuggingFace Daily Papers! 🔥
- [2025.12.11] We release the checkpoint of one-step AR3D-R1 and the inference code! 🔥
- [2025.12.11] We release the arXiv paper. 🔥
Please set up the Python environment by:
conda env create -f environment.yml
conda activate environment_name
pip install -r requirements.txt
My environment setup is mainly based on ShapeLLM-Omni. If you only need inference, installing this repository is sufficient.
Please download the reward models you need for training.
cd gen3d-r1/reward_weight
- Download the HPS checkpoint by
wget https://huggingface.co/xswu/HPSv2/resolve/main/HPS_v2.1_compressed.pt
- Download the Unified checkpoint by
huggingface-cli download CodeGoat24/UnifiedReward-2.0-qwen-7b --repo-type model --local-dir UnifiedReward-2.0-qwen-7b
Then launch training:
cd gen3d-r1/src
bash scripts/run_grpo_3d.sh
Notes:
- Parameters:
  - reward_funcs: The options are hps and unified. You can choose any combination of them for training.
- Make sure to substitute the correct checkpoint path and config path in run_grpo_3d.sh.
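To illustrate what choosing a combination of reward functions means in a GRPO-style setup, here is a minimal sketch, not the repository's implementation. It assumes each reward is a callable that scores one generated sample for a prompt, that the selected rewards are summed without weights, and that advantages are normalized within a group using the population standard deviation. The names `RewardFn`, `combined_rewards`, and `group_relative_advantages` are hypothetical, and the two scorers are stand-ins rather than the real HPS or UnifiedReward models.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List, Sequence

# Hypothetical signature: (prompt, sample) -> scalar score.
RewardFn = Callable[[str, str], float]


def combined_rewards(
    prompt: str,
    samples: Sequence[str],
    reward_fns: Dict[str, RewardFn],
    selected: Sequence[str],
) -> List[float]:
    """Sum the selected reward functions for every sample in one group."""
    unknown = [name for name in selected if name not in reward_fns]
    if unknown:
        raise ValueError(f"unknown reward function(s): {unknown}")
    return [sum(reward_fns[name](prompt, s) for name in selected) for s in samples]


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within a group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; some trainers use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Stand-in scorers for illustration only; real rewards would call the downloaded models.
reward_fns: Dict[str, RewardFn] = {
    "hps": lambda prompt, sample: len(sample) / 10.0,
    "unified": lambda prompt, sample: 1.0,
}

rewards = combined_rewards("a wooden chair", ["ab", "abcd", "abcdef"], reward_fns, ["hps", "unified"])
advantages = group_relative_advantages(rewards)
```

Samples that score above the group mean receive a positive advantage and those below receive a negative one, so only the relative ranking within a group drives the update.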
You can download the checkpoint from here
python demo.py
We provide an evaluation script that supports both inference and metrics calculation:
python eval.py
Configuration:
- Modify the model_path in eval.py to point to your downloaded checkpoint
- The script is compatible with the inference pipeline and adds comprehensive metrics evaluation
- Supports batch evaluation on test datasets with automatic metric computation
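As a rough picture of what automatic metric computation over a test set involves, the sketch below averages per-sample metric records into dataset-level scores. It is an assumption about the general pattern, not the logic of eval.py. The function name `aggregate_metrics` and the metric keys in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def aggregate_metrics(per_sample: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Average each metric over all samples that report it."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for record in per_sample:
        for name, value in record.items():
            totals[name] += value
            counts[name] += 1
    return {name: totals[name] / counts[name] for name in totals}


# Hypothetical per-sample records; a real run would fill these from model outputs.
records = [
    {"metric_a": 1.0, "metric_b": 2.0},
    {"metric_a": 3.0},
]
summary = aggregate_metrics(records)
```

Averaging per metric rather than per record keeps the result well defined when some samples lack a given metric.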
- Release complete two-step training & evaluation code
- Release one-step training code
- [Image Generation CoT] Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step
- [T2I-R1] T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
- [ShapeLLM-Omni] ShapeLLM-Omni: A Native Multimodal LLM for 3D Generation and Understanding
- [Trellis] Structured 3D Latents for Scalable and Versatile 3D Generation
If you find AR3D-R1 useful for your research or projects, we would greatly appreciate it if you could cite the following paper:
@article{tang2025we,
title={Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation},
author={Tang, Yiwen and Guo, Zoey and Zhu, Kaixin and Zhang, Ray and Chen, Qizhi and Jiang, Dongzhi and Liu, Junli and Zeng, Bohan and Song, Haoming and Qu, Delin and others},
journal={arXiv preprint arXiv:2512.10949},
year={2025}
}


