- [2025/10/15] 🔥 We release the training code and training data for SFT and GFN.
- [2025/09/14] 🔥 Our paper LaCoT has been accepted to NeurIPS 2025.
- Stage-1 SFT Dataset: Download the dataset.
- Stage-2 RL Dataset: Download the dataset.
- Prepare the raw images following LLaVA-CoT and R1-Onevision (you may also follow our script to prepare the R1-Onevision data).
Note:
- Download the LLaVA-CoT images into the folder `cot`.
- Download the R1-Onevision images into the folder `cot/r1ov-image`.
The final data path should look like this (a quick layout check is sketched below the tree):
```
cot
├── ai2d
├── chartqa
├── CLEVR_v1.0
├── coco
├── docvqa
├── geoqa+
├── gqa
├── llava
├── ocr_vqa
├── pisc
├── r1ov-image
├── sam
├── share_textvqa
├── sqa
├── textvqa
├── vg
├── web-celebrity
├── web-landmark
└── wikiart
```
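To sanity-check the layout before training, a minimal bash sketch like the following (the folder names are taken directly from the tree above; nothing else is assumed) flags any missing subfolder:

```bash
# Verify that every expected image folder exists under cot/
for d in ai2d chartqa CLEVR_v1.0 coco docvqa geoqa+ gqa llava ocr_vqa \
         pisc r1ov-image sam share_textvqa sqa textvqa vg \
         web-celebrity web-landmark wikiart; do
  [ -d "cot/$d" ] || echo "missing: cot/$d"
done
```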
```bash
git clone https://github.com/heliossun/LaCoT.git
cd LaCoT
conda create -n qwen python=3.10 -y
conda activate qwen

### if ImportError: /lib64/libc.so.6: version `GLIBC_2.32' not found
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
pip install flash-attn==2.7.4.post1 --no-build-isolation
pip install git+https://github.com/huggingface/transformers accelerate
pip install "qwen-vl-utils[decord]"

## Install required packages
pip install deepspeed
pip install peft
pip install ujson
pip install liger_kernel
pip install datasets
pip install torchvision
pip install wandb
# use transformers==4.51.3 for training
```
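Before training, a quick check like the following (a minimal sketch using only the packages pinned above) confirms the key imports and GPU visibility:

```bash
# Sanity-check the environment: key imports and GPU visibility
python -c "import torch, transformers, flash_attn, deepspeed; print(torch.__version__, transformers.__version__, torch.cuda.is_available())"
```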
Stage-1 SFT: You may follow the training code.
Stage-2 GFN: You may follow the training code. You may adjust the following hyperparameters in the training script:
```bash
--explore_nums 6 \              # number of explorations per question
--explore_min_bs 2 \            # batch size for exploration
--rat_max_len 1024 \            # maximum sequence length of an explored rationale
--rat_min_len 64 \              # minimum sequence length of an explored rationale
--reward_tolarent_start 1.5 \   # higher values accept lower-reward explorations during policy gradient
--reward_tolarent_end 1 \       # final tolerance after warmup
--reward_tolarent_horizon 50 \  # warmup steps
```
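For reference only, a Stage-2 GFN launch might look like the sketch below; the script name `train_gfn.py`, the DeepSpeed config, and the model/data paths are hypothetical placeholders, and only the exploration and tolerance flags come from the list above:

```bash
# Hypothetical launch sketch: script name, config, and paths are placeholders
deepspeed train_gfn.py \
    --deepspeed ds_config.json \
    --model_name_or_path Qwen/Qwen2.5-VL-7B-Instruct \
    --data_path ./stage2_rl.json \
    --image_folder ./cot \
    --explore_nums 6 \
    --explore_min_bs 2 \
    --rat_max_len 1024 \
    --rat_min_len 64 \
    --reward_tolarent_start 1.5 \
    --reward_tolarent_end 1 \
    --reward_tolarent_horizon 50
```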
We implement our model card in lmms-eval for evaluation. After installing lmms-eval, please check the scripts in `models` for more details.
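As a rough sketch, an evaluation run through lmms-eval might look like this (the model name `lacot`, the checkpoint path, and the task choice are assumptions; the flags are standard lmms-eval options, and the repo's evaluation scripts give the exact arguments):

```bash
# Hypothetical evaluation call: model name, checkpoint path, and task are placeholders
accelerate launch -m lmms_eval \
    --model lacot \
    --model_args pretrained=path/to/checkpoint \
    --tasks mathvista_testmini \
    --batch_size 1 \
    --log_samples \
    --output_path ./logs/
```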
If you find LaCoT useful for your research and applications, please cite the related papers/blogs using this BibTeX: