- [2025/10/15] 🔥 We release the training code and training data for SFT and GFN.
- [2025/09/14] 🔥 Our paper LaCoT has been accepted to NeurIPS 2025.
- Stage-1 SFT Dataset: Download the dataset.
- Stage-2 RL Dataset: Download the dataset.
- Prepare the raw images following LLaVA-CoT and R1-Onevision (you may also follow our script to prepare the R1-Onevision data).
Note:
- Download the LLaVA-CoT images into the folder `cot`.
- Download the R1-Onevision images into the folder `cot/r1ov-image`.
The final data path should look like this (a quick layout check is sketched below the tree):
```
cot
├── ai2d
├── chartqa
├── CLEVR_v1.0
├── coco
├── docvqa
├── geoqa+
├── gqa
├── llava
├── ocr_vqa
├── pisc
├── r1ov-image
├── sam
├── share_textvqa
├── sqa
├── textvqa
├── vg
├── web-celebrity
├── web-landmark
└── wikiart
```
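To sanity-check the layout before training, a minimal bash sketch like the following (the folder names are taken directly from the tree above; nothing else is assumed) flags any missing subfolder:

```bash
# Verify that every expected image folder exists under cot/
for d in ai2d chartqa CLEVR_v1.0 coco docvqa geoqa+ gqa llava ocr_vqa \
         pisc r1ov-image sam share_textvqa sqa textvqa vg \
         web-celebrity web-landmark wikiart; do
  [ -d "cot/$d" ] || echo "missing: cot/$d"
done
```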
```bash
git clone https://github.com/heliossun/LaCoT.git
cd LaCoT
conda create -n qwen python=3.10 -y
conda activate qwen

### if ImportError: /lib64/libc.so.6: version `GLIBC_2.32' not found
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
pip install flash-attn==2.7.4.post1 --no-build-isolation
pip install git+https://github.com/huggingface/transformers accelerate
pip install "qwen-vl-utils[decord]"

## Install required packages
pip install deepspeed
pip install peft
pip install ujson
pip install liger_kernel
pip install datasets
pip install torchvision
pip install wandb
# use transformers==4.51.3 for training
```
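Before training, a quick check like the following (a minimal sketch using only the packages pinned above) confirms the key imports and GPU visibility:

```bash
# Sanity-check the environment: key imports and GPU visibility
python -c "import torch, transformers, flash_attn, deepspeed; print(torch.__version__, transformers.__version__, torch.cuda.is_available())"
```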
Stage-1 SFT: You may follow the training code.
Stage-2 GFN: You may follow the training code. You may adjust the following hyperparameters in the training script:
```bash
--explore_nums 6 \              # number of explorations per question
--explore_min_bs 2 \            # batch size for exploration
--rat_max_len 1024 \            # maximum sequence length of an explored rationale
--rat_min_len 64 \              # minimum sequence length of an explored rationale
--reward_tolarent_start 1.5 \   # higher values accept lower-reward explorations during policy gradient
--reward_tolarent_end 1 \       # final tolerance after warmup
--reward_tolarent_horizon 50 \  # warmup steps
```
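For reference only, a Stage-2 GFN launch might look like the sketch below; the script name `train_gfn.py`, the DeepSpeed config, and the model/data paths are hypothetical placeholders, and only the exploration and tolerance flags come from the list above:

```bash
# Hypothetical launch sketch: script name, config, and paths are placeholders
deepspeed train_gfn.py \
    --deepspeed ds_config.json \
    --model_name_or_path Qwen/Qwen2.5-VL-7B-Instruct \
    --data_path ./stage2_rl.json \
    --image_folder ./cot \
    --explore_nums 6 \
    --explore_min_bs 2 \
    --rat_max_len 1024 \
    --rat_min_len 64 \
    --reward_tolarent_start 1.5 \
    --reward_tolarent_end 1 \
    --reward_tolarent_horizon 50
```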
We implement our model card in lmms-eval for evaluation. After installing lmms-eval, please check the scripts in `models` for more details.
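As a rough sketch, an evaluation run through lmms-eval might look like this (the model name `lacot`, the checkpoint path, and the task choice are assumptions; the flags are standard lmms-eval options, and the repo's evaluation scripts give the exact arguments):

```bash
# Hypothetical evaluation call: model name, checkpoint path, and task are placeholders
accelerate launch -m lmms_eval \
    --model lacot \
    --model_args pretrained=path/to/checkpoint \
    --tasks mathvista_testmini \
    --batch_size 1 \
    --log_samples \
    --output_path ./logs/
```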
If you find LaCoT useful for your research and applications, please cite the related papers/blogs using this BibTeX: