LaCoT: Latent Chain-of-Thought for Visual Reasoning

LaCoT-demo · LaCoT-checkpoints · SFT-checkpoints

Release Notes

  • [2025/10/15] 🔥 We release the training code and training data for SFT and GFN.
  • [2025/09/14] 🔥 Our paper LaCoT has been accepted to NeurIPS 2025.

Data Preparation

Note:

  1. Download LLaVA-CoT into the cot folder.
  2. Download R1-Onevision into the cot/r1ov-image folder (see the download sketch below).
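
If you pull the datasets from the Hugging Face Hub, a sketch like the following can help; the repo IDs and local paths here are assumptions, so verify them against the dataset links above before running.

```python
# download_data.py -- illustrative sketch; the repo IDs below are assumptions,
# check them against the LLaVA-CoT and R1-Onevision dataset pages.
from huggingface_hub import snapshot_download

# LLaVA-CoT training data -> ./cot
snapshot_download(
    repo_id="Xkev/LLaVA-CoT-100k",      # assumed repo ID
    repo_type="dataset",
    local_dir="cot",
)

# R1-Onevision images -> ./cot/r1ov-image
snapshot_download(
    repo_id="Fancy-MLLM/R1-Onevision",  # assumed repo ID
    repo_type="dataset",
    local_dir="cot/r1ov-image",
)
```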

The final data path should look like this:

cot
├── ai2d
├── chartqa
├── CLEVR_v1.0
├── coco
├── docvqa
├── geoqa+
├── gqa
├── llava
├── ocr_vqa
├── pisc
├── r1ov-image
├── sam
├── share_textvqa
├── sqa
├── textvqa
├── vg
├── web-celebrity
├── web-landmark
└── wikiart
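
As a quick sanity check (a minimal sketch, assuming the tree above is rooted at ./cot), you can verify that every expected subfolder exists before launching training:

```python
# check_data_layout.py -- verify the expected dataset layout under ./cot
from pathlib import Path

EXPECTED = [
    "ai2d", "chartqa", "CLEVR_v1.0", "coco", "docvqa", "geoqa+", "gqa",
    "llava", "ocr_vqa", "pisc", "r1ov-image", "sam", "share_textvqa",
    "sqa", "textvqa", "vg", "web-celebrity", "web-landmark", "wikiart",
]

root = Path("cot")
missing = [name for name in EXPECTED if not (root / name).is_dir()]
print("All dataset folders found." if not missing else f"Missing folders: {missing}")
```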

Models & Scripts

Installation

1. Clone this repository and navigate to the LaCoT folder:

git clone https://github.com/heliossun/LaCoT.git

cd LaCoT

2. Install the inference package:

conda create -n qwen python=3.10 -y
conda activate qwen

### If you hit "ImportError: /lib64/libc.so.6: version `GLIBC_2.32' not found", pin the versions below
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
pip install flash-attn==2.7.4.post1 --no-build-isolation
pip install git+https://github.com/huggingface/transformers accelerate
pip install "qwen-vl-utils[decord]"
## Install the remaining required packages
pip install deepspeed
pip install peft
pip install ujson
pip install liger_kernel
pip install datasets
pip install torchvision
pip install wandb
# use transformers==4.51.3 for training
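
A quick way to confirm the environment is usable (a minimal sketch; the version numbers simply mirror the pins above):

```python
# check_env.py -- verify the core dependencies import cleanly
import torch
import transformers

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)  # 4.51.3 is recommended for training

try:
    import flash_attn  # noqa: F401
    print("flash-attn OK")
except ImportError as e:
    print("flash-attn not available:", e)
```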

Training

Stage 1 SFT: follow the training code.

Stage 2 GFN: follow the training code. You may adjust the following hyperparameters in the training script:

--explore_nums 6 \              # number of exploration rollouts
--explore_min_bs 2 \            # batch size used during exploration
--rat_max_len 1024 \            # maximum sequence length of an explored rationale
--rat_min_len 64 \              # minimum sequence length of an explored rationale
--reward_tolarent_start 1.5 \   # initial tolerance; higher accepts lower-reward explorations during the policy gradient
--reward_tolarent_end 1 \       # final tolerance after warmup
--reward_tolarent_horizon 50 \  # warmup steps over which the tolerance anneals from start to end
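
To make the tolerance flags concrete, here is a minimal sketch of one plausible reading (a linear anneal from reward_tolarent_start to reward_tolarent_end over reward_tolarent_horizon steps); the actual schedule is defined in the training code, so treat this as illustration only:

```python
# reward_tolerance.py -- illustrative linear warmup schedule (an assumption, not the official implementation)

def reward_tolerance(step: int, start: float = 1.5, end: float = 1.0, horizon: int = 50) -> float:
    """Anneal the reward tolerance from `start` to `end` over `horizon` warmup steps."""
    if step >= horizon:
        return end
    return start + (step / horizon) * (end - start)

# A higher tolerance early in training accepts lower-reward explorations,
# then the threshold tightens as the policy improves.
for step in (0, 25, 50, 100):
    print(step, round(reward_tolerance(step), 3))
```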

Evaluation

We implement our model card in lmms-eval for evaluation. After installing lmms-eval, please check the scripts in models for more details.

Citation

If you find LaCoT useful for your research or applications, please cite the related papers/blogs using this BibTeX:

About

[NeurIPS 2025] Official code for paper: Latent Chain-of-Thought for Visual Reasoning
