🔗 This is the official implementation of SATORI.
To install dependencies:
conda env create -f environment.yamlWe use ms-swift to train the model. Install via:
pip install ms-swift -Uor:
git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
pip install -e .Run the following command to start training:
MAX_PIXELS=401408 \
NPROC_PER_NODE=4 \
swift rlhf \
--rlhf_type grpo \
--model Qwen/Qwen2.5-VL-3B-Instruct \
--external_plugins ./plugin.py \
--reward_funcs external_format external_bbox_acc extern_vqa_acc external_caption_acc \
--use_vllm false \
--vllm_device auto \
--vllm_gpu_memory_utilization 0.6 \
--train_type full \
--torch_dtype bfloat16 \
--dataset <DATA_PATH> \
--max_length 2048 \
--max_completion_length 512 \
--num_train_epochs 1 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--learning_rate 1e-6 \
--gradient_accumulation_steps 2 \
--save_strategy 'steps' \
--eval_strategy 'steps' \
--eval_steps 200 \
--save_steps 200 \
--save_total_limit 2 \
--logging_steps 1 \
--output_dir output/GRPO \
--warmup_ratio 0.01 \
--dataloader_num_workers 4 \
--num_generations 16 \
--temperature 1.0 \
--top_p 0.9 \
--top_k 50 \
--system './prompt.txt' \
--deepspeed zero2 \
--log_completions true \
--vllm_max_model_len 1024 \
--num_iterations 1 \
--num_infer_workers 1 \
--async_generate false \
--beta 0.001 \Note:
external_formatformats the dataexternal_bbox_acccomputes bounding-box accuracyextern_vqa_acccomputes VQA accuracyexternal_caption_acccomputes caption accuracy (Seeplugin.pyfor details) 🔍
We provide an evaluation script in the VLMEvalKit directory. Make sure VLMEvalKit is in your working directory and update the dataset/model paths in config.py:
# File: ./VLMEvalKit/vlmeval/config.py (line 417)
cd VLMEvalKit
CUDA_VISIBLE_DEVICES=0,1,2,3 \
torchrun --nproc-per-node=4 run.py \
--data MMBench MMStar \
--model Qwen2.5-VL-3B-Instruct \
--verboseDownload here:
More model architechures and sizes will be released soon! 🔜
We release the VQA-Verify dataset here: link 🚀
This project adapts from ms-swift and VLMEvalKit. Thanks for their great work! 🙏
We welcome your issues, PRs, and feedback! Feel free to open an issue or submit a pull request. 🙌