
yshinya6/clip-refine

Code repository for "Post-pre-training for Modality Alignment in Vision-Language Foundation Models" (CVPR 2025)


Requirements

Software Requirements

  • CUDA >= 12.3

Python Requirements

  • Please see apptainer/config.def for the full dependency list (a quick sanity check is sketched below)
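
Before building the container, a minimal check like the following verifies that PyTorch sees a GPU and reports the CUDA version it was built against. It only assumes PyTorch is installed; apptainer/config.def remains the authoritative dependency list:

```python
# Quick environment sanity check (illustrative; the authoritative
# dependency list is apptainer/config.def).
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available:  {torch.cuda.is_available()}")
if torch.cuda.is_available():
    # torch.version.cuda is the CUDA toolkit version PyTorch was built with.
    print(f"CUDA version:    {torch.version.cuda}")
    print(f"GPU:             {torch.cuda.get_device_name(0)}")
```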

Preparations

Post-pre-training Dataset: COCO Caption (2017)

  1. Download the dataset from the official COCO website (https://cocodataset.org)
  2. Place the extracted files under ./dataset/coco/ (a layout check is sketched below)
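
The sketch below checks for a standard COCO 2017 layout. The exact paths the training config expects may differ, so treat the directory names as illustrative assumptions:

```python
# Hypothetical layout check, assuming the standard COCO 2017 structure;
# the paths the training code actually expects may differ.
from pathlib import Path

coco_root = Path("./dataset/coco")
expected = [
    coco_root / "train2017",                                # training images
    coco_root / "val2017",                                  # validation images
    coco_root / "annotations" / "captions_train2017.json",  # training captions
    coco_root / "annotations" / "captions_val2017.json",    # validation captions
]
for path in expected:
    status = "ok" if path.exists() else "MISSING"
    print(f"{status:7s} {path}")
```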

Evaluation Dataset: ImageNet

  1. Download the dataset from the official ImageNet website (https://image-net.org)
  2. Place the extracted files under ./dataset/imagenet/ (a layout check is sketched below)
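
A similar check for ImageNet, assuming the torchvision-style layout with one subfolder per class under train/ and val/; adjust the paths to match the evaluation config if they differ:

```python
# Hypothetical layout check, assuming the torchvision-style ImageNet layout
# (class subfolders under train/ and val/).
from pathlib import Path

imagenet_root = Path("./dataset/imagenet")
for split in ("train", "val"):
    split_dir = imagenet_root / split
    if split_dir.is_dir():
        n_classes = sum(1 for d in split_dir.iterdir() if d.is_dir())
        print(f"{split}: {n_classes} class folders")
    else:
        print(f"{split}: MISSING ({split_dir})")
```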

Examples

Run Post-pre-training of CLIP-Refine on COCO Caption

```bash
python3 main/train.py --config_path config/01_post-pre-training/clip-refine.yaml
```
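
For orientation only, the sketch below shows generic contrastive post-training of a CLIP model on COCO captions using the openai/CLIP package and torchvision. It is not the CLIP-Refine objective from the paper, which is implemented in main/train.py and configured by the YAML file; the backbone name, paths, and hyperparameters here are placeholder assumptions:

```python
# Generic contrastive post-training sketch -- NOT the CLIP-Refine objective.
# Assumes the openai/CLIP package and pycocotools (for CocoCaptions).
import clip
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision.datasets import CocoCaptions

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
model = model.float()  # weights load as fp16 on CUDA; train in fp32 for stability

def collate(batch):
    # CocoCaptions yields (image, [caption, ...]); keep one caption per image.
    images = torch.stack([image for image, _ in batch])
    texts = clip.tokenize([captions[0] for _, captions in batch], truncate=True)
    return images, texts

dataset = CocoCaptions(
    root="./dataset/coco/train2017",
    annFile="./dataset/coco/annotations/captions_train2017.json",
    transform=preprocess,
)
loader = DataLoader(dataset, batch_size=64, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)

for images, texts in loader:
    image_features = F.normalize(model.encode_image(images.to(device)), dim=-1)
    text_features = F.normalize(model.encode_text(texts.to(device)), dim=-1)
    logits = (image_features @ text_features.t()) * model.logit_scale.exp()
    labels = torch.arange(images.size(0), device=device)
    # Symmetric InfoNCE loss over the in-batch image-text pairs.
    loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```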

Evaluate Zero-shot Performance on ImageNet

```bash
python3 main/test.py --config_path config/01_post-pre-training/clip-refine.yaml
```
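
main/test.py performs the evaluation end to end. As a reference point, generic zero-shot classification with a CLIP checkpoint follows the pattern below; this is a sketch assuming the openai/CLIP package, a ViT-B/32 backbone, and a torchvision-style ./dataset/imagenet/val layout, not the repository's evaluation code:

```python
# Generic zero-shot classification sketch -- not the repository's test.py.
import clip
import torch
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

dataset = ImageFolder("./dataset/imagenet/val", transform=preprocess)
loader = DataLoader(dataset, batch_size=256, num_workers=8)

# One text embedding per class from a simple prompt template. Note that
# dataset.classes holds folder names (WordNet IDs unless renamed), which make
# weak prompts; real evaluations map them to human-readable class names.
prompts = clip.tokenize([f"a photo of a {c}" for c in dataset.classes]).to(device)
with torch.no_grad():
    text_features = model.encode_text(prompts)
    text_features /= text_features.norm(dim=-1, keepdim=True)

correct = total = 0
with torch.no_grad():
    for images, labels in loader:
        image_features = model.encode_image(images.to(device))
        image_features /= image_features.norm(dim=-1, keepdim=True)
        pred = (image_features @ text_features.t()).argmax(dim=-1).cpu()
        correct += (pred == labels).sum().item()
        total += labels.numel()
print(f"Zero-shot top-1 accuracy: {correct / total:.4f}")
```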

Citation

```bibtex
@inproceedings{Yamaguchi_CVPR25_CLIP-Refine,
  title={Post-pre-training for Modality Alignment in Vision-Language Foundation Models},
  author={Yamaguchi, Shin'ya and Feng, Dewei and Kanai, Sekitoshi and Adachi, Kazuki and Chijiwa, Daiki},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2025}
}
```
