This repository contains the code for our CVPR 2025 paper.
- Chris Dongjoo Kim*, Jihwan Moon*, Sangwoo Moon, Heeseung Yun, Sihaeng Lee, Aniruddha Kembhavi, Soonyoung Lee, Gunhee Kim, Sangho Lee, Christopher Clark. ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams. In CVPR, 2025 (* equal contribution).
- Python >= 3.9
- GPU with CUDA >= 9.0 support
Using a virtual environment is recommended.
# create conda env with python=3.9
conda create -n {ENV_NAME} python=3.9
conda activate {ENV_NAME}
# install other packages
pip install -r requirements.txt

Create the checkpoints and data directories.
$ mkdir -p checkpoints
$ mkdir -p data/videocc3m
$ mkdir -p data/webvid2m

Perform feature extraction by referring to the LB_feature_extraction directory.
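Since the von Mises-Fisher KDE fitted in the next step is defined on the unit sphere, the extracted embeddings must be L2-normalized. A minimal sketch of that normalization (the array contents are illustrative, not the extraction code's actual outputs):

```python
# L2-normalize embeddings so each row lies on the unit sphere.
# Illustrative sketch only; not the repository's extraction code.
import numpy as np

def l2_normalize(x, eps=1e-12):
    # x: (n, d) array of embeddings -> rows scaled to unit L2 norm
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

emb = np.array([[3.0, 4.0], [0.0, 2.0]])
unit = l2_normalize(emb)
print(np.linalg.norm(unit, axis=1))  # each row now has norm 1.0
```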
For each task (e.g., msrvtt, didemo, lsmdc, youcook, activitynet), run the command below to fit and save a von Mises-Fisher (vMF) kernel density estimator (KDE):
$ python -m utils.vmf_kde_each_task --task {task} --save_path {per task vmf kde path} --embedding_path {save_path from LB_feature_extraction}

Then, create a symbolic link to the saved per-task KDEs:
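A vMF KDE places one vMF kernel, exp(kappa * cos_sim), on each unit-normalized support embedding and averages them. A hedged sketch of evaluating such a density — the function name, the fixed concentration `kappa`, and the synthetic data are illustrative assumptions, not the code in `utils.vmf_kde_each_task`:

```python
# Sketch of a von Mises-Fisher (vMF) kernel density estimate over
# unit-normalized embeddings. Illustrative only; `kappa` is a fixed
# assumed concentration, and the shared vMF normalizing constant is
# dropped since it only shifts the log-density by a constant.
import numpy as np

def vmf_kde_log_density(query, support, kappa=50.0):
    """Unnormalized log-density of `query` under a vMF KDE on `support`.

    query:   (d,) unit vector
    support: (n, d) rows are unit vectors (per-task embeddings)
    """
    m = kappa * (support @ query)  # kappa * cosine similarity, shape (n,)
    # log-mean-exp over the n kernels, computed stably
    return np.log(np.mean(np.exp(m - m.max()))) + m.max()

rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 32))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
q = emb[0]
# A point inside the support scores higher than its antipode.
print(vmf_kde_log_density(q, emb) > vmf_kde_log_density(-q, emb))
```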
$ ln -s {per task vmf kde path} data/per_task_vmf_kde

Perform ReSpec filtering:
The threshold uses normalized values in [0, 1].
$ python main.py -c=configs/{cc3m/webvid}_respec.yaml -l=checkpoints/{checkpoint directory} --override="model_name=respec|threshold={sim_thr}"

Obtain the path to the saved clean_meta_real_text_sim.pkl file containing the filtered metadata, and train with BT-Adapter.
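The filtering itself is done by main.py; as a hedged illustration of what a threshold on normalized values in [0, 1] means, the sketch below maps cosine similarity from [-1, 1] to [0, 1] via (s + 1) / 2 and keeps samples above `sim_thr`. This normalization is an assumption for illustration, not necessarily the one main.py applies:

```python
# Illustrative threshold filter on a normalized similarity score in [0, 1].
# The (s + 1) / 2 normalization is an assumed mapping, not necessarily
# the one used by main.py.
import numpy as np

def filter_by_threshold(cos_sims, sim_thr):
    norm = (np.asarray(cos_sims) + 1.0) / 2.0  # map [-1, 1] -> [0, 1]
    return norm >= sim_thr                      # boolean keep-mask

sims = [0.9, -0.2, 0.5]
print(filter_by_threshold(sims, 0.7).tolist())  # [True, False, True]
```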
The contents of this repo are free to use for academic purposes only. If you use any of the material in this repository as part of your work, we ask you to cite:
@inproceedings{respec-CVPR-2025,
author = {Chris Dongjoo Kim and Jihwan Moon and Sangwoo Moon and Heeseung Yun and Sihaeng Lee and Aniruddha Kembhavi and Soonyoung Lee and Gunhee Kim and Sangho Lee and Christopher Clark},
title = "{ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams}",
booktitle = {CVPR},
year = 2025
}