This is the official PyTorch implementation of *Temporal Scene Montage for Self-Supervised Video Scene Boundary Detection*.
This project runs on Linux (Ubuntu 22.04) with one GPU (~4 GB VRAM) and a large amount of system memory (~80 GB RAM).
First, install the following packages:
- python 3.9.2
- PyTorch 1.10.0
- torchvision 0.11.1
- torchmetrics 0.9.3
- pandas
- munch
- h5py
- vit_pytorch
- omegaconf
Commands for preparing the datasets can be found in preprocess.sh.
To speed up training, we save visual features in .pkl files. For example, the structure of ImageNet_shot.pkl is as follows:
```
{"tt0000000":
    {
        "0000": array(),
        "0001": array(),
        ...
    },
 "tt0000001":
    {
        ...
    },
 ...
}
```

Herein, `tt0000000` is a video ID. For each video, a key such as `0000` is a shot ID. Each shot is encoded as a 2048-dimensional feature vector.
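As an illustration, the features can be loaded with `pickle` in a minimal sketch like this (the nested `{video_id: {shot_id: vector}}` layout follows the structure above; the helper names here are ours, not part of the repo):

```python
import pickle

import numpy as np


def load_shot_features(path):
    """Load pre-extracted shot features: {video_id: {shot_id: 2048-dim array}}."""
    with open(path, "rb") as f:
        return pickle.load(f)


def get_shot_feature(features, video_id, shot_id):
    """Return the feature vector of one shot, e.g. ('tt0000000', '0000')."""
    return np.asarray(features[video_id][shot_id])
```

For instance, `get_shot_feature(load_shot_features("ImageNet_shot.pkl"), "tt0000000", "0000")` should yield an array of shape `(2048,)`.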
For convenience, labels are reformatted and saved into .pkl files as well.
- `shot_annotation.pkl` saves the indices of the first frame and the last frame of each shot.
- `scene_annotation.pkl` saves the indices of the first shot and the last shot of each scene.
- `label_dict.pkl` saves a list for each video, where each element indicates whether a shot is the first shot of a scene or not.
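To sketch how these labels relate (assuming per-video scene annotations are `(first_shot, last_shot)` index pairs, as described above; the function name is ours), the per-shot boundary list stored in `label_dict.pkl` can be derived from the scene annotation:

```python
def scenes_to_labels(scenes, num_shots):
    """Mark each shot with 1 if it is the first shot of a scene, else 0.

    `scenes` is a list of (first_shot, last_shot) index pairs for one video.
    """
    labels = [0] * num_shots
    for first_shot, _last_shot in scenes:
        labels[first_shot] = 1
    return labels
```

For example, a 6-shot video with scenes spanning shots (0, 2) and (3, 5) yields `[1, 0, 0, 1, 0, 0]`.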
Code for generating the above files can be found in preprocess.ipynb.
Commands for training and testing can be found in runner.sh. Some ablations can be found in runner2.sh and runner3.sh. Here we show the basic commands.
Pre-training:
```shell
python -m src.pretrain config/selfsup_best.yaml
```
Fine-tuning:
```shell
python -m src.finetune config/selfsup_best.yaml
```
Test:
```shell
python -m src.evaluate config/selfsup_best.yaml
```
- Code for plotting data points can be found in show_log.ipynb.
- Code for drawing heatmaps can be found in visualize.ipynb.
The configuration files `config/xxx.yaml` contain all the hyperparameters.
- `base` contains the basic configuration. Among its fields, `base.params.clip_len` indicates the number of shots in each clip, `base.path` includes the file paths of the formatted datasets and labels, and `model` is the basename of the model code file.
- `pretrain`, `finetune`, and `evaluate` correspond to the two training stages and the testing stage:
  - `pretrain.params.label_percentage` specifies the percentage of data to use during pre-training.
  - `finetune.aim_index` specifies the index of the film in OVSD/BBC held out for leave-one-out evaluation.
  - `finetune.load_path` specifies the path of the pre-trained model.
  - `finetune.train` is the basename of the Dataset code file.
  - `finetune.vid_list` indicates which subset of MovieNet to use.
  - `evaluate.head` specifies the prediction head to use.
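To make the hierarchy concrete, here is a hypothetical sketch of the config layout as a Python dict: the key names follow the description above, but every value is illustrative only, not the repo's defaults.

```python
# Hypothetical config sketch: key names from the README, values illustrative.
config = {
    "base": {
        "params": {"clip_len": 8},                  # shots per clip (illustrative)
        "path": {                                   # hypothetical path keys
            "shot_feat": "ImageNet_shot.pkl",
            "label": "label_dict.pkl",
        },
        "model": "montage_net",                     # hypothetical model basename
    },
    "pretrain": {"params": {"label_percentage": 100}},
    "finetune": {
        "aim_index": 0,                   # film held out for leave-one-out eval
        "load_path": "ckpt/pretrain.pt",  # hypothetical checkpoint path
        "train": "movienet_dataset",      # hypothetical Dataset basename
        "vid_list": "subset_a",           # hypothetical MovieNet subset name
    },
    "evaluate": {"head": "mlp"},          # hypothetical prediction-head name
}
```

In the repo these values live in YAML and can be accessed with dotted keys (e.g. `base.params.clip_len`) via omegaconf, which is among the listed dependencies.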
```bibtex
@article{tan2024temporal,
  title={Temporal Scene Montage for Self-Supervised Video Scene Boundary Detection},
  author={Tan, Jiawei and Yang, Pingan and Chen, Lu and Wang, Hongxing},
  journal={ACM Transactions on Multimedia Computing, Communications and Applications},
  year={2024},
  publisher={ACM New York, NY}
}
```