# MomentMix Augmentation with Length-Aware DETR for Temporally Robust Moment Retrieval [[Paper](https://arxiv.org/abs/2412.20816)]
by Seojeong Park<sup>1</sup>, Jiho Choi<sup>1</sup>, Kyungjune Baek<sup>2</sup>, Hyunjung Shim<sup>1</sup>

<sup>1</sup> Korea Advanced Institute of Science and Technology (KAIST), <sup>2</sup> Sejong University
This repository provides the official implementation of MomentMix and the Length-Aware Decoder (LAD), which improve short-moment retrieval in video moment retrieval tasks.
- **MomentMix** – a two-stage temporal data augmentation (see the sketch after this list):
  - **ForegroundMix** – splits long moments into shorter segments and shuffles them to enhance recognition of query-relevant frames.
  - **BackgroundMix** – preserves the foreground while replacing background regions with segments from other videos, improving discrimination between relevant and irrelevant frames.
- **Length-Aware Decoder (LAD)** – uses length-wise bipartite matching to pair predictions and ground truths within the same length category (short / middle / long), enabling length-specific decoder queries.
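The two augmentation stages can be illustrated with a minimal sketch operating on clip-level feature arrays. This is not the repository's implementation (see `momentmix/README.md` for the actual pipeline); the feature layout, segment count, and helper names below are assumptions for illustration only.

```python
import random
import numpy as np

def foreground_mix(feats, start, end, num_segments=3, seed=0):
    """ForegroundMix (sketch): split a long moment's foreground into
    shorter segments and shuffle their order within the moment span."""
    segments = np.array_split(feats[start:end], num_segments)
    random.Random(seed).shuffle(segments)
    out = feats.copy()
    out[start:end] = np.concatenate(segments)
    return out

def background_mix(feats, start, end, donor_feats):
    """BackgroundMix (sketch): keep the foreground moment intact and
    replace background clips with clips drawn from another video."""
    out = feats.copy()
    for t in range(len(out)):
        if not (start <= t < end):              # background position
            out[t] = donor_feats[t % len(donor_feats)]
    return out

# Toy usage: a 10-clip video whose moment spans clips [2, 8).
video = np.arange(10, dtype=np.float32)[:, None]    # [T, D] with D=1
donor = -np.ones((6, 1), dtype=np.float32)          # clips from another video
augmented = background_mix(foreground_mix(video, 2, 8), 2, 8, donor)
```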
- Installation & Data Setup
- MomentMix: Data Augmentation
- Training with Length-Aware Decoder
- Inference
- Pre-trained Checkpoints
## Installation & Data Setup

Before training or evaluation, please make sure both the datasets and the runtime environment are ready.

QVHighlights and the other benchmark datasets can be obtained by following the guidelines from QD-DETR.
```bash
git clone https://github.com/sjpark5800/LA-DETR
cd LA-DETR

conda create -n ladetr python=3.10
conda activate ladetr
pip install -r requirements.txt
```

## MomentMix: Data Augmentation

The dataset required for MomentMix is available in the `data/` directory.
You can either:
- Use the pre-generated dataset (recommended for reproducing our results), or
- Generate your own by following the detailed instructions in `momentmix/README.md`.
For experimental consistency, we strongly recommend using the provided pre-generated dataset.
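If you want to verify what the annotation files contain, a quick inspection script can help. A minimal sketch, assuming QVHighlights-style JSONL annotations with a `relevant_windows` field; the file path and length thresholds below are placeholders, not the paper's exact category boundaries:

```python
import json
from collections import Counter

def length_bucket(length_sec):
    # Hypothetical short / middle / long boundaries; see the paper
    # for the actual category definitions.
    if length_sec < 10:
        return "short"
    return "middle" if length_sec < 30 else "long"

counts = Counter()
with open("data/highlight_train_release.jsonl") as f:   # placeholder path
    for line in f:
        ann = json.loads(line)
        for st, ed in ann.get("relevant_windows", []):
            counts[length_bucket(ed - st)] += 1

print(counts)   # e.g. Counter({'short': ..., 'middle': ..., 'long': ...})
```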
## Training with Length-Aware Decoder

The following scripts train our models with the proposed MomentMix and Length-Aware Decoder.
```bash
# For LA-QD-DETR
bash la_qd_detr/scripts/train.sh

# For LA-TR-DETR
bash la_tr_detr/scripts/train.sh

# For LA-UVCOM
bash la_uvcom/scripts/qv/train.sh        # QVHighlights
bash la_uvcom/scripts/tacos/train.sh     # TACoS
bash la_uvcom/scripts/cha/train.sh       # Charades (SlowFast + CLIP)
bash la_uvcom/scripts/cha_vgg/train.sh   # Charades (VGG + GloVe)
```
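Inside the decoder, the length-wise bipartite matching restricts the Hungarian assignment to prediction/ground-truth pairs in the same length category. A minimal sketch, assuming 1-D spans and an L1 span cost; the bucket thresholds and cost function are illustrative, not the repository's exact configuration:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def bucket(span):
    length = span[1] - span[0]
    if length < 10:                 # hypothetical category boundaries
        return "short"
    return "middle" if length < 30 else "long"

def lengthwise_match(pred_spans, gt_spans):
    """Match predictions to ground truths only within the same
    length bucket, running Hungarian matching per bucket."""
    matches = []
    for cat in ("short", "middle", "long"):
        p_idx = [i for i, p in enumerate(pred_spans) if bucket(p) == cat]
        g_idx = [j for j, g in enumerate(gt_spans) if bucket(g) == cat]
        if not p_idx or not g_idx:
            continue
        # L1 distance between span endpoints as the matching cost.
        cost = np.array([[abs(pred_spans[i][0] - gt_spans[j][0]) +
                          abs(pred_spans[i][1] - gt_spans[j][1])
                          for j in g_idx] for i in p_idx])
        rows, cols = linear_sum_assignment(cost)
        matches += [(p_idx[r], g_idx[c]) for r, c in zip(rows, cols)]
    return matches

# Toy example: short predictions can only match the short ground truth.
print(lengthwise_match([(0, 5), (12, 40), (3, 8)], [(2, 7), (10, 45)]))
```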
## Inference

```bash
# For LA-QD-DETR
bash la_qd_detr/scripts/inference.sh {exp_dir}/model_best.ckpt 'val'
bash la_qd_detr/scripts/inference.sh {exp_dir}/model_best.ckpt 'test'

# For LA-TR-DETR
bash la_tr_detr/scripts/inference.sh {exp_dir}/model_best.ckpt 'val'
bash la_tr_detr/scripts/inference.sh {exp_dir}/model_best.ckpt 'test'

# For LA-UVCOM
bash la_uvcom/scripts/inference.sh {exp_dir}/model_best.ckpt 'val'
bash la_uvcom/scripts/inference.sh {exp_dir}/model_best.ckpt 'test'
```
`{exp_dir}` refers to the directory containing the trained model checkpoint and logs.

Note: to obtain results on the test split, please follow the Moment-DETR evaluation instructions.
## Pre-trained Checkpoints

We release pre-trained checkpoints and training logs for all reported experiments to ensure reproducibility. All model configurations are fully documented in the corresponding `opt.json` file.
📁 Download all checkpoints & logs here
| Dataset | Method | Model file |
|---|---|---|
| QVHighlights | QD-DETR + Ours | 🔗 checkpoint & log |
| QVHighlights | TR-DETR + Ours | 🔗 checkpoint & log |
| QVHighlights | UVCOM + Ours | 🔗 checkpoint & log |
| TACoS | UVCOM + Ours | 🔗 checkpoint & log |
| Charades | UVCOM + Ours | 🔗 checkpoint & log |
| Charades (VGG) | UVCOM + Ours | 🔗 checkpoint & log |
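To sanity-check a downloaded checkpoint before running inference, something like the following works. A minimal sketch, assuming standard PyTorch checkpoint files; the `model` key follows the Moment-DETR family's saving convention but should be verified against the actual files:

```python
import torch

ckpt = torch.load("model_best.ckpt", map_location="cpu")   # placeholder path
if isinstance(ckpt, dict):
    print("top-level keys:", list(ckpt.keys()))

# If the checkpoint is a dict holding a "model" state_dict (as in
# Moment-DETR-style repos), peek at a few parameter shapes:
state = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
for name, tensor in list(state.items())[:5]:
    print(name, tuple(tensor.shape))
```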
## Citation

If you find this work useful, please cite:

```bibtex
@article{park2024length,
  title={Length-Aware DETR for Robust Moment Retrieval},
  author={Park, Seojeong and Choi, Jiho and Baek, Kyungjune and Shim, Hyunjung},
  journal={arXiv preprint arXiv:2412.20816},
  year={2024}
}
```
## License & Acknowledgements

All code in this repository is released under the MIT License. Parts of the annotation files and several implementation components are adapted from Moment-DETR, QD-DETR, TR-DETR, and UVCOM.