
keio-smilab26/ReMoRa


🌟 [CVPR26] ReMoRa: Multimodal Large Language Model based on Refined Motion Representation for Long-Video Understanding

Conference arXiv

Authors

Daichi Yashima (1, 3), Shuhei Kurita (2, 3), Yusuke Oda (3), Komei Sugiura (1)

1 Keio University, 2 NII, 3 NII LLMC

Installation

uv sync --extra train

# Or with pip
pip install -e ".[train]"

The motion-vector extraction and visualization utilities under mviz/ need the mviz extra (the train extra already includes it):

pip install -e ".[mviz]"

Checkpoint

The pretrained ReMoRa checkpoint is available on Hugging Face:

Inference

python infer_with_mv.py \
    --checkpoint checkpoints/ReMoRa-7B \
    --base lmms-lab/LLaVA-Video-7B-Qwen2 \
    --video /path/to/video.mp4 \
    --prompt "Describe what happens in this video."
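To run the single-video command above over a whole folder, a small wrapper can loop over the files and call infer_with_mv.py once per video. This is a minimal sketch: it reuses only the flags shown in the command above, while the folder layout, the .mp4 filter, and the helper names build_command and run_folder are hypothetical.

```python
"""Hypothetical helper: run infer_with_mv.py on every .mp4 in a folder.

Only the flags from the README's inference command are used; the .mp4
filter and the build_command/run_folder names are assumptions.
"""
import subprocess
import sys
from pathlib import Path


def build_command(video: Path, prompt: str,
                  checkpoint: str = "checkpoints/ReMoRa-7B",
                  base: str = "lmms-lab/LLaVA-Video-7B-Qwen2") -> list[str]:
    # Mirror the single-video invocation from the Inference section.
    return [
        sys.executable, "infer_with_mv.py",
        "--checkpoint", checkpoint,
        "--base", base,
        "--video", str(video),
        "--prompt", prompt,
    ]


def run_folder(folder: str, prompt: str) -> None:
    # Process videos one at a time; check=True stops at the first failure.
    for video in sorted(Path(folder).glob("*.mp4")):
        print(f"== {video.name}")
        subprocess.run(build_command(video, prompt), check=True)


if __name__ == "__main__":
    run_folder(sys.argv[1], "Describe what happens in this video.")
```

Launching one process per video reloads the model each time, which is simple but slow; for large-scale evaluation the batch script llava/eval/infer.py described below is the better fit.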

Extracting motion vectors for training and batch evaluation

python scripts/extract_motion_vectors.py \
    --video-root /path/to/your/videos \
    --output-dir DATAS/motion_vectors \
    --fps 16 --block-size 16

# Training: add --motion_vector_dir DATAS/motion_vectors to the training
# command, or export the directory before launching:
export REMORA_MV_DIR=DATAS/motion_vectors
bash scripts/train_remora.sh

# Batch evaluation
python llava/eval/infer.py \
    --motion_vector_dir DATAS/motion_vectors \
    ...
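For intuition about what --block-size 16 controls: a block motion vector records, for each 16×16 block of a frame, the displacement to the most similar block in the previous frame. The sketch below is a minimal exhaustive-search block matcher in NumPy, assuming grayscale frames and a hypothetical search radius of 4 pixels. It is not the implementation in scripts/extract_motion_vectors.py, which this README does not describe (that script may, for example, read motion vectors from the compressed stream instead).

```python
import numpy as np


def block_motion_vectors(prev: np.ndarray, curr: np.ndarray,
                         block: int = 16, radius: int = 4) -> np.ndarray:
    """Exhaustive-search block matching between two grayscale frames.

    Returns an array of shape (H // block, W // block, 2) holding (dy, dx)
    for each block of `curr`: the offset to the best-matching block in
    `prev` (minimum sum of absolute differences) within +/- `radius` pixels.
    `radius` is an illustrative assumption, not a ReMoRa parameter.
    """
    # Widen dtype so subtracting uint8 frames cannot wrap around.
    prev = prev.astype(np.int32)
    curr = curr.astype(np.int32)
    h, w = curr.shape
    rows, cols = h // block, w // block
    mvs = np.zeros((rows, cols, 2), dtype=np.int32)
    for r in range(rows):
        for c in range(cols):
            y, x = r * block, c * block
            target = curr[y:y + block, x:x + block]
            best_cost, best = None, (0, 0)
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    py, px = y + dy, x + dx
                    # Skip candidate blocks that fall outside the frame.
                    if py < 0 or px < 0 or py + block > h or px + block > w:
                        continue
                    cand = prev[py:py + block, px:px + block]
                    cost = np.abs(target - cand).sum()
                    if best_cost is None or cost < best_cost:
                        best_cost, best = cost, (dy, dx)
            mvs[r, c] = best
    return mvs
```

With this convention each vector points back into the previous frame, so content that moves 2 pixels to the right yields dx = -2. Exhaustive search costs (2·radius + 1)² comparisons per block, one reason practical pipelines often reuse encoder motion vectors or faster search patterns instead.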

Acknowledgements

This codebase builds on:

License

This work is licensed under the BSD-3-Clause-Clear license. See LICENSE.
