🌟 [CVPR26] ReMoRa: Multimodal Large Language Model based on Refined Motion Representation for Long-Video Understanding

Accepted at CVPR 2026
🌐 project page
📄 arXiv

Authors

Daichi Yashima¹,³ Shuhei Kurita²,³ Yusuke Oda³ Komei Sugiura¹

¹Keio University ²NII ³NII LLMC

Installation

uv sync --extra train

# Or with pip
pip install -e ".[train]"

To use only the motion-vector extraction and visualization utilities under mviz/, install the mviz extra (the train extra already includes it):

pip install -e ".[mviz]"

Checkpoint

The pretrained ReMoRa checkpoint is available on Hugging Face:

🤗 naisekizero/ReMoRa
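One possible way to fetch the checkpoint into a local directory is the `snapshot_download` function from `huggingface_hub`. This is a minimal sketch, not part of this repository: the helper name `download_checkpoint` is hypothetical, the target directory `checkpoints/ReMoRa-7B` is an assumption chosen to match the `--checkpoint` path in the inference example, and it assumes `huggingface_hub` is available in your environment.

```python
from pathlib import Path

REPO_ID = "naisekizero/ReMoRa"             # Hub repository named in this README
LOCAL_DIR = Path("checkpoints/ReMoRa-7B")  # assumed target, matching --checkpoint below


def download_checkpoint(repo_id=REPO_ID, local_dir=LOCAL_DIR):
    """Download all files of the Hub repository into local_dir and return its path."""
    # Imported lazily so this module loads even if huggingface_hub is missing.
    from huggingface_hub import snapshot_download

    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    return snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
```

Calling `download_checkpoint()` once should leave the files under `checkpoints/ReMoRa-7B`. If the Hub repository nests the weights in a subfolder, point `--checkpoint` at that subfolder instead.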

Inference

python infer_with_mv.py \
    --checkpoint checkpoints/ReMoRa-7B \
    --base lmms-lab/LLaVA-Video-7B-Qwen2 \
    --video /path/to/video.mp4 \
    --prompt "Describe what happens in this video."
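To run the same prompt over several videos, the command above can be wrapped in a small driver. The following is a sketch under stated assumptions: it uses only the four flags shown above, the helper names `build_infer_command` and `run_on_directory` are hypothetical, and it assumes the script is launched from the repository root, where `infer_with_mv.py` lives.

```python
import subprocess
import sys
from pathlib import Path


def build_infer_command(video, prompt,
                        checkpoint="checkpoints/ReMoRa-7B",
                        base="lmms-lab/LLaVA-Video-7B-Qwen2"):
    """Return the argument list for one infer_with_mv.py call."""
    return [
        sys.executable, "infer_with_mv.py",
        "--checkpoint", str(checkpoint),
        "--base", str(base),
        "--video", str(video),
        "--prompt", prompt,
    ]


def run_on_directory(video_dir, prompt, dry_run=True):
    """Build one command per .mp4 file (sorted); run them unless dry_run is set."""
    commands = [build_infer_command(path, prompt)
                for path in sorted(Path(video_dir).glob("*.mp4"))]
    if not dry_run:
        for cmd in commands:
            # check=True stops the loop at the first failing video.
            subprocess.run(cmd, check=True)
    return commands
```

With `dry_run=True` (the default here) the function only returns the commands, so they can be inspected before any model is loaded. Each call starts a fresh process and reloads the model, which keeps the sketch simple but is slow for large batches.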

Extracting motion vectors for training / batch eval

python scripts/extract_motion_vectors.py \
    --video-root /path/to/your/videos \
    --output-dir DATAS/motion_vectors \
    --fps 16 --block-size 16

# Training
bash scripts/train_remora.sh  # add --motion_vector_dir DATAS/motion_vectors
# or set the directory through the environment before launching the script:
export REMORA_MV_DIR=DATAS/motion_vectors

# Batch evaluation
python llava/eval/infer.py \
    --motion_vector_dir DATAS/motion_vectors \
    ...
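Because training and batch evaluation both depend on the extracted motion vectors, it can help to check that the directory is configured and exists before starting a long job. The helper below is a hypothetical pre-flight check, not code from this repository. Its precedence (an explicit value first, then `REMORA_MV_DIR`) is an assumption for this sketch and may differ from how the training script resolves the two.

```python
import os
from pathlib import Path


def resolve_mv_dir(cli_value=None, env=None):
    """Return the motion-vector directory as a Path, or raise if unset or missing."""
    env = os.environ if env is None else env
    # Assumed precedence for this helper only: explicit value, then REMORA_MV_DIR.
    value = cli_value or env.get("REMORA_MV_DIR")
    if not value:
        raise ValueError("set --motion_vector_dir or REMORA_MV_DIR")
    path = Path(value)
    if not path.is_dir():
        raise FileNotFoundError(f"motion-vector directory not found: {path}")
    return path
```

For example, `resolve_mv_dir("DATAS/motion_vectors")` returns the path once extraction has created the directory, and raises `FileNotFoundError` otherwise.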

Acknowledgements

This codebase builds on:

License

This work is licensed under the BSD-3-Clause-Clear license. See LICENSE.
