```bibtex
@article{islam2025bimba,
  title={BIMBA: Selective-Scan Compression for Long-Range Video Question Answering},
  author={Islam, Md Mohaiminul and Nagarajan, Tushar and Wang, Huiyu and Bertasius, Gedas and Torresani, Lorenzo},
  journal={arXiv preprint arXiv:2503.09590},
  year={2025}
}
```

🌐 Homepage | 📖 arXiv | 💻 GitHub | 🤗 Model | 🌟 Demo
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
Md Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang, Gedas Bertasius, and Lorenzo Torresani
Accepted to CVPR 2025
BIMBA is a multimodal large language model (MLLM) capable of efficiently processing long-range videos. Our model leverages the selective scan mechanism of Mamba to effectively select critical information from high-dimensional video and transform it into a reduced token sequence for efficient LLM processing. Extensive experiments demonstrate that BIMBA achieves state-of-the-art accuracy on multiple long-form VQA benchmarks, including PerceptionTest, NExT-QA, EgoSchema, VNBench, LongVideoBench, Video-MME, and MLVU.
Please use the following commands to install the required packages:
```bash
conda create --name bimba python=3.10
conda activate bimba
pip install --upgrade pip  # PEP 660
pip install -e .
```
To include the additional dependencies needed for training, install with the `[train]` extra instead:
```bash
pip install -e ".[train]"
```
This codebase is built on the LLaVA-NeXT and Mamba codebases.
We provide a demo notebook showing how to use the selective-scan (Mamba)-based token compression method for long-range videos introduced in our paper. Following this notebook, you can easily apply this compression technique to reduce the number of input video tokens of your own model.
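To give a concrete feel for the idea, below is a minimal, self-contained sketch of selective-scan token compression. It is not the exact module from the paper or the notebook: learnable query tokens are interleaved with the video tokens, a Mamba selective-scan layer runs over the combined sequence, and only the query positions are kept as the compressed sequence. The `Mamba` block comes from the `mamba_ssm` package; the interleaving scheme, hidden size, and number of queries are illustrative assumptions.
```python
# Minimal sketch of selective-scan (Mamba) token compression.
# Assumptions: the query-interleaving scheme and all hyperparameters below are
# illustrative; see the demo notebook for the actual module used in BIMBA.
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # pip package: mamba-ssm (requires a CUDA GPU)


class SelectiveScanCompressor(nn.Module):
    """Compress a long video-token sequence into `num_queries` tokens."""

    def __init__(self, dim=1024, num_queries=256, d_state=16):
        super().__init__()
        # Learnable query tokens that will carry the compressed summary.
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.mamba = Mamba(d_model=dim, d_state=d_state, d_conv=4, expand=2)

    def forward(self, video_tokens):
        # video_tokens: (B, N, D); for simplicity assume N % num_queries == 0.
        B, N, D = video_tokens.shape
        Q = self.queries.shape[0]
        queries = self.queries.unsqueeze(0).expand(B, Q, D)

        # Interleave one query token after every chunk of N // Q video tokens,
        # so the selective scan can route each chunk's information into it.
        chunks = video_tokens.view(B, Q, N // Q, D)
        x = torch.cat([chunks, queries.unsqueeze(2)], dim=2)  # (B, Q, N//Q + 1, D)
        x = x.reshape(B, -1, D)

        y = self.mamba(x)  # selective scan over the interleaved sequence

        # Keep only the query positions as the compressed token sequence.
        return y.view(B, Q, N // Q + 1, D)[:, :, -1]  # (B, Q, D)


compressor = SelectiveScanCompressor(dim=1024, num_queries=256).cuda()
video_tokens = torch.randn(2, 64 * 196, 1024, device="cuda")  # 64 frames x 196 patches
compressed = compressor(video_tokens)  # -> (2, 256, 1024)
```
The compressed sequence can then be passed to the LLM in place of the full set of video tokens.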
Download the model from HuggingFace 🤗:
```bash
cd checkpoints
git clone https://huggingface.co/mmiemon/BIMBA-LLaVA-Qwen2-7B
```
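Since this codebase builds on LLaVA-NeXT, the downloaded checkpoint can also be loaded directly in Python via the standard LLaVA-NeXT builder. This is a hedged sketch rather than this repo's documented API: `load_pretrained_model` is the usual LLaVA-NeXT entry point, and the `model_base`/`model_name` values are taken from the evaluation example further below.
```python
# Hedged sketch: load the BIMBA checkpoint (a LoRA on top of LLaVA-Video-7B-Qwen2,
# judging from the evaluation example below) with the standard LLaVA-NeXT loader.
from llava.model.builder import load_pretrained_model

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path="checkpoints/BIMBA-LLaVA-Qwen2-7B",  # cloned above
    model_base="lmms-lab/LLaVA-Video-7B-Qwen2",     # base model for the LoRA weights
    model_name="llava_qwen_lora",
)
model.eval()
```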
Use the following script to run inference on any video:
```bash
python inference.py
```
- Follow the LLaVA-NeXT codebase to prepare the training data (e.g., LLaVA-Video-178K). Update the exp.yaml file to point to your data.
- Follow the commands below to train the BIMBA model:
```bash
bash scripts/video/train/Train_BIMBA_LLaVA_Qwen2_7B.sh
```
Evaluate Single Dataset (e.g., VideoMME)
- First, download the videos from the corresponding HuggingFace dataset repo and replace "path_to_video_folder" accordingly.
- We provide the formatted JSON files for the evaluation datasets in the `BIMBA-LLaVA-NeXT/DATAS/eval` folder. You can format a new dataset using the following script:
```bash
python llava/eval/format_eval_data.py
```
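- To mirror the expected schema when formatting your own dataset, you can peek at one of the provided files first (a hedged snippet that simply assumes each formatted file is a JSON array of QA entries):
```python
# Inspect a provided evaluation file before formatting a new dataset.
# Assumption: the formatted file is a JSON array of QA entries.
import json

with open("DATAS/eval/VideoMME/formatted_dataset.json") as f:
    data = json.load(f)

print(len(data), "samples")
print(sorted(data[0].keys()))  # field names consumed by llava/eval/infer.py
```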
- Use the following script to evaluate a particular dataset:
```bash
model_path="checkpoints/BIMBA-LLaVA-Qwen2-7B"
model_base="lmms-lab/LLaVA-Video-7B-Qwen2"
model_name="llava_qwen_lora"
results_dir="results/BIMBA-LLaVA-Qwen2-7B"
dataset_name="VideoMME"

python llava/eval/infer.py \
    --model_path $model_path \
    --model_base $model_base \
    --model_name $model_name \
    --results_dir ${results_dir}/${dataset_name} \
    --max_frames_num 64 \
    --dataset_name $dataset_name \
    --video_root "path_to_video_folder" \
    --data_path DATAS/eval/VideoMME/formatted_dataset.json \
    --cals_acc
```
- Use the following script to evaluate the PerceptionTest, NExT-QA, EgoSchema, VNBench, LongVideoBench, Video-MME, and MLVU benchmarks:
```bash
bash scripts/video/eval/Eval_BIMBA_LLaVA_Qwen2_7B.sh
```
- For EgoSchema, use the following script to prepare the submission file:
```bash
python llava/eval/submit_ego_schema.py
```
Then, you can either submit directly to the Kaggle competition page or use the following command for submission and evaluation:
```bash
kaggle competitions submit -c egoschema-public -f results/BIMBA-LLaVA-Qwen2-7B/EgoSchema/es_submission.csv -m "BIMBA-LLaVA-Qwen2-7B"
```
- You should get the following results on these benchmarks:
| Benchmark | EgoSchema | VNBench | Video-MME | MLVU | LongVideoBench | NExT-QA | PerceptionTest |
|---|---|---|---|---|---|---|---|
| Accuracy (%) | 71.14 | 77.88 | 64.67 | 71.37 | 59.46 | 83.73 | 68.51 |