- Authors: Jaewoo Lee, Keyang Xuan, Chanakya Ekbote, Sandeep Polisetty, Yi R. (May) Fung, Paul Pu Liang
- Paper
Multimodal Large Language Models (MLLMs) have grown in size to address the complexities of multimodal tasks. While beneficial for performance, their colossal size demands substantial computational and memory resources, limiting their practicality in resource-constrained scenarios. Post-training model pruning effectively reduces model size by removing a large number of parameters without compromising performance. However, most existing compression techniques assume unimodal models, limiting their effectiveness in multimodal settings.
We propose TAMP (Token-Adaptive Multimodal Pruning), an effective MLLM pruning pipeline that leverages multimodal token attributes to measure layer importance for layer-wise sparsity (DAS) and computes adaptive input activations to capture the multimodal processing demands of each layer (AMIA).
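To make the idea concrete, below is a minimal, hypothetical sketch of token-adaptive layer-wise pruning. It is not the TAMP implementation: the cosine-distance diversity measure, the diversity-to-sparsity mapping, and the |W|·‖X‖ pruning score (the Wanda criterion, used here as a stand-in) are all illustrative assumptions, and every function name is made up.

```python
# Illustrative sketch only -- not the TAMP codebase.
import torch

def token_diversity(hidden_states: torch.Tensor) -> float:
    """Average pairwise cosine distance among a layer's output tokens.
    hidden_states: (num_tokens, hidden_dim)."""
    x = torch.nn.functional.normalize(hidden_states, dim=-1)
    sim = x @ x.T                                  # (n, n) cosine similarities
    n = x.shape[0]
    mean_off_diag = (sim.sum() - sim.diagonal().sum()) / (n * (n - 1))
    return (1.0 - mean_off_diag).item()            # higher = more diverse tokens

def layer_sparsities(diversities, target=0.5, strength=0.1):
    """Map per-layer token diversity to per-layer sparsity: layers with more
    diverse multimodal tokens are pruned less, while the average sparsity
    stays near the global target."""
    d = torch.tensor(diversities)
    d = (d - d.mean()) / (d.std() + 1e-8)          # normalize across layers
    return (target - strength * d).clamp(0.05, 0.95)

def prune_linear_(weight: torch.Tensor, act_norm: torch.Tensor, sparsity: float):
    """In-place unstructured pruning with a Wanda-style |W| * ||X|| score,
    zeroing the lowest-scoring fraction of weights in each output row."""
    score = weight.abs() * act_norm.unsqueeze(0)   # (out_dim, in_dim)
    k = int(weight.shape[1] * sparsity)
    if k > 0:
        drop = torch.topk(score, k, dim=1, largest=False).indices
        weight.scatter_(1, drop, 0.0)
```

The intuition mirrors the paper's: layers whose multimodal tokens are more diverse are assumed to be doing more multimodal processing and so receive a lower sparsity ratio, while the per-weight score folds input activations into the importance estimate.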
## Installation

Please follow the installation instructions from LLaVA-NeXT and VideoLLaMA2.
## Data Preparation

Please download the original LLaVA-NeXT visual instruction tuning dataset from LLaVA-NeXT-Data and place it in `TAMP/playground/LLaVA-NeXT-Data`. Then, split the dataset by task names:

```bash
python llava/pruners/split_finetune_llava_next.py
```

Similarly, download VideoLLaMA2's audio-visual instruction tuning dataset (AVInstruct) from AVInstruct and place the downloaded files in `TAMP/datasets`. Then, preprocess the AVInstruct annotation files by transforming them into LLaVA-style files:
```bash
python datasets/transform_to_avinstruct.py --video_dir TAMP/datasets/path_to_video --dataset_path1 TAMP/datasets/avqa_data1.json --dataset_path2 TAMP/datasets/avqa_data2.json --save_path TAMP/datasets/avinstruct_avqa_music.json
```
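For reference, a LLaVA-style record generally follows the common conversation format sketched below. This example is hypothetical: the exact keys and values emitted by `transform_to_avinstruct.py` may differ.

```python
# Hypothetical LLaVA-style record; the exact fields produced by
# transform_to_avinstruct.py may differ from this common format.
record = {
    "id": "avqa_music_0001",              # made-up sample id
    "video": "path_to_video/sample.mp4",  # media path relative to --video_dir
    "conversations": [
        {"from": "human", "value": "<video>\nWhat instrument is being played?"},
        {"from": "gpt", "value": "A violin."},
    ],
}
```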
## Models

In this paper, we focus on two models: llama3-llava-next-8b for vision-language model compression experiments and VideoLLaMA2.1-7B-AV for audio-visual-language model compression experiments. Please download the models into the /checkpoints directory.
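One way to fetch the checkpoints is with `huggingface_hub`, as in the sketch below. The repo ids are assumptions inferred from the public model names; verify them on the Hugging Face Hub before running.

```python
# Sketch: download both checkpoints into ./checkpoints.
# Repo ids are assumed from the public model names; verify on the Hub.
from huggingface_hub import snapshot_download

for repo_id in ("lmms-lab/llama3-llava-next-8b", "DAMO-NLP-SG/VideoLLaMA2.1-7B-AV"):
    snapshot_download(repo_id=repo_id, local_dir=f"checkpoints/{repo_id.split('/')[-1]}")
```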
## Pruning and Evaluation

To prune llama3-llava-next-8b:

```bash
bash scripts/prune/tamp.sh
```

Evaluate the pruned model with the official LLaVA-NeXT evaluation pipeline.
To prune VideoLLaMA2.1-7B-AV:

```bash
bash scripts_videollama2/prune/tamp.sh
```

Evaluate the pruned model with the official VideoLLaMA2 evaluation pipeline.
## Citation

```bibtex
@inproceedings{lee2024tamp,
  title={TAMP: Token-Adaptive Layerwise Pruning in Multimodal Large Language Models},
  author={Jaewoo Lee and Keyang Xuan and Chanakya Ekbote and Sandeep Polisetty and Yi R. (May) Fung and Paul Pu Liang},
  year={2025},
  booktitle={Findings of the Association for Computational Linguistics (ACL)},
}
```