# On Pruning State-Space LLMs [[arXiv]](https://arxiv.org/abs/2502.18886)
Tamer Ghattas, Michael Hassid and Roy Schwartz
Hebrew University of Jerusalem
This repo includes adaptations of the WANDA and FLAP pruning methods to Mamba2 models, along with the headdim and dstate pruning methods described in the paper. The code is based on the original repos; the pruning-method implementations live in the Mamba layer of discrete_mamba2.py, mixer_seq_simple.py, and hybrid_mamba_layer.py, alongside modified versions of WANDA and FLAP.
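For reference, WANDA scores each weight by the product of its magnitude and the norm of the corresponding input-channel activations, then prunes the lowest-scoring weights within each output row. Below is a minimal sketch of that criterion applied to a single linear projection; the function and argument names (`wanda_prune_linear`, `calib_inputs`, `sparsity`) are illustrative and not part of this repo's API.

```python
import torch

@torch.no_grad()
def wanda_prune_linear(proj: torch.nn.Linear, calib_inputs: torch.Tensor, sparsity: float = 0.5):
    """Zero the lowest-scoring weights of `proj`, comparing within each output row.

    calib_inputs: (num_tokens, in_features) activations gathered on a small
    calibration set for this layer's input.
    """
    # Per-input-channel L2 norm of the calibration activations.
    act_norm = calib_inputs.float().norm(p=2, dim=0)            # (in_features,)
    # WANDA importance: |W_ij| * ||X_j||_2
    scores = proj.weight.abs() * act_norm.unsqueeze(0)          # (out_features, in_features)
    # Drop the `sparsity` fraction of lowest-scoring weights in every row.
    k = int(proj.in_features * sparsity)
    _, idx = torch.sort(scores, dim=1)                          # ascending scores per row
    mask = torch.ones_like(proj.weight)
    mask.scatter_(1, idx[:, :k], 0.0)
    proj.weight.mul_(mask)
    return mask
```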
To set up the environment:

```bash
conda create -n ssm-pruner python=3.10
conda activate ssm-pruner
pip install torch==2.4.0+cu121 --extra-index-url https://download.pytorch.org/whl/cu121 --no-cache-dir
pip install datasets==3.0.0
pip install transformers==4.48.1
pip install triton mamba-ssm==2.2.2 flash-attn==2.6.3  # mamba-ssm is the core Mamba package
```
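A quick way to check the environment (assuming a CUDA GPU, which the Mamba2 kernels require) is to instantiate a standalone Mamba2 layer and run a forward pass. This snippet is only a smoke test and not part of the repo:

```python
import torch
from mamba_ssm import Mamba2

layer = Mamba2(d_model=256, d_state=64, headdim=64).cuda()
x = torch.randn(2, 128, 256, device="cuda")   # (batch, seqlen, d_model)
y = layer(x)
print(y.shape)                                # torch.Size([2, 128, 256])
```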
To run WANDA or FLAP pruning on Mamba models, use the corresponding script:

```bash
wanda/scripts/mamba.sh
FLAP/scripts/mamba.sh
```

To prune the attention (MHA) components, run:

```bash
python prune_mha.py
```

Our distilled model was produced from SmolLM2-1.7B using our implementation of MOHAWK in train.py (see the sketch of the alignment objective below).
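For context, MOHAWK distills a Transformer into a Mamba model in stages; in its hidden-state-alignment stage, each student mixer block is trained to match the output of the corresponding teacher block given the same input. The sketch below illustrates that objective only; the block and variable names are placeholders, and train.py may organize the stages differently.

```python
import torch
import torch.nn.functional as F

def hidden_state_alignment_loss(teacher_blocks, student_blocks, hidden_states):
    """MSE between teacher and student block outputs, layer by layer.

    Both blocks receive the teacher's running hidden states, so errors do not
    compound through the student stack.
    """
    loss = hidden_states.new_zeros(())
    for t_block, s_block in zip(teacher_blocks, student_blocks):
        with torch.no_grad():
            target = t_block(hidden_states)   # teacher block output
        pred = s_block(hidden_states)         # student block on the same input
        loss = loss + F.mse_loss(pred, target)
        hidden_states = target                # feed the teacher's stream forward
    return loss
```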
For fine-tuning our pruned models with a distillation loss, we used finetune.py.
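As a rough illustration, a logit-level distillation loss of the kind finetune.py could combine with the task loss looks like the sketch below; the temperature, weighting, and any additional hidden-state terms used in the repo may differ.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL between temperature-softened teacher and student distributions.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard next-token cross-entropy on the ground-truth labels.
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )
    return alpha * kd + (1.0 - alpha) * ce
```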
```bibtex
@misc{ghattas2025pruningstatespacellms,
  title={On Pruning State-Space LLMs},
  author={Tamer Ghattas and Michael Hassid and Roy Schwartz},
  year={2025},
  eprint={2502.18886},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.18886},
}
```