TensorLLM

This repository contains the implementation of the IJCNN 2025 paper "TensorLLM: Tensorising Multi-Head Attention for Enhanced Reasoning and Compression in LLMs" (https://arxiv.org/abs/2501.15674).

Overview

The reasoning abilities of Large Language Models (LLMs) can be improved by structurally denoising their weights, yet existing techniques primarily focus on denoising the feed-forward network (FFN) of the transformer block and cannot efficiently exploit the Multi-head Attention (MHA) block, which is the core of transformer architectures. To address this issue, we propose a novel, intuitive framework that, at its core, performs MHA compression through a multi-head tensorisation process and the Tucker decomposition. This enables both higher-dimensional structured denoising and compression of the MHA weights, by enforcing a shared higher-dimensional subspace across the weights of the multiple attention heads. We demonstrate that this approach consistently enhances the reasoning capabilities of LLMs across multiple benchmark datasets, for both encoder-only and decoder-only architectures, while achieving compression rates of up to ∼250× in the MHA weights, all without requiring any additional data, training, or fine-tuning. Furthermore, we show that the proposed method can be seamlessly combined with existing FFN-only denoising techniques to achieve further improvements in LLM reasoning performance. (See the framework overview figure in the repository.)
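For intuition, below is a minimal sketch of the core idea using tensorly's tucker routine: the per-head Q/K/V/O weight slices are stacked into a single 4D tensor and jointly Tucker-decomposed, with the head mode kept at full rank so that the remaining factor matrices are shared across all heads. The toy dimensions, the stacking order, and the mode-to-rank mapping are illustrative assumptions, not the repository's actual implementation.

# A minimal sketch of multi-head tensorisation + Tucker denoising (assumed
# shapes and stacking order; not the repository's actual code).
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

# Toy dimensions so the demo runs in seconds; GPT-J would be
# d_model=4096, d_head=256, n_heads=16.
d_model, d_head, n_heads = 64, 16, 4
qkvo_rank, head_dim_rank, stack_rank = 8, 4, 2

# Stack each head's Q/K/V/O slices into one 4D tensor with modes
# (d_model, d_head, {Q,K,V,O}, head); random stand-ins for real weights.
rng = np.random.default_rng(0)
W = tl.tensor(rng.standard_normal((d_model, d_head, 4, n_heads)))

# Tucker with the head mode left at full rank: the first three factor
# matrices are shared by every head, and each slice of the core along
# the last mode acts as that head's core tensor.
core, factors = tucker(W, rank=[qkvo_rank, head_dim_rank, stack_rank, n_heads])

# The low-rank reconstruction is the structurally denoised MHA weight tensor.
W_denoised = tl.tucker_to_tensor((core, factors))
print(W_denoised.shape)  # (64, 16, 4, 4)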

Setting Up the Environment

To avoid version conflicts, it is recommended to use a separate Conda environment. Follow these steps:

  1. Modify the create_env.sh script:
    • Update the Conda path on line 19:
      eval "$(~/miniforge3/bin/conda shell.bash hook)"
      Replace ~/miniforge3/bin/conda with the correct path to your Conda installation.
  2. Run the installation script:
    chmod +x create_env.sh
    ./create_env.sh
    The setup takes approximately 3 minutes.
  3. Initialize Conda and activate the environment:
    # Replace '~/miniforge3/bin/conda' with the path to your Conda installation
    eval "$(~/miniforge3/bin/conda shell.bash hook)"
    conda activate TensorLLM

Experiment Modes

You can run experiments in the following modes:

  • 4D_Tucker (our method only): Tucker decomposition with shared factor matrices (applied to MHA block).
  • 4D_Tucker_laser: (our method for MHA) + (LASER for FFN).
  • laser: Original LASER intervention.
  • 3D_Tucker: Separately compresses $\mathbf{W}_Q$, $\mathbf{W}_K$, $\mathbf{W}_V$, and $\mathbf{W}_O$ (for ablation studies); see the sketch after this list.
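As a point of contrast with the shared-subspace 4D mode, here is a minimal sketch of what the 3D_Tucker ablation does, assuming each projection is tensorised into (d_model, d_head, n_heads) and decomposed independently with tensorly, so no factors are shared across Q/K/V/O. Shapes and ranks are toy assumptions.

# Hedged sketch of the 3D_Tucker ablation: each projection tensor is
# Tucker-compressed on its own (assumed tensorisation; toy dimensions).
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

d_model, d_head, n_heads = 64, 16, 4
rng = np.random.default_rng(0)

for name in ("W_Q", "W_K", "W_V", "W_O"):
    W = tl.tensor(rng.standard_normal((d_model, d_head, n_heads)))
    core, factors = tucker(W, rank=[8, 4, n_heads])  # independent factors per matrix
    print(name, tl.tucker_to_tensor((core, factors)).shape)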

Running Experiments

Below are example commands to reproduce the results from the paper. The following commands use the GPT-J model as an example.

4D_Tucker Mode

python3 src/TensorLLM_intervention_gptj_bbh_qa.py --mode 4D_Tucker --lnum 27 --qkvo_rank 304 --head_dim_rank 19 --stack_rank 2 --single_experiment --device cuda

python3 src/TensorLLM_intervention_gptj_bios_profession.py --mode 4D_Tucker --lnum 18 --qkvo_rank 208 --head_dim_rank 13 --stack_rank 1 --single_experiment --device cuda

python3 src/TensorLLM_intervention_gptj_fever.py --mode 4D_Tucker --lnum 11 --qkvo_rank 800 --head_dim_rank 50 --stack_rank 2 --single_experiment --device cuda

python3 src/TensorLLM_intervention_gptj_hotpot.py --mode 4D_Tucker --lnum 27 --qkvo_rank 64 --head_dim_rank 4 --stack_rank 2 --single_experiment --device cuda
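If you want to sweep a hyperparameter such as the layer number, a small driver script can wrap the commands above. The sketch below reuses only the script path and flags shown in this README; the chosen layer values are arbitrary examples.

# Hypothetical sweep helper; flags and script path taken from the commands
# above, layer values chosen arbitrarily for illustration.
import subprocess

for lnum in [11, 18, 27]:
    subprocess.run([
        "python3", "src/TensorLLM_intervention_gptj_fever.py",
        "--mode", "4D_Tucker",
        "--lnum", str(lnum),
        "--qkvo_rank", "800", "--head_dim_rank", "50", "--stack_rank", "2",
        "--single_experiment", "--device", "cuda",
    ], check=True)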

Parameters

  • lnum: the index of the transformer layer to which the intervention is applied.
  • Ranks (qkvo_rank, head_dim_rank, stack_rank): the Tucker ranks that control the decomposition granularity along each mode of the tensorised MHA weights; see the worked example below.
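To see how these ranks translate into the compression figures quoted above, the following back-of-the-envelope calculation uses the HotpotQA ranks from the 4D_Tucker commands. The parameter-counting convention (shared factor matrices stored once, one small core per head, head mode uncompressed) is our assumption about the method rather than something read off the code.

# Hedged sanity check of the ~250x MHA compression figure, using the
# HotpotQA ranks above and GPT-J MHA dimensions (assumed counting convention).
d_model, d_head, n_heads = 4096, 256, 16          # GPT-J MHA dimensions
qkvo_rank, head_dim_rank, stack_rank = 64, 4, 2   # --qkvo_rank 64 --head_dim_rank 4 --stack_rank 2

dense  = 4 * d_model * d_head * n_heads                       # W_Q, W_K, W_V, W_O for all heads
cores  = qkvo_rank * head_dim_rank * stack_rank * n_heads     # one small core per head
shared = d_model * qkvo_rank + d_head * head_dim_rank + 4 * stack_rank  # factors shared by all heads

print(f"compression ~ {dense / (cores + shared):.0f}x")       # ~247x, consistent with the paper's up to ~250x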

Acknowledgement

We gratefully acknowledge the use of code from the following projects: LASER

Citation

If you find our paper or code useful, we would greatly appreciate it if you could cite our paper:

@article{gu2025tensorllm,
  title={TensorLLM: Tensorising Multi-Head Attention for Enhanced Reasoning and Compression in LLMs},
  author={Gu, Yuxuan and Zhou, Wuyang and Iacovides, Giorgos and Mandic, Danilo},
  journal={arXiv preprint arXiv:2501.15674},
  year={2025}
}
