This fork retrieves the sparsified input tensors and matrices from TEAL by simulating its sparsification in PyTorch. For the actual implementation, check out the main TEAL repository.
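In rough terms, the simulation zeroes out low-magnitude activation entries before the projection matmuls. Below is a minimal PyTorch sketch of that idea; the function name and threshold value are illustrative only, not this fork's actual API:

```python
import torch

def sparsify_activations(x: torch.Tensor, threshold: float) -> torch.Tensor:
    """Zero out activation entries whose magnitude falls below `threshold`.

    This mimics TEAL-style training-free activation sparsity: surviving
    entries keep their original values, so the downstream matmul only
    "sees" the large-magnitude activations.
    """
    mask = x.abs() >= threshold
    return x * mask

# Example: sparsify the input to a linear projection before the matmul.
x = torch.randn(1, 16, 4096)   # (batch, seq, hidden)
w = torch.randn(4096, 4096)    # projection weight (illustrative shape)
x_sparse = sparsify_activations(x, threshold=0.67)
y = x_sparse @ w.T
```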
- Clone the repo and navigate to TEAL:

```bash
git clone https://github.com/dat-adi/TEAL
cd TEAL
```
- Set up environment:

```bash
conda create -yn teal python=3.11
conda activate teal
pip install -e .
```

- Create a Hugging Face account and export an access token so you can download models. Note that you will need to accept the terms and conditions on the Hugging Face Llama page:

```bash
export HF_TOKEN=...
```

- Navigate to gpt-fast and export the save path, which is where the models will be stored:

```bash
cd gpt-fast
export SAVE_PATH=/home/ec2-user/TEAL/gpt-fast/models/
```

- Download model weights and convert them to the gpt-fast format (`scripts/prepare.sh`):

```bash
python scripts/download.py --repo_id meta-llama/Llama-2-7b-hf --path $SAVE_PATH && python scripts/convert_hf_checkpoint.py --checkpoint_dir $SAVE_PATH/meta-llama/Llama-2-7b-hf
```

- Run sparse inference using the following command:

```bash
CUDA_VISIBLE_DEVICES=0 python generate.py \
--checkpoint_path $SAVE_PATH/meta-llama/Llama-2-7b-hf/model.pth \
--hist_path ../models/Llama-2-7B/histograms \
--sparsity 0.5 \
  --max_new_tokens 40
```

Provided that the proxy function `simulate_splitk` is active and in use, this dumps the sparsified input tensors to your system at different stages of each layer, across the 32 layers, for each of the 40 inference steps.
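The exact behavior of `simulate_splitk` is defined in the code; as a rough mental model only (every identifier, file name, and directory below is illustrative, not the fork's real interface), the proxy can be pictured as sparsifying each projection's input, saving it to disk, and then performing the matmul:

```python
import os
import torch

DUMP_DIR = "sparsified_tensors"  # hypothetical output directory

def simulate_splitk_proxy(x: torch.Tensor, w: torch.Tensor, threshold: float,
                          layer_idx: int, proj_name: str, step: int) -> torch.Tensor:
    """Sparsify the input activations, dump them to disk, and return the matmul result.

    One file per (generation step, layer, projection), which is how 32 layers'
    worth of tensors accumulate for every generated token.
    """
    x_sparse = x * (x.abs() >= threshold)
    os.makedirs(DUMP_DIR, exist_ok=True)
    torch.save(x_sparse,
               os.path.join(DUMP_DIR, f"step{step}_layer{layer_idx}_{proj_name}.pt"))
    return x_sparse @ w.T
```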
Changing `--sparsity` in the above command raises or lowers the sparsity level, while changing `--max_new_tokens` controls how many tokens are generated and therefore how many matrices are dumped to disk.
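For intuition on how a sparsity fraction becomes a magnitude cutoff: TEAL calibrates per-layer thresholds offline from the histograms passed via `--hist_path`, but a per-tensor quantile of the absolute activations captures the same idea. This is an illustrative sketch, not the calibration TEAL actually performs:

```python
import torch

def threshold_for_sparsity(x: torch.Tensor, sparsity: float) -> float:
    """Pick the magnitude cutoff below which a `sparsity` fraction of entries fall."""
    return torch.quantile(x.abs().float().flatten(), sparsity).item()

x = torch.randn(1, 16, 4096)
t = threshold_for_sparsity(x, sparsity=0.5)   # ~50% of entries will be zeroed
x_sparse = x * (x.abs() >= t)
print((x_sparse == 0).float().mean())         # ~0.5
```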
If you find TEAL useful, please consider citing:
```bibtex
@misc{liu2024trainingfreeactivationsparsitylarge,
      title={Training-Free Activation Sparsity in Large Language Models},
      author={James Liu and Pragaash Ponnusamy and Tianle Cai and Han Guo and Yoon Kim and Ben Athiwaratkun},
      year={2024},
      eprint={2408.14690},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2408.14690},
}
```