Qlora

QLoRA: Efficient Finetuning of Quantized LLMs (forked from artidoro/qlora)

Setup

```shell
conda activate qlora
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
```

Start

  1. Copy llama-7b-hf-transformers-4.29 to localssd.
  2. Run prepare_mmlu.py to download the MMLU data.
  3. Run run_qlora.sh, run_gwqlora.sh, or run_lora.sh. Finetuning a LLaMA 7B model takes about 5 hours on an A100, and evaluation also takes a significant amount of time; the total running time should be within 8 hours.
  4. An int benchmark can be generated by modifying the code in "class TweakEvery100Steps" and then running it with tweak_once.sh.
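The "tweak every 100 steps" idea behind `TweakEvery100Steps` can be sketched as a step-counting callback. This is a minimal, framework-free illustration, not the repo's actual class; the method names and `tweak` body are assumptions:

```python
# Hypothetical sketch of the TweakEvery100Steps idea: run a weight
# "tweak" (e.g. re-rounding quantized weights) on a fixed interval
# of optimizer steps during finetuning.
class TweakEveryNSteps:
    def __init__(self, n=100):
        self.n = n
        self.tweaked_at = []  # steps at which a tweak ran

    def on_step_end(self, step):
        # Fire once every n optimizer steps.
        if step > 0 and step % self.n == 0:
            self.tweak(step)

    def tweak(self, step):
        # Placeholder for the actual weight-tweaking logic.
        self.tweaked_at.append(step)

cb = TweakEveryNSteps(100)
for step in range(1, 301):
    cb.on_step_end(step)
print(cb.tweaked_at)  # → [100, 200, 300]
```

In the real repo this logic lives inside the training loop; tweak_once.sh presumably restricts it to a single tweak for benchmarking.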

Results

MMLU accuracy (%) for LLaMA 7B with rank=4, group=64:

| Method          | 0-shot STEM | Hums | Social | Other | Avg  | 5-shot STEM | Hums | Social | Other | Avg  |
|-----------------|-------------|------|--------|-------|------|-------------|------|--------|-------|------|
| origin          | 27.3        | 33.0 | 32.4   | 37.3  | 32.6 | 30.6        | 34.1 | 38.2   | 38.2  | 35.2 |
| lora            | 31.6        | 36.9 | 38.9   | 42.1  | 37.4 |             |      |        |       |      |
| int3            | 28.8        | 32.6 | 31.8   | 35.2  | 32.2 | 30.1        | 33.2 | 37.4   | 38.2  | 34.6 |
| int3 tweakonce  | 29.4        | 32.0 | 33.4   | 36.2  | 32.7 | 30.2        | 33.2 | 37.4   | 38.6  | 34.7 |
| gwq             | 29.8        | 33.3 | 34.2   | 37.7  | 33.7 | 31.3        | 33.8 | 38.3   | 39.7  | 35.6 |
| int3-g128       | 28.4        | 31.8 | 31.3   | 35.3  | 31.8 | 28.3        | 30.5 | 31.5   | 33.3  | 30.9 |
| l4q (3bit-g128) | 27.8        | 29.5 | 32.1   | 33.3  | 30.6 | 31.0        | 29.3 | 33.5   | 30.4  | 31.8 |

The l4q result is taken from its original paper; it is unclear why it is worse than the plain int3-g128 baseline.

Some experiments remain to be done, so parts of the table are left blank.
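For context, the int3 and int3-g128 rows refer to group-wise low-bit weight quantization: each group of weights (group size 64 or 128) gets its own scale and zero point, and values are rounded to one of 2^3 = 8 levels. A minimal numpy sketch of that idea, not the code used in this repo:

```python
import numpy as np

def quantize_groupwise_int3(w, group_size=64):
    # Asymmetric per-group 3-bit quantization: each group of
    # `group_size` weights gets its own scale and zero point,
    # and values are rounded to one of 2**3 = 8 levels.
    w = w.reshape(-1, group_size)
    wmin = w.min(axis=1, keepdims=True)
    wmax = w.max(axis=1, keepdims=True)
    scale = (wmax - wmin) / 7.0            # 8 levels span the range
    q = np.round((w - wmin) / scale)       # integer codes in [0, 7]
    return (q * scale + wmin).reshape(-1)  # dequantized weights

rng = np.random.default_rng(0)
w = rng.normal(size=128).astype(np.float32)
wq = quantize_groupwise_int3(w, group_size=64)
print(np.abs(w - wq).max())  # small but nonzero reconstruction error
```

Smaller groups (g64 vs g128) mean more scales to store but a tighter range per group, and hence lower rounding error.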
