The goal is to create a 100% permissively licensed (MIT/Apache 2.0) LLM that is useful for ChatGPT-style use cases.
The training code is based on Alpaca-LoRA, but all models will be fully open source.
No OpenAI-derived Alpaca fine-tuning data will be retained.
The final result will be committed to h2oGPT.
- Install Python environment:
```bash
wget https://repo.anaconda.com/miniconda/Miniconda3-py310_23.1.0-1-Linux-x86_64.sh
bash ./Miniconda3-py310_23.1.0-1-Linux-x86_64.sh
# follow the license agreement and let the installer update your .bashrc if desired
source ~/.bashrc
# For more control: copy the block the installer added to .bashrc into ~/.bashrc.conda, then source ~/.bashrc.conda
conda create -n h2ollm
conda activate h2ollm
conda install mamba -n base -c conda-forge
conda install python=3.10 -y
conda update -n base -c defaults conda
```
- Install dependencies:
```bash
pip install -r requirements.txt
```
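As a quick sanity check (an assumed step, not part of requirements.txt), confirm that PyTorch can see your GPUs before proceeding:
```python
# Optional sanity check (assumption, not a repo script): verify the PyTorch
# install can see CUDA devices before attempting any fine-tuning.
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
```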
- Install the full CUDA toolkit, e.g. CUDA 12.1 for Ubuntu 22.04: install the CUDA toolkit and cuDNN 8, then reboot.
- Ensure CUDA is on your path:
```bash
echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64/" >> ~/.bashrc
echo "CUDA_HOME=/usr/local/cuda" >> ~/.bashrc
echo "export PATH=$PATH:/usr/local/cuda/bin/" >> ~/.bashrc
source ~/.bashrc # or source ~/.bashrc.conda
conda activate h2ollm
```
- Compile bitsandbytes from source:
E.g. for CUDA 12.1 (for CUDA 11.7, use `CUDA_VERSION=117 make cuda11x`, etc.):
```bash
pip uninstall bitsandbytes || true
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=121 make cuda12x
CUDA_VERSION=121 python setup.py install
cd ..
```
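To confirm the build worked, a minimal import check (an assumed verification step, not part of the repo):
```python
# Importing bitsandbytes triggers its CUDA setup, so a broken build fails here.
import bitsandbytes as bnb
import torch

assert torch.cuda.is_available(), "CUDA not visible to PyTorch"
print("bitsandbytes imported OK")
```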
Fine-tune on a single GPU on a single node:
```bash
torchrun finetune.py --base_model='EleutherAI/gpt-j-6B' --data_path=alpaca_data_cleaned.json
```
This will download the model, load the data, and generate an output directory `lora-alpaca`.
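For orientation, here is a minimal sketch of the LoRA setup a script like finetune.py performs via Hugging Face peft; the hyperparameters and target modules below are illustrative assumptions, not the repo's defaults:
```python
# Minimal LoRA setup sketch with peft; values are illustrative assumptions,
# not the defaults finetune.py actually uses.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
config = LoraConfig(
    r=8,                                   # low-rank adapter dimension
    lora_alpha=16,                         # adapter scaling factor
    target_modules=["q_proj", "v_proj"],   # GPT-J attention projections
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the small adapter matrices train
```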
Fine-tune using 2 nodes with 2 GPUs each (note WORLD_SIZE must equal nnodes × nproc_per_node, here 2 × 2 = 4):
```bash
WORLD_SIZE=4 CUDA_VISIBLE_DEVICES="0,1" torchrun --nnodes=2 --master_addr="10.10.10.2" --node_rank=0 --nproc_per_node=2 --master_port=1234 finetune.py --data_path=alpaca_data_cleaned.json --run_id=0 --base_model='EleutherAI/gpt-j-6B'
WORLD_SIZE=4 CUDA_VISIBLE_DEVICES="0,1" torchrun --nnodes=2 --master_addr="10.10.10.2" --node_rank=1 --nproc_per_node=2 --master_port=1234 finetune.py --data_path=alpaca_data_cleaned.json --run_id=0 --base_model='EleutherAI/gpt-j-6B'
```
Fine-tune using two 24GB GPUs to split up a 30B model:
```bash
WORLD_SIZE=2 python finetune.py --data_path=alpaca_data_cleaned.json --base_model="decapoda-research/llama-30b-hf" --ddp=False
```
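This works because transformers (via accelerate) can shard a checkpoint across all visible GPUs; a hedged sketch of the idea, not the repo's exact loading code:
```python
# Sketch (assumption, not the repo's exact code): device_map="auto" spreads
# layers across both 24GB GPUs, and 8-bit weights shrink the memory footprint.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "decapoda-research/llama-30b-hf",
    device_map="auto",
    load_in_8bit=True,  # requires bitsandbytes
)
```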
Fine-tune a previously saved model (produced by running export_hf_checkpoint.py):
```bash
WORLD_SIZE=4 CUDA_VISIBLE_DEVICES="0,1" torchrun --nnodes=2 --master_addr="10.10.10.2" --node_rank=0 --nproc_per_node=2 --master_port=1234 finetune.py --num_epochs=2 --micro_batch_size=8 --data_path=alpaca_data_cleaned.json --run_id=3 --base_model='gpt-j-6B.DAIdocs' --tokenizer_base_model='EleutherAI/gpt-j-6B' --output_dir=lora_6B.DAIdocs &> 3.node0.log
WORLD_SIZE=4 CUDA_VISIBLE_DEVICES="0,1" torchrun --nnodes=2 --master_addr="10.10.10.2" --node_rank=1 --nproc_per_node=2 --master_port=1234 finetune.py --num_epochs=2 --micro_batch_size=8 --data_path=alpaca_data_cleaned.json --run_id=3 --base_model='gpt-j-6B.DAIdocs' --tokenizer_base_model='EleutherAI/gpt-j-6B' --output_dir=lora_6B.DAIdocs &> 3.node1.log
```
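For context, export_hf_checkpoint.py produces a plain HF checkpoint by merging the LoRA adapter back into the base weights, roughly like the following sketch (assumed peft usage, not the script's actual contents):
```python
# Hedged sketch of merging a LoRA adapter into base weights so the result can
# serve as --base_model for another fine-tune; paths are placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
model = PeftModel.from_pretrained(base, "lora-alpaca")  # attach the adapter
merged = model.merge_and_unload()                       # fold LoRA into weights
merged.save_pretrained("gpt-j-6B.DAIdocs")
```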
Generate on a single GPU on a single node:
```bash
torchrun generate.py --base_model='EleutherAI/gpt-j-6B' --lora_weights=lora-alpaca
```
This will download the foundation model and our fine-tuned lora_weights, and open a GUI with text-generation input/output.
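Equivalently, generating programmatically with the adapter attached looks roughly like this (a sketch of assumed transformers/peft usage, not generate.py itself):
```python
# Sketch of programmatic generation with the LoRA adapter; this approximates
# what generate.py loads, it is not the script itself.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
base = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "lora-alpaca")

inputs = tokenizer("Explain LoRA in one sentence.", return_tensors="pt").to(base.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```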
If you get peer-to-peer related NCCL errors, set this environment variable:
```bash
export NCCL_P2P_LEVEL=LOC
```
Open-source instruct model for demoable use cases:
- Base: Start with fully open-source Apache 2.0 models EleutherAI/gpt-j-6B, EleutherAI/gpt-neox-20b, GPT-NeoXT-Chat-Base-20B, etc.
- Construct Prompt: Set up prompt engineering on the 6B-20B models as-is to convert a sentence into question/answer or command/response format (see the prompt sketch after this list)
- Open-Source Instruct Data: Convert wiki data into instruct form
- Fine-tune: LoRA fine-tune 6B and 20B using DAI docs
- Open Data & Model: Submit the DAI docs model to Hugging Face
- Use the Toolformer approach for external APIs
- Demonstrate fine-tuning working on an existing corpus
- Demonstrate the efficiency of LoRA for fast, low-memory fine-tuning
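As referenced in the Construct Prompt item above, a hypothetical instruct prompt template for that conversion might look like the following (the exact format is an assumption for illustration, not the project's final template):
```python
# Hypothetical instruct-style prompt template (an assumption for illustration;
# the project's final prompt format may differ).
PROMPT_TEMPLATE = """Below is an instruction that describes a task.
Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
"""

def sentence_to_prompt(sentence: str) -> str:
    """Convert a raw sentence into command/response instruct form."""
    return PROMPT_TEMPLATE.format(instruction=f"Summarize: {sentence}")

print(sentence_to_prompt("LoRA trains small low-rank adapters on frozen weights."))
```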
Related resources:
- shawwn/llama
- llama PRs
- text-generation-webui
- minimal-llama
- finetune GPT-NeoX
- GPTQ-for-LLaMa
- OpenChatKit on multi-GPU
- Non-Causal LLM
- OpenChatKit offload