H2O LLM Prototyping Playground

The goal is to create a 100% permissive MIT/Apache-2.0 LLM that is useful for ChatGPT use cases.

Training code is based on Alpaca-LoRA, but all models will be fully open source.

No OpenAI-derived Alpaca fine-tuning data will remain.

The final result will be committed to H2OGPT.

Setup

  1. Install the Python environment
wget https://repo.anaconda.com/miniconda/Miniconda3-py310_23.1.0-1-Linux-x86_64.sh
bash ./Miniconda3-py310_23.1.0-1-Linux-x86_64.sh
# follow license agreement and add to bash if required
source ~/.bashrc
# For more control: Copy block it added to .bashrc, put into ~/.bashrc.conda, then source ~/.bashrc.conda
conda create -n h2ollm
conda activate h2ollm
conda install mamba -n base -c conda-forge
conda install python=3.10 -y
conda update -n base -c defaults conda
  2. Install dependencies
pip install -r requirements.txt
  3. Install the full CUDA toolkit (e.g. CUDA 12.1 for Ubuntu 22.04) and cuDNN 8, then reboot.

  4. Ensure CUDA is on the path:

echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64/" >> ~/.bashrc
echo "CUDA_HOME=/usr/local/cuda" >> ~/.bashrc
echo "export PATH=$PATH:/usr/local/cuda/bin/" >> ~/.bashrc
source ~/.bashrc  # or source ~/.bashrc.conda
conda activate h2ollm
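
To confirm the toolkit is picked up, a quick sanity check in Python (PyTorch comes from requirements.txt above):

import torch
print(torch.version.cuda)          # CUDA version PyTorch was built with, e.g. 12.1
print(torch.cuda.is_available())   # should print True
print(torch.cuda.device_count())   # number of visible GPUs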
  5. Compile bitsandbytes from source (see the how-to in the bitsandbytes repo)

E.g. for CUDA 12.1 (for CUDA 11.7, use CUDA_VERSION=117 make cuda11x etc.)

pip uninstall bitsandbytes || true
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=121 make cuda12x
CUDA_VERSION=121 python setup.py install
cd ..
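
A minimal smoke test for the build, as a sketch: constructing and stepping an 8-bit optimizer fails loudly if the compiled kernels don't match the installed toolkit.

import torch
import bitsandbytes as bnb

# One dummy parameter is enough to exercise the 8-bit Adam kernels.
p = torch.nn.Parameter(torch.zeros(16, device="cuda"))
opt = bnb.optim.Adam8bit([p])
p.grad = torch.ones_like(p)
opt.step()
print("bitsandbytes CUDA build looks OK")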

Fine-tune on a single GPU on a single node:

torchrun finetune.py --base_model='EleutherAI/gpt-j-6B' --data_path=alpaca_data_cleaned.json 

This will download the model, load the data, and generate an output directory named lora-alpaca.

Fine-tune using 2 nodes with 2 GPUs each:

WORLD_SIZE=4 CUDA_VISIBLE_DEVICES="0,1" torchrun --nnodes=2 --master_addr="10.10.10.2" --node_rank=0 --nproc_per_node=2 --master_port=1234 finetune.py --data_path=alpaca_data_cleaned.json --run_id=0 --base_model='EleutherAI/gpt-j-6B'

WORLD_SIZE=4 CUDA_VISIBLE_DEVICES="0,1" torchrun --nnodes=2 --master_addr="10.10.10.2" --node_rank=1 --nproc_per_node=2 --master_port=1234 finetune.py --data_path=alpaca_data_cleaned.json --run_id=0 --base_model='EleutherAI/gpt-j-6B'
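
For reference, a sketch of how torchrun's environment typically maps to the distributed setup inside a script like finetune.py (illustrative, not the script's actual code):

import os
import torch
import torch.distributed as dist

world_size = int(os.environ.get("WORLD_SIZE", 1))  # 4 here: 2 nodes x 2 GPUs each
local_rank = int(os.environ.get("LOCAL_RANK", 0))  # GPU index on this node

if world_size > 1:
    # torchrun also provides MASTER_ADDR, MASTER_PORT, and RANK for rendezvous
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)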

Fine-tune using two 24GB GPUs to split up a 30B model:

WORLD_SIZE=2 python finetune.py --data_path=alpaca_data_cleaned.json --base_model="decapoda-research/llama-30b-hf" --ddp=False
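
A rough sketch of the sharded loading this mode relies on (assuming Hugging Face accelerate-style device_map placement; finetune.py's actual handling of --ddp=False may differ):

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "decapoda-research/llama-30b-hf",
    load_in_8bit=True,   # bitsandbytes 8-bit weights so 30B fits in 2x24GB
    device_map="auto",   # accelerate spreads layers across the visible GPUs
)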

Fine-tune a previously saved model (after running export_hf_checkpoint.py):

WORLD_SIZE=4 CUDA_VISIBLE_DEVICES="0,1" torchrun --nnodes=2 --master_addr="10.10.10.2" --node_rank=0 --nproc_per_node=2 --master_port=1234 finetune.py --num_epochs=2 --micro_batch_size=8 --data_path=alpaca_data_cleaned.json --run_id=3 --base_model='gpt-j-6B.DAIdocs' --tokenizer_base_model='EleutherAI/gpt-j-6B' --output_dir=lora_6B.DAIdocs &> 3.node0.log

WORLD_SIZE=4 CUDA_VISIBLE_DEVICES="0,1" torchrun --nnodes=2 --master_addr="10.10.10.2" --node_rank=1 --nproc_per_node=2 --master_port=1234 finetune.py --num_epochs=2 --micro_batch_size=8 --data_path=alpaca_data_cleaned.json --run_id=3 --base_model='gpt-j-6B.DAIdocs' --tokenizer_base_model='EleutherAI/gpt-j-6B' --output_dir=lora_6B.DAIdocs &> 3.node1.log
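
For context, a sketch of the export step (presumably what export_hf_checkpoint.py does): merge the LoRA deltas into the base weights so the result can be passed as --base_model for another round of fine-tuning, as above.

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B", torch_dtype=torch.float16
)
# Attach the trained adapter, then fold its low-rank deltas into the weights.
merged = PeftModel.from_pretrained(base, "lora-alpaca").merge_and_unload()
merged.save_pretrained("gpt-j-6B.DAIdocs")  # directory name used above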

Generate on a single GPU on a single node:

torchrun generate.py --base_model='EleutherAI/gpt-j-6B' --lora_weights=lora-alpaca

This will download the foundation model and our fine-tuned lora_weights, and open a GUI with text generation input/output.
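
Under the hood, generation amounts to stacking the adapter on the base model, roughly as follows (a minimal sketch, assuming lora-alpaca is a standard PEFT adapter directory):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
base = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "lora-alpaca")  # add the LoRA weights
model.eval()

inputs = tokenizer("Explain LoRA in one sentence.", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))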

If you get peer-to-peer-related errors, set this environment variable, which disables NCCL peer-to-peer transfers between GPUs:

export NCCL_P2P_LEVEL=LOC

Plan

Open-source instruct model for demoable use cases.

  1. Base: Start with fully open-source Apache 2.0 models: EleutherAI/gpt-j-6B, EleutherAI/gpt-neox-20b, GPT-NeoXT-Chat-Base-20B, etc.
  2. Construct Prompt: Set up prompt engineering on 6B-20B models as-is to convert a sentence into question/answer or command/response format (an illustrative template is sketched after this list)
  3. Open-Source Instruct Data: Convert wiki data into instruct form
  4. Fine-tune: LoRA fine-tune 6B and 20B using DAI docs
  5. Open Data & Model: Submit the DAI docs model to Hugging Face
  6. Use a Toolformer approach for external APIs
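
As an illustration of the prompt-construction step above, a hypothetical Alpaca-style template (the exact wording and field names are assumptions, not the project's final prompts):

PROMPT_TEMPLATE = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
{response}"""

print(PROMPT_TEMPLATE.format(
    instruction="Summarize: Driverless AI automates feature engineering.",
    response="Driverless AI builds features automatically.",
))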

Goals

  1. Demonstrate fine-tuning working on some existing corpus
  2. Demonstrate the efficiency of LoRA for fast and low-memory fine-tuning (see the parameter-count sketch below)
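
Back-of-the-envelope arithmetic for goal 2, assuming rank-8 LoRA adapters on two attention projections per layer of GPT-J-6B (which modules get adapters is an assumption):

hidden, layers, r = 4096, 28, 8        # GPT-J-6B dimensions, LoRA rank
per_matrix = 2 * hidden * r            # A (hidden x r) plus B (r x hidden)
trainable = per_matrix * 2 * layers    # e.g. q_proj and v_proj in each layer
print(trainable)                       # ~3.7M trainable parameters
print(trainable / 6_000_000_000)       # ~0.06% of the 6B base parameters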

Code to consider including

shawwn/llama
llama PRs
text-generation-webui
minimal-llama
finetune GPT-NeoX
GPTQ-for-LLaMa
OpenChatKit on multi-GPU
Non-Causal LLM
OpenChatKit_Offload

Help

FAQs

More links, context, competitors, models, datasets

Links