The goal is to create a 100% permissively licensed (MIT/Apache 2.0) LLM that is useful for ChatGPT-style use cases.
The training code is based on Alpaca-LoRA, but all models will be fully open source.
No OpenAI-derived Alpaca fine-tuning data will be retained.
The final result will be committed to h2oGPT.
- Install Python environment:
```bash
wget https://repo.anaconda.com/miniconda/Miniconda3-py310_23.1.0-1-Linux-x86_64.sh
bash ./Miniconda3-py310_23.1.0-1-Linux-x86_64.sh
# follow the license agreement and let the installer update your .bashrc if desired
source ~/.bashrc
# For more control: copy the block the installer added to .bashrc into ~/.bashrc.conda, then source ~/.bashrc.conda
conda create -n h2ollm
conda activate h2ollm
conda install mamba -n base -c conda-forge
conda install python=3.10 -y
conda update -n base -c defaults conda
```
- Install dependencies:
```bash
pip install -r requirements.txt
```
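As a quick sanity check (an assumed step, not part of requirements.txt), confirm that PyTorch can see your GPUs before proceeding:
```python
# Optional sanity check (assumption, not a repo script): verify the PyTorch
# install can see CUDA devices before attempting any fine-tuning.
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
```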
- Install the full CUDA toolkit, e.g. CUDA 12.1 for Ubuntu 22.04: install the CUDA toolkit and cuDNN 8, then reboot.
- Ensure CUDA is on your path:
```bash
echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64/" >> ~/.bashrc
echo "CUDA_HOME=/usr/local/cuda" >> ~/.bashrc
echo "export PATH=$PATH:/usr/local/cuda/bin/" >> ~/.bashrc
source ~/.bashrc # or source ~/.bashrc.conda
conda activate h2ollm
```
- Compile bitsandbytes from source:
E.g. for CUDA 12.1 (for CUDA 11.7, use `CUDA_VERSION=117 make cuda11x`, etc.):
```bash
pip uninstall bitsandbytes || true
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=121 make cuda12x
CUDA_VERSION=121 python setup.py install
cd ..
```
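To confirm the build worked, a minimal import check (an assumed verification step, not part of the repo):
```python
# Importing bitsandbytes triggers its CUDA setup, so a broken build fails here.
import bitsandbytes as bnb
import torch

assert torch.cuda.is_available(), "CUDA not visible to PyTorch"
print("bitsandbytes imported OK")
```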
Fine-tune on a single GPU on a single node:
```bash
torchrun finetune.py --base_model='EleutherAI/gpt-j-6B' --data_path=alpaca_data_cleaned.json
```
This will download the model, load the data, and generate an output directory `lora-alpaca`.
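For orientation, here is a minimal sketch of the LoRA setup a script like finetune.py performs via Hugging Face peft; the hyperparameters and target modules below are illustrative assumptions, not the repo's defaults:
```python
# Minimal LoRA setup sketch with peft; values are illustrative assumptions,
# not the defaults finetune.py actually uses.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
config = LoraConfig(
    r=8,                                   # low-rank adapter dimension
    lora_alpha=16,                         # adapter scaling factor
    target_modules=["q_proj", "v_proj"],   # GPT-J attention projections
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the small adapter matrices train
```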
Fine-tune using 2 nodes with 2 GPUs each (note WORLD_SIZE must equal nnodes × nproc_per_node, here 2 × 2 = 4):
```bash
WORLD_SIZE=4 CUDA_VISIBLE_DEVICES="0,1" torchrun --nnodes=2 --master_addr="10.10.10.2" --node_rank=0 --nproc_per_node=2 --master_port=1234 finetune.py --data_path=alpaca_data_cleaned.json --run_id=0 --base_model='EleutherAI/gpt-j-6B'
WORLD_SIZE=4 CUDA_VISIBLE_DEVICES="0,1" torchrun --nnodes=2 --master_addr="10.10.10.2" --node_rank=1 --nproc_per_node=2 --master_port=1234 finetune.py --data_path=alpaca_data_cleaned.json --run_id=0 --base_model='EleutherAI/gpt-j-6B'
```
Fine-tune using two 24GB GPUs to split up a 30B model:
```bash
WORLD_SIZE=2 python finetune.py --data_path=alpaca_data_cleaned.json --base_model="decapoda-research/llama-30b-hf" --ddp=False
```
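This works because transformers (via accelerate) can shard a checkpoint across all visible GPUs; a hedged sketch of the idea, not the repo's exact loading code:
```python
# Sketch (assumption, not the repo's exact code): device_map="auto" spreads
# layers across both 24GB GPUs, and 8-bit weights shrink the memory footprint.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "decapoda-research/llama-30b-hf",
    device_map="auto",
    load_in_8bit=True,  # requires bitsandbytes
)
```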
Fine-tune a previously saved model (produced by running export_hf_checkpoint.py):
```bash
WORLD_SIZE=4 CUDA_VISIBLE_DEVICES="0,1" torchrun --nnodes=2 --master_addr="10.10.10.2" --node_rank=0 --nproc_per_node=2 --master_port=1234 finetune.py --num_epochs=2 --micro_batch_size=8 --data_path=alpaca_data_cleaned.json --run_id=3 --base_model='gpt-j-6B.DAIdocs' --tokenizer_base_model='EleutherAI/gpt-j-6B' --output_dir=lora_6B.DAIdocs &> 3.node0.log
WORLD_SIZE=4 CUDA_VISIBLE_DEVICES="0,1" torchrun --nnodes=2 --master_addr="10.10.10.2" --node_rank=1 --nproc_per_node=2 --master_port=1234 finetune.py --num_epochs=2 --micro_batch_size=8 --data_path=alpaca_data_cleaned.json --run_id=3 --base_model='gpt-j-6B.DAIdocs' --tokenizer_base_model='EleutherAI/gpt-j-6B' --output_dir=lora_6B.DAIdocs &> 3.node1.log
```
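For context, export_hf_checkpoint.py produces a plain HF checkpoint by merging the LoRA adapter back into the base weights, roughly like the following sketch (assumed peft usage, not the script's actual contents):
```python
# Hedged sketch of merging a LoRA adapter into base weights so the result can
# serve as --base_model for another fine-tune; paths are placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
model = PeftModel.from_pretrained(base, "lora-alpaca")  # attach the adapter
merged = model.merge_and_unload()                       # fold LoRA into weights
merged.save_pretrained("gpt-j-6B.DAIdocs")
```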
Generate on a single GPU on a single node:
```bash
torchrun generate.py --base_model='EleutherAI/gpt-j-6B' --lora_weights=lora-alpaca
```
This will download the foundation model and our fine-tuned lora_weights, and open a GUI with text-generation input/output.
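Equivalently, generating programmatically with the adapter attached looks roughly like this (a sketch of assumed transformers/peft usage, not generate.py itself):
```python
# Sketch of programmatic generation with the LoRA adapter; this approximates
# what generate.py loads, it is not the script itself.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
base = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "lora-alpaca")

inputs = tokenizer("Explain LoRA in one sentence.", return_tensors="pt").to(base.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```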
If you get peer-to-peer related NCCL errors, set this environment variable:
```bash
export NCCL_P2P_LEVEL=LOC
```
Open-source instruct model for demoable use cases:
- Base: Start with fully open-source Apache 2.0 models EleutherAI/gpt-j-6B, EleutherAI/gpt-neox-20b, GPT-NeoXT-Chat-Base-20B, etc.
- Construct Prompt: Set up prompt engineering on the 6B-20B models as-is to convert a sentence into question/answer or command/response format (see the prompt sketch after this list)
- Open-Source Instruct Data: Convert wiki data into instruct form
- Fine-tune: LoRA fine-tune 6B and 20B using DAI docs
- Open Data & Model: Submit the DAI docs model to Hugging Face
- Use the Toolformer approach for external APIs
- Demonstrate fine-tuning working on an existing corpus
- Demonstrate the efficiency of LoRA for fast, low-memory fine-tuning
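As referenced in the Construct Prompt item above, a hypothetical instruct prompt template for that conversion might look like the following (the exact format is an assumption for illustration, not the project's final template):
```python
# Hypothetical instruct-style prompt template (an assumption for illustration;
# the project's final prompt format may differ).
PROMPT_TEMPLATE = """Below is an instruction that describes a task.
Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
"""

def sentence_to_prompt(sentence: str) -> str:
    """Convert a raw sentence into command/response instruct form."""
    return PROMPT_TEMPLATE.format(instruction=f"Summarize: {sentence}")

print(sentence_to_prompt("LoRA trains small low-rank adapters on frozen weights."))
```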
Related resources:
- shawwn/llama
- llama PRs
- text-generation-webui
- minimal-llama
- finetune GPT-NeoX
- GPTQ-for-LLaMa
- OpenChatKit on multi-GPU
- Non-Causal LLM
- OpenChatKit offload