Code for the safety test in "Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates" (https://arxiv.org/abs/2402.18540)
Go to the folder `gpt-api` and see `run-gpt-gsm.sh` for an example shell script to fine-tune `gpt-3.5-turbo-0613` on GSM8K; a minimal sketch of the underlying API calls follows the list below.
- The code will automatically output the IDs of the fine-tuning job and the fine-tuned model, and log them to WandB.
- You can also view the training curves on WandB when training ends.
- See `gpt-api/prompt_utils.py` for all prompt templates.
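The script above drives fine-tuning through the OpenAI API. As a minimal, hedged sketch (not the repo's `run-gpt-gsm.sh`; the training-file path and the polling step are illustrative), the calls it ultimately wraps look roughly like this:

```python
# Hedged sketch of the OpenAI fine-tuning API calls behind a script like
# run-gpt-gsm.sh. The training-file path is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload training data in the chat fine-tuning JSONL format
# ({"messages": [{"role": ..., "content": ...}, ...]} per line).
train_file = client.files.create(
    file=open("gsm8k_train.jsonl", "rb"),  # placeholder path
    purpose="fine-tune",
)

# Launch the fine-tuning job on gpt-3.5-turbo-0613 and print its id.
job = client.fine_tuning.jobs.create(
    training_file=train_file.id,
    model="gpt-3.5-turbo-0613",
)
print("fine-tuning job id:", job.id)

# Later, retrieve the job to get its status and the fine-tuned model id.
job = client.fine_tuning.jobs.retrieve(job.id)
print("status:", job.status, "fine-tuned model:", job.fine_tuned_model)
```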
Coming soon!
The code for Llama-2 fine-tuning is under the `llama2` folder. To fine-tune on the ChatDoctor dataset, please
- Download the dataset here.
- Move the json file to `llama2/medical_dataset/`.
- Fill in your wandb project name and user name in lines 57 and 58 of `finetuning.py`.
- Run `train-chatdoctor-lora.sh` under the `llama2` folder (a quick dataset sanity check you can run first is sketched below).
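Before launching training, it can help to confirm the dataset landed where the script expects it. A small sanity check along these lines (not part of the repo; the json layout is whatever the ChatDoctor release you downloaded provides) is:

```python
# Hedged sanity check: confirm a ChatDoctor json sits under llama2/medical_dataset/
# and peek at its structure. Field names depend on the downloaded release.
import json
from pathlib import Path

data_dir = Path("llama2/medical_dataset")
files = sorted(data_dir.glob("*.json"))
assert files, f"no json file found under {data_dir}; move the ChatDoctor json there"

with open(files[0]) as f:
    records = json.load(f)

print(f"{files[0].name}: top-level type {type(records).__name__}, {len(records)} entries")
if isinstance(records, list) and records:
    print("keys of first record:", sorted(records[0].keys()))
```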
`inference.py` is a variant of Llama's inference code, but with multi-GPU support.
```
python inference.py \
    <path-to-model> \
    --peft_model <path-to-peft> \
    --prompt_file vfleaking/DirectHarm4 \
    --prompt_template_style gsm:chat:llama \
    --output <output-file> \
    --top_p 0 --freq 8
```
- `prompt_file`: can be `vfleaking/DirectHarm4`, `https://huggingface.co/datasets/vfleaking/GSM-Danger` or `data/advbench-harmful-behaviors.csv` (the Hugging Face prompt sets can be inspected with the sketch after this list).
- `prompt_template_style`: see `prompt_utils.py` for possible options.
- `freq`: the batch size.
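For `prompt_file` values that are Hugging Face dataset ids, the prompts can be loaded and inspected with the `datasets` library. A brief sketch (not from the repo; the split and column names are whatever the dataset exposes, printed rather than assumed):

```python
# Hedged sketch: peek at the DirectHarm4 evaluation prompts before running inference.py.
from datasets import load_dataset

ds = load_dataset("vfleaking/DirectHarm4")
print(ds)                          # shows available splits and their columns
first_split = next(iter(ds.values()))
print(first_split[0])              # first example as a plain dict
```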
`gpt4_eval.py` is a multi-threaded variant of the `gpt4_eval.py` from Qi et al. (2023). Please set your OpenAI API key before running the evaluation command:
```
python safety_evaluation/gpt4_eval.py --input_file question_output/example.jsonl
```
- `input_file`: a `jsonl` file with each line containing the input prompt and the model response (a sketch of building such a file follows).
- The output of the GPT-4 judge will be saved under `safety_evaluation/gpt4_eval_output`.
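As a hedged sketch of assembling that input file (the field names "prompt" and "answer" below are assumptions; check `gpt4_eval.py` for the keys it actually reads), the model outputs could be written out as follows:

```python
# Hedged sketch: write (prompt, response) pairs to a jsonl file for gpt4_eval.py.
# The key names are assumed, not taken from the repo.
import json
import os

records = [
    {"prompt": "example input prompt", "answer": "example model response"},
]

os.makedirs("question_output", exist_ok=True)
with open("question_output/example.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
```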
```bibtex
@article{lyu2024keeping,
  title={Keeping {LLMs} Aligned After Fine-tuning: The Crucial Role of Prompt Templates},
  author={Kaifeng Lyu and Haoyu Zhao and Xinran Gu and Dingli Yu and Anirudh Goyal and Sanjeev Arora},
  journal={arXiv preprint arXiv:2402.18540},
  year={2024}
}
```