This is the official implementation for the paper Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging.
- Our paper won the $${\color{red}best}$$ $${\color{red}paper}$$ $${\color{red}award}$$ of CCS-LAMPS 2024!
- Our paper is accepted by CCS-LAMPS 2024!
- The LLMs used in our paper are LLaMA-2-7B-hf, LLaMA-2-7B-CHAT-hf, and WizardMath-7B-V1.0.
- Watermarked LLMs: We leverage Quantization Watermarking to embed normal watermaks into LLaMA-2-7B-CHAT.
- Fingerprinted LLMs: We leverage Instructional Fingerprint (SFT version) to protect LLaMA-2-7B-CHAT.
- We leverage mergekit to merge LLMs. You should download and install it first. The merging configurations used in our paper can be found in /merge_config. You can merge your LLMs as
mergekit-yaml merge_config/ties.yml [path_to_save_merged_model] --cuda
- We use StrongReject-small dataset to evaluate the safety alignmentwithin LLMs. You can runeval_safe.pyto get the refusal rate results.
python eval_safe.py --model llama2-7b-chat
- We use GSM8K dataset to evaluate the mathematical reasoning abilityof LLMs. You can runeval_math.pyto get the prediction accuracy results.
python eval_math.py --model llama2-7b-chat
If you find our work helpful, please cite it as follows, thanks!
@misc{cong2024mergeguardeval,
      title={Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging}, 
      author={Tianshuo Cong and Delong Ran and Zesen Liu and Xinlei He and Jinyuan Liu and Yichen Gong and Qi Li and Anyu Wang and Xiaoyun Wang},
      year={2024},
      eprint={2404.05188},
      archivePrefix={arXiv},
      primaryClass={cs.CR}
}