# DICE: Detecting In-distribution Data Contamination with LLM's Internal State

Data and code for the paper.
```bash
git clone https://github.com/THU-KEG/DICE.git
cd DICE
```

Our code to fine-tune contaminated models is stored in the `OOD_test/scripts` folder.
You can paraphrase a benchmark with the following script. The paraphrased dataset we used in the paper is available in the `OOD_test/scripts/data` folder.

```bash
python scripts/rewrite.py --dataset_name gsm8k
```
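For orientation, here is a minimal sketch of what LLM-based paraphrasing of a benchmark looks like; it is not the actual `rewrite.py`. The HF `datasets` GSM8K loader is real, but the OpenAI-style client, model name, and prompt are illustrative assumptions.

```python
# Illustrative sketch of LLM-based paraphrasing, not the actual rewrite.py.
# The GSM8K loader is real; the OpenAI-style client, model name, and
# prompt are assumptions for illustration.
from datasets import load_dataset
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def paraphrase(question: str) -> str:
    """Rewrite a benchmark question while preserving its meaning and numbers."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice
        messages=[{
            "role": "user",
            "content": "Paraphrase this math problem without changing its "
                       f"meaning or numbers:\n\n{question}",
        }],
    )
    return resp.choices[0].message.content.strip()

test_set = load_dataset("gsm8k", "main", split="test")
rewritten = [{"question": paraphrase(ex["question"]), "answer": ex["answer"]}
             for ex in test_set]
```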
You can fine-tune a contaminated model as follows.

- Change the base model with `--model_name`.
- Change the contaminated benchmark by changing `--train_dataset_name` and `--dataset_name`.
- The parameter `--epochs 1` represents the 2% contamination setting in the paper; omitting it represents the 10% setting.
```bash
cd OOD_test
CUDA_VISIBLE_DEVICES=0 python scripts/contaminated_finetune.py \
  --model_name microsoft/phi-2 \
  --generative_batch_size 32 \
  --dataset_name gsm8k \
  --train_dataset_name gsm8k \
  --epochs 1
```

You can also use the following script to directly reproduce the contaminated models of the main experiment in our paper.
```bash
CUDA_VISIBLE_DEVICES=0 bash scripts/contaminated_finetune.sh
```
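Conceptually, contaminated fine-tuning mixes benchmark test samples into otherwise clean instruction-tuning data. The sketch below illustrates that idea only; it is not the actual `contaminated_finetune.py`, and the Orca dataset ID, column names, and mixing ratio are assumptions.

```python
# Illustrative sketch of building a "contaminated" fine-tuning set,
# not the actual contaminated_finetune.py: benchmark test samples are
# mixed into an otherwise clean instruction-tuning corpus.
import random
from datasets import load_dataset

clean = load_dataset("Open-Orca/OpenOrca", split="train[:20000]")  # clean instruction data (ID assumed)
leak = load_dataset("gsm8k", "main", split="test")                 # benchmark test set to leak

mixed = [{"prompt": ex["question"], "response": ex["response"]} for ex in clean]
mixed += [{"prompt": ex["question"], "response": ex["answer"]} for ex in leak]
random.shuffle(mixed)  # fine-tune the base model on `mixed` with a standard SFT loop
```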
Similar to the fine-tuning process above, you can use the following script to test OOD performance. The parameter settings are the same as above; the only thing to note is that `--dataset_name` is the OOD dataset to be tested, and `--train_dataset_name` is the contaminated dataset.
```bash
cd OOD_test
CUDA_VISIBLE_DEVICES=0 python OOD_generate_inf.py \
  --model_name microsoft/phi-2 \
  --generative_batch_size 32 \
  --dataset_name math \
  --train_dataset_name gsm8k \
  --epochs 1
```
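The point of this test is that a model contaminated with GSM8K should gain on GSM8K itself but not on an OOD benchmark such as MATH. A minimal sketch of that comparison, assuming hypothetical `generate` and `extract_answer` helpers (the real logic lives in `OOD_generate_inf.py`):

```python
# Sketch of the in-distribution vs. OOD comparison (illustrative only).
# `generate` and `extract_answer` are hypothetical helpers.
def accuracy(model, dataset, generate, extract_answer):
    correct = 0
    for ex in dataset:
        pred = extract_answer(generate(model, ex["question"]))
        correct += int(pred == extract_answer(ex["answer"]))
    return correct / len(dataset)

# in_dist = accuracy(model, gsm8k_test, generate, extract_answer)  # inflated if contaminated
# ood     = accuracy(model, math_test, generate, extract_answer)   # roughly unchanged
```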
The code for locating the contaminated layer is stored in the `Locate` folder.

```bash
CUDA_VISIBLE_DEVICES=0 python DICE_locate.py \
  --edited_model=meta-llama/Llama-2-7b-hf \
  --hparams_dir=../hparams/DICE/llama-7b
```
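As a rough intuition for what layer localization involves (the actual `DICE_locate.py` may differ), one can fit a simple probe on each layer's hidden states and keep the layer that best separates contaminated from clean samples. The sketch below assumes an HF `transformers` model queried with `output_hidden_states=True`; the logistic-regression probe is an assumption.

```python
# Rough sketch of layer localization, not the actual DICE_locate.py:
# fit a probe on each layer's hidden states and keep the layer whose
# probe best separates contaminated from clean samples.
import torch
from sklearn.linear_model import LogisticRegression

@torch.no_grad()
def layer_features(model, tokenizer, texts, layer, device="cuda"):
    """Last-token hidden state of `layer` for each text (assumes HF transformers)."""
    feats = []
    for t in texts:
        ids = tokenizer(t, return_tensors="pt", truncation=True).to(device)
        out = model(**ids, output_hidden_states=True)
        feats.append(out.hidden_states[layer][0, -1].float().cpu())
    return torch.stack(feats).numpy()

def locate_layer(model, tokenizer, texts, labels, n_layers):
    """Return the 1-based index of the most contamination-sensitive layer."""
    scores = []
    for layer in range(1, n_layers + 1):  # hidden_states[0] is the embedding layer
        X = layer_features(model, tokenizer, texts, layer)
        probe = LogisticRegression(max_iter=1000).fit(X, labels)
        scores.append(probe.score(X, labels))  # training accuracy as a rough signal
    return scores.index(max(scores)) + 1
```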
The code for the contamination classifier is stored in the `contamination_classifier` folder.

First, make the data (the hidden states of the contaminated layer). You can use the following script to get the data.
You can generate the data as follows.

- Change the base model with `--model_name`.
- Change the detection benchmark with `--test_dataset`.
- `--is_contaminated` shows whether the model is contaminated.
- `--model_type` indicates whether the uncontaminated model is the vanilla model or the model fine-tuned only on Orca.
- `--contaminated_type` indicates whether the contaminated model is a fine-tuned version of the original benchmark (`open`) or of a paraphrased benchmark (`Evasive`).
```bash
cd contamination_classifier
CUDA_VISIBLE_DEVICES=0 python data_maker.py \
  --edited_model=meta-llama/Llama-2-7b-hf \
  --hparams_dir=../hparams/DICE/llama-7b \
  --test_dataset=GSM8K_seen \
  --is_contaminated=True \
  --model_type=vanilla \
  --contaminated_type=open
```
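The output of this step is, in essence, labeled hidden states. A minimal sketch of such a dump, reusing the hypothetical `layer_features` helper from the localization sketch above (the layer index and filename are placeholders, not the script's actual format):

```python
# Illustrative dump of labeled hidden states, not the actual data_maker.py.
# Reuses the hypothetical layer_features() from the localization sketch.
import numpy as np

LOCATED_LAYER = 24  # placeholder; use the layer found by DICE_locate.py

X = layer_features(model, tokenizer, test_questions, LOCATED_LAYER)
y = np.ones(len(X)) if is_contaminated else np.zeros(len(X))
np.savez("GSM8K_seen_states.npz", X=X, y=y)  # filename is a placeholder
```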
You can also use the following script to directly reproduce the test data of the main experiment in our paper.

```bash
CUDA_VISIBLE_DEVICES=0 bash scripts/make_test_data.sh
```

Use `train_test.py` to train and test a DICE classifier.
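A minimal sketch of what training and testing such a detector on the saved states could look like (not the actual `train_test.py`; a logistic-regression stand-in is used, and the filename matches the placeholder above):

```python
# Sketch of training and testing the detector, not the actual train_test.py.
# A logistic-regression stand-in is used; DICE's real classifier may differ.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = np.load("GSM8K_seen_states.npz")  # produced by the sketch above
X_tr, X_te, y_tr, y_te = train_test_split(data["X"], data["y"], test_size=0.2)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("contamination detection accuracy:", clf.score(X_te, y_te))
# clf.predict_proba(X_new)[:, 1] would give a per-sample contamination score
```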
You can simply use the following script to directly reproduce test results of the main experiment in our paper.
```bash
CUDA_VISIBLE_DEVICES=0 bash scripts/Test_DICE.sh
```

The `contamination_classifier` folder also contains the code for the other main experiments in the paper. For example, the `performance_vs_score` subfolder stores the code for the experiment testing the relationship between contamination probability and model performance, and `draw_OOD.py` draws the detection distribution of the OOD dataset.
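For reference, the score-vs-performance analysis boils down to a scatter plot of per-model contamination scores against benchmark accuracy; a sketch with placeholder numbers (the real code is in `performance_vs_score`):

```python
# Sketch of the score-vs-performance plot, not the actual
# performance_vs_score code; all numbers are placeholders.
import matplotlib.pyplot as plt

scores = [0.12, 0.35, 0.61, 0.88]  # placeholder per-model mean contamination scores
accs = [41.2, 48.9, 57.3, 66.0]    # placeholder benchmark accuracies (%)

plt.scatter(scores, accs)
plt.xlabel("mean contamination score")
plt.ylabel("benchmark accuracy (%)")
plt.savefig("performance_vs_score.png")
```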
Our implementation is based on the repository of the paper "Evading Data Contamination Detection for Language Models is (too) Easy" by Jasper Dekoninck, Mark Niklas Müller, Maximilian Baader, Marc Fischer, and Martin Vechev. The original repository can be found here. Their LICENSE file can be found in the `OOD_test` folder as well. We have made some modifications to the code to adapt it to our needs.
We wish to express our appreciation to the pioneers in the field of evasive data contamination; our work was developed to address the attack presented in their work.