Towards Robustness of Deep Program Processing Models -- Detection, Estimation and Enhancement
See the paper here.
Watch the video here.
- torch == 1.8.0
- dgl == 0.7.2
- transformers == 3.3
This is the recommended environment. Other versions may be compatible.
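Before running anything, it can help to sanity-check the installed versions against these pins. A minimal, generic sketch (the version parsing is illustrative and not part of this repo; in practice one would fill `installed` via `importlib.metadata.version`):

```python
# Sketch: compare installed package versions against the recommended pins.
# The package names and pins come from the list above; the rest is generic.
REQUIRED = {"torch": "1.8.0", "dgl": "0.7.2", "transformers": "3.3"}

def version_tuple(v):
    """Turn a version string like '1.8.0' into (1, 8, 0) for comparison."""
    return tuple(int(x) for x in v.split(".") if x.isdigit())

def check(installed):
    """Return the packages whose installed version is missing or differs."""
    return {name: (have, want)
            for name, want in REQUIRED.items()
            for have in [installed.get(name)]
            if have is None or version_tuple(have) != version_tuple(want)}
```

An empty dict from `check` means the environment matches the recommendation; other versions may still be compatible, as noted above.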
Use the pre-processed datasets

- Download the already pre-processed datasets -- OJ, OJClone, and CodeChef.
- Put the contents of OJ, OJClone, and CodeChef into the directories `data`, `data_clone`, and `data_defect`, respectively.
- The pre-processed datasets for GRU, LSTM, ASTNN, LSCNN, TBCNN, CodeBERT, and CDLH are now all included in these directories.
Pre-process the datasets by yourself

- Download the raw datasets, i.e., `oj.tar.gz` from OJ and `codechef.zip` from CodeChef.
- Put `oj.tar.gz` in the directory `data` and `codechef.zip` in `data_defect`.
- Run the following commands to build the OJ dataset for the DL models. The dataset format of CodeChef is almost identical to OJ, so the code can be reused.
> cd preprocess-lstm
> python3 main.py
> cd ../preprocess-astnn
> python3 pipeline.py
> cd ../preprocess-tbcnn
> python3 main.py
> cd ..

- Copy `oj.pkl.gz`, `oj_uid.pkl.gz`, and `oj_inspos.pkl.gz` from the directory `data` and paste them into `data_clone`.
- Run the following commands to build the OJClone dataset for the DL models.
> cd preprecess_clone-lstm
> python3 main.py
> cd ../preprocess_clone-astnn
> python3 main.py
> cd ../preprocess_clone-tbcnn
> python3 main.py
> cd ..

Everything is ready now.
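The `.pkl.gz` files produced above are then consumed by the training scripts. Assuming they are standard gzipped pickles (which the `.pkl.gz` suffix suggests, though the exact object layout inside depends on each preprocessing script), a minimal loader looks like:

```python
import gzip
import pickle

def load_pkl_gz(path):
    """Load a gzipped pickle file such as data/oj.pkl.gz.

    Assumption: the *.pkl.gz files are plain pickles compressed with
    gzip; what object comes out depends on the preprocessing script
    that produced the file.
    """
    with gzip.open(path, "rb") as f:
        return pickle.load(f)
```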
The source code directories are named according to the dataset and the model: `code`, `code_clone`, and `code_defect` refer to OJ, OJClone, and CodeChef, respectively.
The source code to train each model (i.e., GRU, LSTM, ASTNN, LSCNN, TBCNN, CodeBERT, and CDLH) on each dataset (i.e., OJ, OJClone, and CodeChef) is included in the corresponding directory. For instance, `code_defect-codebert` refers to CodeBERT for CodeChef. Note that the GRU and LSTM models are both in the `lstm` directories.
E.g., run the following commands to train a GRU model on OJ.
> cd code-lstm
> python3 lstm_train.py -gpu 0 -model GRU -lr 1e-3 -save_dir MODEL/SAVE/PATH --data ../data/oj.pkl.gz
> cd ..

Run the following commands to train an LSTM model on CodeChef.
> cd code_defect-lstm
> python3 lstm_train.py -gpu 0 -model LSTM -lr 1e-3 -save_dir MODEL/SAVE/PATH --data ../data_defect/oj.pkl.gz
> cd ..

Run the following commands to train a CDLH model on OJClone.
> cd code_clone-cdlh
> python3 train.py --save_dir MODEL/SAVE/PATH
> cd ..

Run `python3 attacker.py` in each directory to attack the DL models.
E.g., run the following commands to attack the CodeBERT model on OJ.
> cd code-codebert
> python3 attacker.py --model_dir FINETUNED/CODEBERT/MODEL/PATH
> cd ..

The correspondence between the attacking algorithms and the Attacker classes is given in the following table.
| Attacking Algorithm | Class Name |
|---|---|
| I-CARROT | Attacker |
| S-CARROT | InsAttacker |
| I-RW | AttackerRandom |
| S-RW | InsAttackerRandom |
One may switch among the attacking algorithms (I-CARROT, S-CARROT, I-RW, and S-RW) by employing the corresponding Attacker class in the code.
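Since only the class name changes between algorithms, the choice can be made data-driven with a small dispatch table mirroring the table above. A hypothetical sketch (the stub classes below stand in for the real implementations in `attacker.py`):

```python
# Hypothetical sketch: map algorithm names to Attacker classes.
# The class names mirror the table above; the `...` bodies are
# placeholders for the real implementations in attacker.py.
class Attacker: ...          # I-CARROT
class InsAttacker: ...       # S-CARROT
class AttackerRandom: ...    # I-RW
class InsAttackerRandom: ... # S-RW

ATTACKERS = {
    "I-CARROT": Attacker,
    "S-CARROT": InsAttacker,
    "I-RW": AttackerRandom,
    "S-RW": InsAttackerRandom,
}

def make_attacker(algorithm, *args, **kwargs):
    """Instantiate the Attacker class for the requested algorithm."""
    return ATTACKERS[algorithm](*args, **kwargs)
```

This keeps the algorithm choice to a single string, e.g. a command-line flag, rather than an edit to the code.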
After the adversarial attack, log files are produced. Run the following command to compute the robustness of the DL model.
> python3 compute_robustness.py -I PATH/TO/ICARROT/LOG -S PATH/TO/SCARROT/LOG

Take LSTM on OJClone for example.
- Run the following commands to create the adversarial example training set.
> cd code_clone-lstm
> python3 attacker4training.py
> cd ..

- Run the following commands to adversarially train the model.
> cd code_clone-lstm
> python3 lstm_train.py --adv_train_path PATH/TO/ADVERSARIAL/EXAMPLE/SET --OTHER_ARGUMENTS
> cd ..

- Go back to step 1 to iteratively update the adversarial example set upon the current training set.
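The three steps above form a loop: mine adversarial examples against the current model, retrain on them, and repeat. A schematic sketch of that loop, with stub functions standing in for the repo's scripts (`generate_adversarial_examples` and `train_model` are illustrative names, not the actual API of `attacker4training.py` or `lstm_train.py`):

```python
# Schematic of the iterative adversarial training loop described above.
# The two stubs are placeholders for the real attack/training scripts.
def generate_adversarial_examples(model, train_set):
    """Step 1: attack the current model to mine adversarial examples."""
    return [x + "_adv" for x in train_set]

def train_model(train_set, adv_set):
    """Step 2: retrain on the original data plus adversarial examples."""
    return {"trained_on": len(train_set) + len(adv_set)}

def adversarial_training(train_set, rounds=3):
    """Step 3: iterate, refreshing the adversarial set each round."""
    model = train_model(train_set, [])
    for _ in range(rounds):
        adv = generate_adversarial_examples(model, train_set)
        model = train_model(train_set, adv)
    return model
```

The key point the loop captures is that the adversarial example set is regenerated against the *current* model each round, rather than fixed once.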
@article{zhang2022towards,
title={Towards Robustness of Deep Program Processing Models--Detection, Estimation and Enhancement},
author={Zhang, Huangzhao and Fu, Zhiyi and Li, Ge and Ma, Lei and Zhao, Zhehao and Yang, Hua’an and Sun, Yizhe and Liu, Yang and Jin, Zhi},
journal={ACM Transactions on Software Engineering and Methodology},
year={2022},
publisher={ACM New York, NY}
}