This repository contains an official implementation of Targeted Test Selection.
This work introduces a strategy for selecting the tests that are likely to fail after a targeted software change. The approach constructs numerous features describing tests, software changes, and their co-occurrences, and applies original preprocessing to them. We incrementally introduce more advanced techniques, based on code analysis and project structure, for deriving additional features that improve the quality of test selection. The resulting features are used to train a machine learning model that predicts the probability that a given test fails on a given code change.
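As a rough illustration of the idea, the sketch below derives a single co-occurrence feature, the historical failure rate of a test when a given file was changed, and uses it to rank tests for a new change. The record layout, the function names, and the max-based aggregation are assumptions made for this example; they are not the repository's actual features or model.

```python
from collections import defaultdict


def cooccurrence_failure_rates(history):
    """Return {(file, test): failure rate} computed from past test runs.

    `history` is a hypothetical list of (changed_files, test_name, failed) records.
    """
    runs = defaultdict(int)
    fails = defaultdict(int)
    for changed_files, test, failed in history:
        for path in changed_files:
            runs[(path, test)] += 1
            fails[(path, test)] += int(failed)
    return {key: fails[key] / runs[key] for key in runs}


def rank_tests(rates, changed_files, tests):
    """Score each test by its highest failure rate over the changed files."""
    scores = {
        test: max(
            (rates.get((path, test), 0.0) for path in changed_files),
            default=0.0,
        )
        for test in tests
    }
    # Highest score first, mirroring a "most likely to fail" ordering.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy history: test_a failed once out of two runs that touched a.py.
history = [
    (["a.py"], "test_a", True),
    (["a.py"], "test_a", False),
    (["a.py"], "test_b", False),
    (["b.py"], "test_b", True),
]
rates = cooccurrence_failure_rates(history)
ranking = rank_tests(rates, ["a.py"], ["test_a", "test_b"])
print(ranking)
```

In the approach described above, features of this kind are inputs to a trained model rather than a final score; the direct ranking here only keeps the example short.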
To install project dependencies, execute
pip install -r requirements.txt
If you plan to use the csaxgb or cba model, run
huggingface-cli login --token $YOUR_HF_TOKEN
For information about the token, see the Hugging Face documentation on User Access Tokens.
- Download the IOF/ROL feature-engineered.csv from https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/GIJ5DE and place it as benchmarks/datasets/iofrol.csv.
- Download the GSDTSR feature-engineered.csv from https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/MJFKDN and place it as benchmarks/datasets/gsdtsr.csv.
- Follow the instructions in https://github.com/Amannor/redhat_final_proj/tree/main to create psr.csv and place it as benchmarks/datasets/redh.csv. We used the data saved in Amannor's repository.
- Run
python benchmarks.py
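Before starting the benchmarks, it can help to confirm that the three dataset files are where the steps above put them. The following is a minimal sketch using only the standard library; the helper name `missing_datasets` is hypothetical and is not part of this repository.

```python
from pathlib import Path

# Paths taken from the steps above, relative to the repository root.
EXPECTED_DATASETS = [
    "benchmarks/datasets/iofrol.csv",
    "benchmarks/datasets/gsdtsr.csv",
    "benchmarks/datasets/redh.csv",
]


def missing_datasets(root="."):
    """Return the expected dataset paths that do not exist under `root`."""
    return [p for p in EXPECTED_DATASETS if not (Path(root) / p).is_file()]


missing = missing_datasets()
if missing:
    print("Missing datasets:", ", ".join(missing))
else:
    print("All benchmark datasets are in place.")
```

Run it from the root of the project so that the relative paths resolve correctly.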
- Load the data according to the structure in data_structure/train_data_structure/. Read data_structure/README.md for more information. When calling train.py, specify the path to this data.
- Run the following command in the root of the project. You can see all of the parameters by running python train.py --help.
python train.py --data data --num_train_commits 5 --model xgb
- Load the data for inference according to the structure in data_structure/predict_data_structure/. Read data_structure/INFO.md for more information. When calling predict.py, specify the path to this data.
- Run the following command in the root of the project. You can see all of the parameters by running python predict.py --help.
python predict.py --input data_inference --model xgb --output output/output.json
- The prediction results, sorted in descending order of failure probability, will be available in output/output.json.
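Once predictions are available, a common next step is to keep only the tests whose failure probability exceeds a cutoff. The sketch below shows one way to do that. The JSON layout (a list of objects with `test` and `probability` keys), the `select_tests` helper, and the threshold values are all assumptions for illustration; check the real schema of output/output.json before adapting it.

```python
import json


def select_tests(predictions, threshold=0.5, max_tests=None):
    """Return test names with probability >= threshold, most likely failures first.

    `predictions` is assumed to be a list of {"test": str, "probability": float}
    records; this layout is hypothetical.
    """
    ranked = sorted(predictions, key=lambda rec: rec["probability"], reverse=True)
    chosen = [rec["test"] for rec in ranked if rec["probability"] >= threshold]
    return chosen if max_tests is None else chosen[:max_tests]


# Inline sample standing in for the file contents (assumed schema).
# With a real file, one would use json.load(open("output/output.json")) instead.
sample = """
[
  {"test": "test_login", "probability": 0.91},
  {"test": "test_cart", "probability": 0.40},
  {"test": "test_search", "probability": 0.07}
]
"""
selected = select_tests(json.loads(sample), threshold=0.3)
print(selected)
```

A threshold trades test-suite runtime against the risk of skipping a failing test, so it is usually tuned per project.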
.
├── benchmarks.py - script for running benchmarks
├── train.py - script for training models
├── predict.py - script for inference
├── configs - configs for models
├── data_structure - data structure for training and inference
└── src
    ├── benchmarks - source files for benchmarks
    ├── models - source files for models
    ├── preprocessing - source files for preprocessing data (cleaning data, creating dataset, etc.)
    └── metrics - utilities for computing metrics
Targeted Test Selection is available under the Apache License 2.0. See the LICENSE file for more info.