DeepFL is a deep learning approach that automatically learns the most effective existing and latent features for precise fault localization. It exploits multi-dimensional features, i.e., spectrum-based, mutation-based, complexity-based (code metrics) and textual-similarity-based information, and implements two multi-layer perceptron variants (MLP and MLP2), one recurrent neural network variant (BiRNN) and two tailored MLP variants in TensorFlow. To evaluate its performance, DeepFL uses benchmark subjects from Defects4J, an open-source repository that provides buggy versions and the corresponding fixed versions of multiple projects.
To simplify testing DeepFL, we provide a Docker container that can be downloaded from an online cloud drive. Note that the container, including the dataset, is larger than 30GB, so please prepare sufficient disk space. You can install the container with the commands below: unzip and import the Docker image, run a container from it, stash any local changes that might conflict, and pull the latest updates.
$unzip -o deepfl.zip
$docker import deepfl.tar tmp/deepfl
$docker run -t -i tmp/deepfl bash
$cd DeepFaultLocalization
$git stash
$git pull
If you are not familiar with Docker, or downloading the Docker image is slow on your network, you can also run DeepFL on your local machine by following the instructions on our GitHub page.
You can run DeepFL with the following commands:
$cd /DeepFaultLocalization/RunDeepFL/
$python main.py .. ../Results $subject $version $model $scenario $loss $epoch $dump_step
where each parameter is explained as follows:
- ..: the path of the parent directory containing all the datasets.
- ../Results: the directory in which to cache the results. Please note that this argument must not be changed to CachedResults; otherwise our prepared results would be overridden.
- $subject: the subject name, i.e., Time, Chart, Lang, Math, Mockito or Closure.
- $version: the version number of the corresponding subject. Note that the maximum version numbers of the subjects above are 27, 26, 65, 106, 38 and 133, respectively.
- $model: the implemented model name, i.e., mlp, mlp2, rnn, birnn, dfl1, dfl2, dfl1-Metrics, dfl1-Mutation, dfl1-Spectrum and dfl1-Textual, representing the multi-layer perceptron with one hidden layer, the multi-layer perceptron with two hidden layers, the recurrent neural network, the bidirectional recurrent neural network, tailored MLP1, tailored MLP2, and tailored MLP1 without the metrics, mutation, spectrum or textual-similarity information, respectively.
- $scenario: one of the three scenarios evaluated in our paper, i.e., DeepFL, CrossDeepFL or CrossValidation, representing the within-project, cross-project and cross-validation scenarios, respectively. Note that both CrossDeepFL and CrossValidation can only use softmax as the loss function (explained in the paper).
- $loss: the loss function name, i.e., softmax or epairwise.
- $epoch: the number of training epochs.
- $dump_step: the interval, in epochs, at which results are dumped into the result file. For example, if $dump_step = 10, the results for epochs 10, 20, 30, ... are written into the output files.
A sample command to train the dfl1 model with the softmax loss on version 1 of the Time subject in the within-project scenario (2 training epochs, dumping results after every epoch) is as follows.
$python main.py .. ../Results Time 1 dfl1 DeepFL softmax 2 1
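For scripted runs, the argument rules above can be checked before main.py is called. The sketch below is not part of the DeepFL repository: `build_main_command` and `dumped_result_dirs` are hypothetical helper names, the constraints are transcribed from the parameter list above, and the directory naming assumes that the `$model-$loss-$epoch` scheme used by CachedResults also applies to ../Results.

```python
import shlex

# Constraints transcribed from the parameter list above.
MAX_VERSION = {"Time": 27, "Chart": 26, "Lang": 65, "Math": 106, "Mockito": 38, "Closure": 133}
MODELS = {"mlp", "mlp2", "rnn", "birnn", "dfl1", "dfl2",
          "dfl1-Metrics", "dfl1-Mutation", "dfl1-Spectrum", "dfl1-Textual"}
SCENARIOS = {"DeepFL", "CrossDeepFL", "CrossValidation"}
LOSSES = {"softmax", "epairwise"}


def build_main_command(subject, version, model, scenario, loss, epoch, dump_step,
                       data_dir="..", result_dir="../Results"):
    """Hypothetical helper: validate the arguments and return a main.py command line."""
    if scenario == "CrossValidation":
        # CrossValidation uses the mixed "10fold" data with fold numbers 1-10.
        if subject != "10fold" or not 1 <= version <= 10:
            raise ValueError("CrossValidation expects subject '10fold' and version 1-10")
    elif subject not in MAX_VERSION or not 1 <= version <= MAX_VERSION[subject]:
        raise ValueError(f"unknown subject/version: {subject} {version}")
    if model not in MODELS or scenario not in SCENARIOS or loss not in LOSSES:
        raise ValueError("unknown model, scenario or loss")
    if scenario != "DeepFL" and loss != "softmax":
        raise ValueError(f"{scenario} only supports the softmax loss")
    if epoch < 1 or dump_step < 1:
        raise ValueError("epoch and dump_step must be positive")
    if result_dir.rstrip("/").endswith("CachedResults"):
        raise ValueError("writing into CachedResults would override the prepared results")
    args = ["python", "main.py", data_dir, result_dir, subject, str(version),
            model, scenario, loss, str(epoch), str(dump_step)]
    return shlex.join(args)


def dumped_result_dirs(model, loss, epoch, dump_step):
    """Directory names for the epochs that get dumped (every dump_step-th epoch),
    assuming the $model-$loss-$epoch naming scheme."""
    return [f"{model}-{loss}-{e}" for e in range(dump_step, epoch + 1, dump_step)]
```

For the sample above, `build_main_command("Time", 1, "dfl1", "DeepFL", "softmax", 2, 1)` returns the same command line, and `dumped_result_dirs("dfl1", "softmax", 2, 1)` lists `dfl1-softmax-1` and `dfl1-softmax-2`.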
Note that CrossValidation differs slightly from the other two scenarios, since the data of all subjects has been mixed and then split into 10 folds. To run it with the command above, set $subject to "10fold", $version to a fold number from 1 to 10, and $scenario to CrossValidation. Also, following our paper, please use only the dfl2 model and the softmax loss function for CrossValidation.
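The ten CrossValidation runs can be scripted as below. This is a hedged sketch, not part of the repository: `cross_validation_commands` and `run_all_folds` are hypothetical names, the working directory assumes the Docker layout, and the epoch and dump_step values are left to the caller rather than taken from the paper.

```python
import shlex
import subprocess


def cross_validation_commands(epoch, dump_step, data_dir="..", result_dir="../Results"):
    """Build the ten main.py invocations for the 10-fold CrossValidation scenario.

    Subject is "10fold", version is the fold index 1-10, and only the dfl2 model
    with the softmax loss is used, as recommended above.
    """
    return [
        ["python", "main.py", data_dir, result_dir, "10fold", str(fold),
         "dfl2", "CrossValidation", "softmax", str(epoch), str(dump_step)]
        for fold in range(1, 11)
    ]


def run_all_folds(epoch, dump_step, cwd="/DeepFaultLocalization/RunDeepFL"):
    """Run the folds one after another; check=True stops at the first failing fold."""
    for cmd in cross_validation_commands(epoch, dump_step):
        print("running:", shlex.join(cmd))
        subprocess.run(cmd, cwd=cwd, check=True)
```

Running the folds sequentially keeps memory use predictable; since each fold is independent, they could also be distributed across machines.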
We also provide a script that runs the full training process evaluated in our paper. Typically, the results are cached in /DeepFaultLocalization/Results, and after the script finishes, all result tables and figures in the paper (including Table 6 and Figure 9) can be generated with the result analysis tool introduced below. Detailed descriptions/comments can be found in the script. The script can be executed with the following commands:
$cd /DeepFaultLocalization/RunDeepFL/
$./run_deepfl.sh
Since running DeepFL from scratch can be extremely time-consuming, we also provide result analysis scripts that generate the corresponding tables and figures in the paper directly from the cached results of our prior runs. Specifically, we prepared the cached result directory /DeepFaultLocalization/CachedResults, which avoids time-consuming DeepFL re-execution; the analysis can be run with the following commands.
$cd /DeepFaultLocalization/ResultAnalysis
$python result_main.py $RQ $epoch ../CachedResults ../
where each parameter is explained as follows:
- $RQ: the corresponding RQ in the paper. There are 4 RQs with 5 kinds of results in total, i.e., RQ1 (Table 4), RQ2 (Table 5), RQ2_2 (Table 6), RQ3 (Figure 8) and RQ4 (Figure 9).
- $epoch: the corresponding number of epochs in the paper. Following the paper, it should be 55 for RQ1, RQ2 and RQ2_2, and 60 for RQ3 and RQ4.
- ../CachedResults: the result directory (we use the prepared results here).
- ../: the path of the parent directory containing all the datasets.
RQ1, RQ2 and RQ2_2 write the corresponding tables in CSV format to the current directory /ResultAnalysis, while RQ3 and RQ4 write the corresponding PDF files to the sub-directory /ResultAnalysis/Rdata.
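To regenerate all five artifacts in one go, the RQ/epoch pairs above can be kept in a small table. The sketch below (hypothetical helper `result_main_command`, not part of the repository) only builds the command lines and records where each output is expected according to the description above; it assumes it is run from /DeepFaultLocalization/ResultAnalysis.

```python
import shlex

# RQ argument -> (paper artifact, epoch used in the paper, where the output is written)
RQ_SETTINGS = {
    "RQ1": ("Table 4", 55, "CSV in ResultAnalysis/"),
    "RQ2": ("Table 5", 55, "CSV in ResultAnalysis/"),
    "RQ2_2": ("Table 6", 55, "CSV in ResultAnalysis/"),
    "RQ3": ("Figure 8", 60, "PDF in ResultAnalysis/Rdata/"),
    "RQ4": ("Figure 9", 60, "PDF in ResultAnalysis/Rdata/"),
}


def result_main_command(rq, result_dir="../CachedResults", data_parent="../"):
    """Return the result_main.py command line for one RQ, using the paper's epoch."""
    _artifact, epoch, _output = RQ_SETTINGS[rq]
    return shlex.join(["python", "result_main.py", rq, str(epoch), result_dir, data_parent])


if __name__ == "__main__":
    # Print one command per artifact together with its expected output location.
    for rq, (artifact, _epoch, output) in RQ_SETTINGS.items():
        print(f"{artifact}: {result_main_command(rq)}  -> {output}")
```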
Please note that all features are extracted with techniques from prior work and are not contributions of this work. This work mainly proposes a general deep-learning framework that learns the most effective existing/latent features from off-the-shelf fault localization techniques. For the sake of completeness, we have included all the scripts and instructions for feature extraction, together with an example project (Chart-1). Raw feature extraction for all subjects is not included, because the large-scale data and experiments would make updating the Docker container very costly for reviewers.
The commands to extract the feature values of Chart-1 are as follows:
$cd /DeepFaultLocalization
$sh install_dep.sh # install necessary dependencies
$cd /DeepFaultLocalization/FeatureExtraction/Scripts
$sh getRawFeatures.sh /DeepFaultLocalization/FeatureExtraction/ Chart 1
$sh getFinalFeatures.sh /DeepFaultLocalization/FeatureExtraction/ Chart 1
The final feature values are stored in /DeepFaultLocalization/FeatureExtraction/FinalFeatures/$feature_type/Chart/1.txt, where $feature_type is one of "Complexity", "Mutation", "Spectrum" and "Textual". Each line of the result files has the format $MethodName,$FeatureValue1,$FeatureValue2,...,$FeatureValueN.
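The four feature files of one version can be joined per method for inspection. The sketch below illustrates the line format above under stated assumptions: the helper names are hypothetical, method names are assumed not to contain commas, all feature values are assumed numeric, and only methods present in all four files are kept.

```python
from pathlib import Path

FEATURE_TYPES = ("Complexity", "Mutation", "Spectrum", "Textual")


def read_feature_file(path):
    """Parse one FinalFeatures file into {method_name: [float, ...]}.

    Assumes "$MethodName,$FeatureValue1,...,$FeatureValueN" per line and
    that the method name itself contains no comma.
    """
    features = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            name, *values = line.split(",")
            features[name] = [float(v) for v in values]  # float() tolerates spaces
    return features


def load_version_features(root, subject, version):
    """Concatenate the four feature groups per method for one subject version.

    root is e.g. /DeepFaultLocalization/FeatureExtraction; methods missing from
    any group are dropped (an assumption of this sketch).
    """
    base = Path(root) / "FinalFeatures"
    per_type = [read_feature_file(base / t / subject / f"{version}.txt")
                for t in FEATURE_TYPES]
    common = set(per_type[0]).intersection(*per_type[1:])
    return {m: [v for group in per_type for v in group[m]] for m in sorted(common)}
```

For example, `load_version_features("/DeepFaultLocalization/FeatureExtraction", "Chart", 1)` would read the four Chart-1 files produced by the commands above.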
Please note that the script getRawFeatures.sh extracts all raw feature data with various tools, such as PIT and Indri, and may take 8 to 10 minutes in Docker. While PIT is running, many exceptions are printed on screen. This is normal for mutation testing (some mutants cause test exceptions/crashes by design) and does not affect our results. Also note that some Maven dependencies required by the underlying PIT mutation tool may be blocked in some regions/countries (e.g., China).
Our project code structure is displayed as follows.
├── RunDeepFL
│ ├── bidirectional_rnn.py
│ ├── config.py
│ ├── fc_based_1.py
│ ├── fc_based_2.py
│ ├── input.py
│ ├── main.py
│ ├── multilayer_perceptron_one_hidden_layer.py
│ ├── multilayer_perceptron_two_hidden_layer.py
│ ├── run_deepfl.sh
│ ├── utils.py
├── ResultAnalysis
│ ├── result_analysis.py
│ ├── result_conf.py
│ ├── result_main.py
│ ├── result_utils.py
│ ├── RforRQ3.r
│ ├── RforRQ4.r
│ ├── LIBSVMResult
├── FeatureExtraction
│ ├── FinalFeatures
│ ├── RawFeatures
│ ├── Scripts
│ ├── SubjectExample
│ ├── UsefulData
│ ├── UsefulTools
├── CachedResults
├── DeepFL
├── CrossDeepFL
├── CrossValidation
The major scripts fall into three parts: scripts for training, scripts for result analysis and scripts for feature extraction, located in /DeepFaultLocalization/RunDeepFL, /DeepFaultLocalization/ResultAnalysis and /DeepFaultLocalization/FeatureExtraction, respectively. All the scripts and directories are described as follows.
- ./RunDeepFL: the directory for training the model and running inference.
  - main.py: the main script for training the model. Its arguments and a usage example are described above.
  - config.py: the configuration script, which handles the command-line arguments and predefines certain configurations such as the learning rate, batch size and dropout rate (if necessary). Detailed descriptions/comments can be found in the script.
  - input.py: the input script, which reads and parses data from the input files for DeepFL training. Detailed descriptions/comments can be found in the script.
  - multilayer_perceptron_one_hidden_layer.py: the MLP script, which builds the MLP model architecture and provides an interface to train the model. Detailed descriptions/comments can be found in the script. An interface usage example can be found in line 37 of main.py.
  - multilayer_perceptron_two_hidden_layer.py: the MLP2 (multi-layer perceptron with two hidden layers) script, which builds the MLP2 model architecture and provides an interface to train the model. Detailed descriptions/comments can be found in the script. An interface usage example can be found in line 39 of main.py.
  - bidirectional_rnn.py: the BiRNN script, which builds the basic BiRNN model architecture and provides an interface to train the model. Detailed descriptions/comments can be found in the script. An interface usage example can be found in line 35 of main.py.
  - fc_based_1.py: the $MLP_{DFL}(1)$ script, which builds the model architecture of the first tailored MLP variant and provides an interface to train the model. Detailed descriptions/comments can be found in the script. An interface usage example can be found in line 41 of main.py.
  - fc_based_2.py: the $MLP_{DFL}(2)$ script, which builds the model architecture of the second tailored MLP variant and provides an interface to train the model. Detailed descriptions/comments can be found in the script. An interface usage example can be found in line 43 of main.py.
  - run_deepfl.sh: the script that runs the full training process evaluated in our paper. Detailed descriptions/comments can be found in the script.
- ./ResultAnalysis: the directory for result analysis.
  - LIBSVMResult: a folder containing the raw data points/rankings of all project versions for state-of-the-art techniques (under LIBSVMResult/Ranking) and statistical results such as Top-N, MAR and MFR for comparing these techniques with DeepFL.
  - result_main.py: the main script to analyze the results after running DeepFL.
  - result_conf.py: the configuration script that passes arguments to result_main.py.
  - result_analysis.py: the primary result analysis functionality.
  - result_utils.py: other utility functions.
  - RforRQ3.r and RforRQ4.r: the R scripts that generate Figure 8 and Figure 9 in the paper.
- ./FeatureExtraction: the directory for feature extraction.
  - FinalFeatures: the preprocessed feature values used for DeepFL training.
  - RawFeatures: the raw feature data extracted from various tools.
  - Scripts: the scripts to extract and process feature values.
  - SubjectExample: the example benchmark project (i.e., Chart-1).
  - UsefulData: other data necessary for training, e.g., the buggy method(s) and the failed test(s) for each subject.
  - UsefulTools: various tools to extract different features.
- ./CachedResults: the cached raw results for quick result analysis without time-consuming retraining. The results under each argument setting are cached in ./CachedResults/$subject/$version/$scenario/$model-$loss-$epoch. For example, ./CachedResults/Time/1/DeepFL/dfl1-softmax-1 holds the results after the first epoch of the following commands:
  $cd /DeepFaultLocalization/RunDeepFL/
  $python main.py .. ../Results Time 1 dfl1 DeepFL softmax 2 1
- ./DeepFL: the raw input dataset for DeepFL within-project training, organized by the path ./$subject/$version, where $subject is the subject name and $version is the corresponding version. Each version contains four files: "Test.csv", "TestLabel.csv", "Train.csv" and "TrainLabel.csv". In Train.csv and Test.csv, each line holds the feature values of one method extracted from Defects4J, and each line in TrainLabel.csv and TestLabel.csv indicates whether the corresponding method is buggy.
- ./CrossDeepFL: the raw input dataset for DeepFL cross-project training. In the cross-project scenario, the test data is the same as in the within-project scenario, but the training data for one project comes from the other projects. Therefore, there is only one training file and one label file per project in this directory. For example, LangTrain.csv and LangTrainLabel.csv are the training data and label data for all versions of the Lang project.
- ./CrossValidation: the raw input dataset for DeepFL cross-validation training. This directory contains 10 sub-directories under "10fold" used for training. Each contains the same kinds of files as "DeepFL" and "CrossDeepFL", such as "Train.csv" and "Test.csv". However, since all instances of all projects are randomly divided into 10 sets, we need to record which project version each instance belongs to. Thus, "PandV.txt" stores the project/version information for later result evaluation.
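LIBSVMResult reports Top-N, MAR and MFR. As a rough illustration of these metrics (not the code in result_analysis.py), the sketch below computes them from per-version rankings using their common definitions: Top-N counts the faulty versions whose best-ranked buggy method is within the top N, MFR (mean first rank) averages the rank of the first buggy method, and MAR (mean average rank) averages the mean rank of all buggy methods. The pessimistic tie-breaking rule and all helper names are assumptions of this sketch.

```python
from statistics import mean


def ranks_of_buggy(scores, buggy):
    """1-based ranks of the buggy methods when sorted by descending suspiciousness.

    Ties are resolved pessimistically: a method's rank counts every method whose
    score is >= its own (an assumed tie policy). Returns ranks in ascending order.
    """
    return sorted(sum(1 for s in scores.values() if s >= scores[m]) for m in buggy)


def top_n(all_ranks, n):
    """Number of faulty versions with at least one buggy method within the top n."""
    return sum(1 for ranks in all_ranks if ranks and ranks[0] <= n)


def mfr(all_ranks):
    """Mean First Rank: average over versions of the best-ranked buggy method."""
    return mean(ranks[0] for ranks in all_ranks)


def mar(all_ranks):
    """Mean Average Rank: average over versions of the mean rank of all buggy methods."""
    return mean(mean(ranks) for ranks in all_ranks)
```

For two hypothetical versions whose buggy methods are ranked [3] and [1, 3], Top-1 is 1, Top-3 is 2, MFR is 2 and MAR is 2.5. Every version is assumed to have at least one buggy method; otherwise MFR and MAR are undefined.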