Thanks to visit codestin.com
Credit goes to www.scribd.com

0% found this document useful (0 votes)
90 views16 pages

Global XAI Methods Survey

This document provides a survey of global interpretation methods for explaining deep neural networks. It begins with an introduction explaining the importance of interpretability in deep learning models. The document then reviews different terminology used in explainable artificial intelligence before providing a taxonomy of global interpretation methods. It evaluates the strengths and weaknesses of these methods and assesses challenges to their practical implementation. The paper concludes by discussing future research directions to address challenges in global interpretability.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
90 views16 pages

Global XAI Methods Survey

This document provides a survey of global interpretation methods for explaining deep neural networks. It begins with an introduction explaining the importance of interpretability in deep learning models. The document then reviews different terminology used in explainable artificial intelligence before providing a taxonomy of global interpretation methods. It evaluates the strengths and weaknesses of these methods and assesses challenges to their practical implementation. The paper concludes by discussing future research directions to address challenges in global interpretability.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 16

Neurocomputing 513 (2022) 165–180

Contents lists available at ScienceDirect

Neurocomputing
journal homepage: www.elsevier.com/locate/neucom

Survey paper

Explaining deep neural networks: A survey on the global interpretation


methods
Rabia Saleem a,⇑, Bo Yuan b,⇑, Fatih Kurugollu a,c, Ashiq Anjum b, Lu Liu b
a
School of Computing and Engineering, University of Derby, Kedleston Rd, Derby DE22 1GB, UK
b
School of Computing and Mathematical Sciences, University of Leicester, University Rd, Leicester LE1 7RH, UK
c
Department of Computer Science, University of Sharjah, Sharjah, United Arab Emirates

a r t i c l e i n f o a b s t r a c t

Article history: A substantial amount of research has been carried out in Explainable Artificial Intelligence (XAI) models,
Received 18 May 2022 especially in those which explain the deep architectures of neural networks. A number of XAI approaches
Revised 30 July 2022 have been proposed to achieve trust in Artificial Intelligence (AI) models as well as provide explainability
Accepted 15 September 2022
of specific decisions made within these models. Among these approaches, global interpretation methods
Available online 23 September 2022
Communicated by Zidong Wang
have emerged as the prominent methods of explainability because they have the strength to explain
every feature and the structure of the model. This survey attempts to provide a comprehensive review
of global interpretation methods that completely explain the behaviour of the AI models. We present a
Keywords:
Artificial intelligence
taxonomy of the available global interpretations models and systematically highlight the critical features
Deep neural networks and algorithms that differentiate them from local as well as hybrid models of explainability. Through
Black box Models examples and case studies from the literature, we evaluate the strengths and weaknesses of the global
Explainable artificial intelligence interpretation models and assess challenges when these methods are put into practice. We conclude
Global interpretation the paper by providing the future directions of research in how the existing challenges in global interpre-
tation methods could be addressed and what values and opportunities could be realized by the resolution
of these challenges.
Ó 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license
(http://creativecommons.org/licenses/by/4.0/).

1. Introduction [17] and healthcare [18–20]. The DNNs use high performance com-
putational resources to train multiple hidden layers and millions or
Machine Learning (ML) has been central to AI research, as it has billions of parameters that vigorously perform many crucial tasks
the ability to find patterns and categorise things. Deep Learning with the best accuracy. However, the computation process of these
(DL) is the subset of ML that is mainly involved in the construction DNNs models is opaque to human beings, so generally, these DNNs
of the deep architectures known as deep neural networks (DNNs). models are referred to as black box models [21]. We cannot explain
From the past few years, the DNNs architectures have been fre- the decision making process of these deep neural architectures
quently used in many computer-vision tasks such as action recog- leading to serious questions on the trust and transparency of these
nition [1], motion tracking [2], and object detection [3]. These tasks models.
are performed by using various deep architectures such as convo- The lack of transparency within deep neural architectures
lutional neural networks (CNN) [4], deep Boltzmann machines [5], restricts the deployment of such models especially in healthcare
and deep belief networks [6]. DNNs have been extensively used in and safety critical applications where a small possibility of the
numerous critical applications such as audio processing [7], auton- wrong decision could damage human life [22]. Therefore, an
omous vehicles and robots [8], autism spectrum disorder [9–11], understandable explanation of the set of instructions behind every
signal analysis [12,13] ophthalmology [14–16], cyber-security decision made by DL models is highly in demand. Many research
papers have been published in the past few years that discussed
the explainability issue of AI models. The explainability of the
Abbreviations: AI, Artificial Intellegence; DNNs, Deep Neural Networks; ML, black box models has received so much importance in recent years
Machine Learning; XAI, eXplainable Artificial Intellegence.
⇑ Corresponding author. that eXplainable Artificial Intelligence (XAI) has emerged as a
E-mail addresses: [email protected] (R. Saleem), b.yuan@leicester.
specific domain within AI [23].
ac.uk (B. Yuan).

https://doi.org/10.1016/j.neucom.2022.09.129
0925-2312/Ó 2022 The Author(s). Published by Elsevier B.V.
This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
R. Saleem, B. Yuan, F. Kurugollu et al. Neurocomputing 513 (2022) 165–180

The purpose of XAI is to develop a simple, clear but logically 2. Terminologies


explainable model that describes the inner functionalities of the
black box models. The explanations produced by XAI should be One of the major issues while discussing XAI is the use of termi-
understandable by human beings while maintaining high perfor- nologies that are used interchangeably and complicate the under-
mance in terms of prediction accuracy. The generalised additive standing of concepts. Before diving into the deep ocean of the XAI
models (GAM) were initially introduced to explain the black box field, we attempt to clarify the differences between the commonly
nature of the ML models by using a smooth function [24]. Later, used XAI concepts and terminologies and try to present a consis-
a visual (tree-like graph) algorithm, decision-tree, was developed tent version of (Fig. 1) their similarities and differences.
to provide a conditional and individual explanation of decisions
[25,26]. Individual Conditional Expectation (ICE) explained the  White box: If there is complete information about the architec-
change in predictions with respect to the features [27]. A limited ture and parameters of a model, it is known as the ‘‘white-
version of ICE, the Partial Dependence Plot (PDP) [28], has been ini- box” model. This type of model is considered immensely helpful
tially put into practice to globally explain the nature and effect of for endorsing trust, however, most of the time the amount of
only one or two features of the model prediction. The classification information is not adequate enough to explain the logical ratio-
models such as k-NN and SVM were explained by a set of explana- nale behind the decisions.
tion vectors but these techniques could only explain the outcome  Transparent box: If a model can explain its design, parameters,
of one instance [29]. or algorithm on its own and this justification is good enough
The above discussion indicates that initially modest-size ML for the end-user, then the model is named as the
models were explained by using different explainability meth- ‘‘transparent-box” model. One can check, evaluate, and improve
ods. However, a significant number of models use DNNs that the predictions by this transparency.
have attained much importance in high-risk applications. The  Black box: A model with hidden and veiled architecture and
DNN models with high accuracy demand a better explanation parameters leads to an unknown process of decision-making,
that would lead us to produce more responsible and trustworthy resulting into a ‘‘black-box” model. Generally, the DL models
AI systems [30]. Many local and visualisation techniques have are black box models because their deep architecture makes
been recently developed that help the AI experts to understand them opaque [21]. Next, we define the three key terms that
the decision procedure of the DNN model [31]. For example, have been used as substitutes for one another in the XAI field.
De Graaf and Van Mulken [32] proposed a solution that imitates Hence, it becomes more difficult to differentiate these ideas.
the intermediate process, connects the decision with the given  Understandability: It refers to the understanding of the model’s
piece of information, and is understandable by end-users. This characteristics, features, and function without knowing the
approach however only provides a local explanation of the learn- internal process and procedure involved in the decision making
ing model. [33]. This term provides an answer to the question ‘‘How the AI
This survey paper aims to provide a detailed state of the model works?.”
explainability methods available to AI researchers and practition-  Explainability: It involves explaining the internal process and
ers as well as highlights the strengths and weaknesses of the global answering the question ‘‘How does the black box model of AI
XAI methods that have been developed in the past ten years. Our make certain decisions?.”
contributions can be summarised as follows:  Interpretability: It entails the understanding of the internal func-
tionality and characteristics of the model. This terminology pro-
1. A comprehensive overview of the existing approaches used to vides a meaningful, clear, and logical reason(s) for the specific
globally explain the black box models of DNNs. have been decision in a manner that is understandable to the targeted cli-
described, particularly a critique of the visual and local XAI ent. This term answers the question ‘‘What is the decision of the
methods used in the global explanation of the black box mod- AI model and Why?.” Research communities often can classify
els of DNNs has been provided. The latest and highly cited the explanation of AI models into the following three categories
research papers are picked that have been published in based on usage and scope.
renowned journals and conferences over the past ten years  The Complete Explanation includes all potential features and
and highlight recent developments in explaining the global facts while explaining the decision of AI models, while in Com-
XAI approaches. pact Explanation the decisions of the AI models can be explained
2. In order to systematically analyse the global XAI methods, a by a limited number of factors. Mostly available XAI methods
taxonomy has been introduced on the basis of ante-hoc and for the black box of DNNs models provide the compact explana-
post hoc approaches, which provides clear recommendations tion of DNN models for some particular instances [27].
on when to use a particular approach.  Some explanation approaches only explain the limited number
3. This paper highlights gaps in the global XAI methods and offers of neural networks, for example, the explanation of the linear
a way forward for the future direction of work by proposing a model by the regression weights. The limited ability to explain
deterministic XAI model that will help to explain and address a certain type of black box model is known as Model Specific. The
the existing gaps. other approach is Model Agnostic which can explain each cate-
gory of ML models. The model agnostic explanation tools are
The structure of this paper is summarised as follows. Section 2 preferable because of their flexibility, however, they have no
categorises terminologies that have been frequently and inter- access to the inner information of the model [34].
changeably used in the field of XAI. The knowledge of the XAI ter-  The Ante-hoc explanation scheme can capture all information
minologies enables us to understand the three dimensions of XAI from the input layer to the hidden and output layers of the
methods that have been critically evaluated in Section 3. Section 4 given model [35] however, the Post-hoc scheme only highlights
presents a comprehensive review and taxonomy of the available the route of a particular outcome. For example, the decision-
methods for the global interpretation of DNNs. Section 5 provides tree method can explain the whole model but the LIME method
a summary of available Global XAI methods. Lastly, Section 6 con- can only explain the process behind the particular outcome of a
cludes the papers with current research gaps and future ML model [36]. Hence, similar to the model-agnostic approach,
directions. the post hoc methods are more flexible and easier to apply on

166
R. Saleem, B. Yuan, F. Kurugollu et al. Neurocomputing 513 (2022) 165–180

Fig. 1. Classification of XAI methods and their terminology map.

different models than the ante-hoc schemes. Many other terms 3. Dimensions of XAI Problem
have been frequently used interchangeably instead of XAI. Some
prominently famous terminologies are mentioned and As discussed earlier that for critical tasks, just predictions from
explained below. AI models are not enough. These models should be able to explain
 Ethical AI and Responsible AI: Both terms are very much associ- the whole functionality of the black box model that would eventu-
ated with the XAI field. An AI model is said to be ‘‘ethical” if it ally help to explain every reason behind each prediction (Global
does not break any defined rule or regulation by its user during interpretation) or at least explain the reason behind a single pre-
the whole decision-making process. The implementation of ‘‘re- diction (Local interpretation). An explanation process can be
sponsible AI” in real-world applications demands privacy, fair- divided into two parts (i) Extract information (ii) Exhibit informa-
ness, and ethics together with the explanation of the AI model tion. The exhibition of information has extraordinary importance
[37]. as it could directly connect the system with the novices as a client
 Trustworthy AI and Safe AI: An AI model is said to be ‘‘Trustwor- [33]. Besides the above two dimensions of explaining the black box
thy” if a user can anticipate the performance of the model pos- model, another approach is also in demand which visually inspects
itively. Trustworthiness is one of the primary goals of the XAI the model and explains the reason behind the prediction.
that would lead to another goal called Safe AI. The ‘‘Safe AI” Various high-performed approaches and methods for explain-
has the potential to control the chances of unexpected decisions able AI have been developed in the past ten years. Mostly complex
to minimise the risk of unintended harm during the interaction and deep architectures of neural networks would have a post hoc
of systems and humans [38]. explanation that interprets certain predictions. However, there
are few ante-hoc methods with the limited capacity of explaining
The answer to the question ‘‘How can you accurately explains the simple and small-sized AI models. The following section logi-
the black box of AI model?” should be given by introducing an cally differentiates these approaches and discusses some promi-
evaluation criterion for explainable methods. The following two nent methods developed in the past decade.
are the main evaluation criteria for the validation of explainable
methods.
3.1. Outcome Explanation or Local Interpretation
 Qualitative Evaluation: In this evaluation, the measure of XAI
methods depends on the satisfaction of end-user inquisitive- The outcome explanation, also referred to an instant-wise
ness, safety, understanding, and usability [39]. This evaluation explanation, aims to explain the reasons behind a single prediction
has importance as it provides the feedback of explainable meth- using a specific set of input–output (Fig. 2). Although this explana-
ods from the point of view of human understanding and tion is not considered suitable for the non-experts, most methods
usability. of outcome explanation would help the AI experts to scrutinize
 Quantitative Evaluation: This is used if the evaluation measure many edge cases of ML models. The understanding of decision
emphasises the performance of the explainer that how closely routes for the edge cases such as prediction of an autonomous
the explainer mimics the black box model [40]. This approach vehicle to react to an unusual behaviour of cyclist, pedestrian or
can validate the explainable methods by quantitatively evaluat- any object on road, are becoming more important to validate the
ing the outcomes of the AI model for all or some specific tasks. response of vehicle in every situation [36].

167
R. Saleem, B. Yuan, F. Kurugollu et al. Neurocomputing 513 (2022) 165–180

Fig. 2. Three dimensions (Global, Local and Visual) of XAI methods. Visual interpretation source paper [58].

Precisely, an explainable function, say ”OE” is generated that Activation Mapping (Grad-CAM) was developed that highlights the
explains the given mapping NN : DðX A Þ ! OðY A Þ for a particular key features of the last convolutional layer of CNN by using gradi-
instance A in the real-valued matrices (or vector). Note that in ent information [47].
the above mathematical mapping, NN is the neural network apply- Local Interpretable Model Explanation (LIME) was presented in
ing to dataset D to produce O output. Initially, the outcome expla- the year 2016 to explain the outcomes of DNN models that would
nation methods established a framework to understand the be interpretable in common man language [40]. The ultimate idea
significance of features and their relations by using feature impor- is to fit a surrogate model such as Linear regression or Ridge
tance matrices, heatmaps, Bayesian and rule-based techniques. regression [48] on the perturb input dataset and generate the local
This understanding has been further improved by designing game explanation. A binary vector x 0; 1 used to assure the absence and
theory and graph-based models and attribution maps [41]. Activa- presence of superpixels that would be captured with the help of
tion maximisation (AM) [42] is an outcome explainer method that LIME from the input image. Different versions of the LIME method
explains the convolutional neural network (CNN) by highlighting have been developed to explain the various kind of models. For
layer-wise feature importance. This method was introduced by example, Sound-LIME (SLIME) is the extended version of the LIME
Erhan et al. in 2010 suggesting an optimisation problem to max- method used to explain the deep voice detector’s predictions by
imise the unit activation by considering the input patterns. For time–frequency and temporal segmentation [49], Modified Per-
fixed parameters, the activation mapping of i
th
unit from j
th
layer, turbed Sampling operation (MPS-LIME) uses clique operation for
Z i;j ðX; /Þ can be defined as following: picking superpixels that reduce the run time because of the low
number of perturbed samples [50] and KL-LIME uses Kullback Lei-
X  ¼ argmaxðX;X¼qÞ Z i;j ðX; /Þ ð1Þ bler divergence to explain the prediction of Bayesian model [51]. A
similar method known as Shapley Additive explanation (SHAP) was
In 2013, Simonyan [43] presented saliency maps, which is proposed by Lundberg and Lee to enlighten the importance of
another way of explaining the output by summarising the pixel the individual part of input data while explaining the prediction
importance of the input image. This approach uses a perturbation [52]. As described in Eq. 2, SHAP explains the prediction by sug-
method and computes the gradient of input images to create resul- gesting an additive model with M simplified and alliance features,
tant feature maps. By modifying the input data, perturbation-based where z0  {0,1}M is alliance vector, and /k  R is the k
th
feature
methods discover the changes in output of the DNN model, where attribution.
gradient-based methods can detect very tiny changes in the input
data. Pixels in an image, words in text, or columns in a table can be X
M
perturbed by applying an occlusion mask, blurring or replacing gðzÞ0 ¼ /0 þ /k ðz0k Þ ð2Þ
parts of images, replacing a word with a synonym in text, or shuf- k¼1

fling or inserting rows/columns in tables. It is important to choose The contribution of each feature with the sum of bias is repre-
these changes in input data very carefully to get better results. sented as gðzÞ0 Þ in the above equation. Similar to the LIME method,
These maps can be sharpened by SmoothGrad that randomly per- SHAP also has different versions such as Low-Order SHAP, Kernel-
turbs the input and presents the result as the average of the resul- SHAP, and Deep-SHAP [53]. Due to the better performance of SHAP
tant maps. Layer-Wise Relevance BackPropagation (LRP) [44] is an as compared to the other methods, there is an extensive use of this
outcome explanation method for the DNNs introduced in 2015. approach in the medical field [54].
This method decomposes the output of the DNN model and finds
out the relevance scores of each feature in the given input data.
The LRP method is widely used to explain outcome predictions of 3.2. Visual Explanation or Model Inspection
many DNN models such as Convolution Neural Network (CNN)
and Recurrent Neural Network (RNN) from the past few years. The inspection approach aims to provide the textual or visual
However, many new researchers use this technique to prune the explanation of many ML models that would help to understand
network system by understanding the attribution of each layer the reason for the model’s prediction (Fig. 2). A survey paper on
[45]. A method, called Class Activation Mapping (CAM), creates the DNN visualisation techniques published in 2017 mentioned
heat maps that highlight those parts of input which are responsible that most researchers use pixels to display their research outcomes
for an outcome [46]. Based on this theory, Gradient- weight Class [55].
168
R. Saleem, B. Yuan, F. Kurugollu et al. Neurocomputing 513 (2022) 165–180

This paper classifies the inspection methods in four categories health care was published in 2015 as the application of refining
based on feature importance (i) Visual explanation with Activation GAM [73]. GoldenEye is an iterative method that explains the
Maximisation (AM) (ii) Visual explanation with Partial Dependence model by grouping those features whose interaction produces
Plot (PDP) (iii) Visual explanation with Decision Tree (iv) Visual results [74].
explanation with Sensitive Analysis. The Feature Importance Ranking Measure (FIRM) is another
As discussed earlier that the area or pixels responsible for the method that finds the most relevant features by correlating the
decision can be highlighted by using the Activation Maximisation structure of features [75]. FIRM is the extension of the method
(AM) technique. Yosinki et al. proposed tools for visual explanation called Positional Oligomiter Importance Matrices (POIMS) that
and interpretation of the DNNs such as CNN. One of them explains uses a scoring system to rank and visualise each K-mer [76]. The
the computation process at the intermediate layers of CNN and the extended version of FIRM, named, Measure of Feature Importance
other tool highlights the active part or feature of each CNN layer (MFI) is the non-linear classifier that detects those unobtrusive fea-
during the training process [56]. Another similar method discussed tures whose interaction with other features impacts the outcomes
in [57], traces back the computation process to detect those parts [77].
of the image responsible for neuron activation. The Deconvolu- Before discussing the global interpretation method, the Table 1
tional approach is used to visualise the intermediate layers of the provides a quick view on the interpretation tools that are used to
simple CNN model [58]. Many methods and their variants were explain the AI models at the global level. Based on the model usage
proposed in different papers for a visual explanation of DNN [59– and type, we also interpret these tools to create white-box or to
61]. Partial Dependence Plot (PDP) is a manner to visualise the rela- enhance the fairness in AI models. The next section synthesises
tionship between different neurons in the feature space. In [27,62] the global model explanation methods, proposed in the past ten
extension of PDP was applied on the tabular dataset which was years.
able to evaluate and visualise the interaction between the neurons
and the relationship between the feature and prediction. 4. Methods for Model Explanation
A relevant method known as Tree View uses the Decision Tree to
visually explain the connection between the decomposed K- clus- The AI explanation methods can be categorised (Fig. 1) on
ter of model features [63]. The Quantitative Input Influence (QII) is Scope: Does the XAI method explain the whole model or only a cer-
another method that captures the information about the input fea- tain outcome (local or global)? Approach: What is the focus of the
tures and used decision tree approach to explain the prediction algorithm, input data, or model parameters (backpropagation or
[64]. The uncertainty of the input–output pair can be studied under perturbation)? Usage: The developed method can explain any type
the measure of sensitivity. Initially, the Sensitivity Analysis (SA) was of model architecture or just be applicable to the specific architec-
used to understand the mechanism of a neural network on the ture (post hoc or ante-hoc)? This paper focuses only on global
basis of sensitivity and the Neural Interpretation Diagram (NID) interpretation methods based on model usage (Fig. 3).
was used to confiscate the non-essential parts and connection of
the neural network [65]. Based on sensitivity analysis, the Gaussian 4.1. Ante-hoc methods
Process Classification (GDP) explain and visualise the local outcome
by the explanation vector x [29] and the Variable Effect Character- Ante-hoc methods are mostly model-specific and this might be
istic curve (VEC) draws the bar plots between the features and their seen as a drawback because they consider only a limited number of
response based on the importance of features [66]. models. In many papers, this explanation is also known as intrinsic
The above discussion indicates that some explanation methods explanation. Hence by definition, model intrinsic explanation
prefer to use different visual tools such as heatmaps and salience methods depend on a certain design and cannot be used again
masks to display the information. During this type of explanation, for any other architectures. This section provides a detailed discus-
two dimensions have been used for the interpretation of the black sion on those ante-hoc methods which have been established in
box model, namely local interpretation, and global interpretation. the past ten years. A quick view of ante-hoc methods are given
As the main focus of this survey, is the global interpretation meth- in Table 2 and Table 4 that summarise information about type of
ods so next section has a detailed discussion on this state of the art. data, frameworks and methodologies, as well as their merits and
demerits.
3.3. Model Explanation or Global Interpretation
Definition 1. Let the DNN represent as a function F : RD1 ?RDk with
One way to give details about the black box of the AI models is
an input x RD1
. Another representation of x could be ante-hoc
the global interpretation. This type of explanation describes the
complete logic of all the outcomes by mimicking the behaviour explanation, eXpðF; xÞ RT such that T6D1 .
of the black box model (Fig. 2). The inside view of a model would
help to understand the nature of model features and their correla-
tions that leads to the outcomes. Many model explanation meth- 4.1.1. Bayesian Case Model (BCM)
ods divide the whole model into parts to make it easier for the Studies show that just revealing of rules behind the decisions of
explanation. Tree-based and Rule-based models are inherently AI models are not enough for achieving user’s confidence, the
fully explainable. The black box models consisting of the neural example-based reasoning and interpretation improve the level of
network were initially explained by approximating a single tree confidence significantly. The case-based reasoning (CBR) is consid-
[67,68] and rule extraction [69]. In [70], an approach REFNE has ered exemplar-based modelling that involves the most effective
been introduced to interpret generated instances from the trained tactics such as matching and prototyping as humans like to look
neural network by extracting symbolic rules. at examples rather than recommendations. For example, the natu-
In recent years, the approaches that explain any type of AI ralistic studies mentioned that the decision-makers in a fire service
model (agnostic) are in high demand. The Generalised Additive use recognition-primed decision making that matches new situa-
Model (GAM) is the first attempt toward the model-agnostic tions with similar cases and decides appropriate manoeuvres to
approach that explains regression splines, tree-ensembles, and handle the situation [87]. Therefore, with the knowledge of CBR
single-trees by highlighting the contribution of each feature [71]. new situations can be represented successfully by analysing the
Same researchers refined this method in [72] and a case study on previous situations.
169
R. Saleem, B. Yuan, F. Kurugollu et al. Neurocomputing 513 (2022) 165–180

Table 1
Global Interpretation Tools to create White-box (W) or to enhance Fairness (F) in AI models.

Model type (Scope) Year Data type Category Interpretation tools


Ante-hoc (Model Specific) 2010 Tabular F Fairness-Comparison[78]
2015 Tabular W Interpret ML [73]
2016 Tabular W Slim [79]
2019 Tabular W AI-360 [80]
2019 Tabular F ML-Fairness Gym [81]
Post-hoc (Model Agnostic) 2017 Tabular F AIF-360 [82] and Fair Classification [83]
2018 Tabular F Fair Learning [84]
2018 Tabular F AI-360 Gerry Fair [85]

Fig. 3. Workflow of Post-hoc and Ante-hoc Global XAI methods.

Table 2
Summary of research papers published in the past ten years for the Global explanation (Ante-hoc) of AI models

Model usage (type) Year Methods Data type Methodologies Explanation medium Frameworks XAI evaluation
Ante-hoc (Model-Specific) 2014 BCM [75] Any Perturbation-based Multimedia Python (PYMC) Qualitative
2015 GAM [76] Tabular Perturbation-based Graphics (heatmaps) R (PyGAM) Qualitative
2015 BRL [86] Tabular Rule-based Textual Python Quantitative
2020 NAM [70] Image Cluster-based Graphics (heatmaps) Pytorch Quantitative

Fig. 4. Ante-hoc explanation by Bayesian Case Model.

A new CBR based unsupervised Bayesian model, ‘Bayesian Case clusters. For i
th
observation, pi denotes the mixture weight over a
Model (BCM)’ [86], was introduced in 2014 that learns about the th
cluster and xij indicates observation for j feature as each observa-
notable features to create prototypes that produce accurate and
tion has F features. Each xij comes from a cluster denoted as zij and
interpretable outcomes on the standard datasets. To understand
Z is the full set of the clusters formed by the observation-feature
the generative process of BCM, some mathematical notations used
pair. All hyperparameters such as k; a; q and c are fixed that speci-
in the BCM flow diagram (see Fig. 4(a)) such as xi ,i ¼ 1; 2; . . . ; N
fies how much we can copy a prototype to make explanations. B.
random mixture over cluster with N observations and S known

170
R. Saleem, B. Yuan, F. Kurugollu et al. Neurocomputing 513 (2022) 165–180

Kim and her team members divide the explanatory process of BCM healthcare problems: 30-day hospital readmission and pneumonia
into three parts: risk. In the 30-day hospital readmission, a large dataset was used
with 3956 features such as patient 313 history, doctor’s prescrip-
(i) Prototype ðps Þ: is a classic observation in x that is used to rep- tion, notes and recommended lab reports. The intelligible model
resent a cluster say s. For some i and element j; psj ¼ xij that provides a risk score and sorts the important features according
maximizes pðps jws ; z; xÞ, where ws is a feature indicator that to the doctor’s requirement. On the other side, the dataset involved
we discussed below as a next step of BCM explanatory in a pneumonia case study is much smaller than the 30-day hospi-
process. tal readmission task and only 46 features participate in predicting
(ii) Feature indicator ðws Þ: is a binary vector ws  f0; 1gQ of size Q the death rate from this lung disease. The GA2Ms model predicts
that activates the important features to characterize clusters the death rate by learning and editing all possible patterns that
and prototypes. This vector also indicates the presence of would be abandoned even in complex ML models. Hence the com-
feature j in subspace s. bination of standard GAMs and GA2 Ms. are more understandable
(iii) Feature outcome distribution ð/s Þ: /sj is a vector that explains because the unambiguousness of the model outcome can be visu-
the discrete probability distribution of possible outcomes for alised by a heat map. The above studies conclude that the GA2 Ms.
feature j of length U j , where U j is the number of possible balanced the trade-off between accuracy and interpretation very
outcomes. well and show practicable accuracy as compared to many ML mod-
els [72].
Fig. 4 demonstrates a graphical representation of the discrete In 2018, a method called CONTENT [89] has been introduced
mixture of BCM and Latent Dirichlet Allocation (LDA) method. that produces context vectors by transforming the patient’s elec-
The authors of the paper [86] used the dataset of a mixture of smi- tronic health record into the clinical concept embedding. This
ley faces to demonstrate each part of BCM and compare its inter- method presented the refined but complex hidden knowledge in
pretability accuracy with another method called LDA. The feature the context vector by distilling the complex relationship between
set (colours, shapes, and types of mouths and eyes) produces three risk factors present in the patient’s record and readmission predic-
clusters and each cluster has two features. The instance of BCM tions. Primarily, the distillation technique transfers knowledge
with the fixed value of hyperparameters produces 240 smiley from an accurate but complex model to a fast and simple model
faces. However, LDA and BCM represent their outcomes very differ- that have same level accuracy. The distillation approach was also
ently but note that in both approaches selection of important fea- used to explain ICU outcomes by highlighting the important
tures is the same for each cluster (Fig. 4(b)). It is also worth learned features that transfer from complex model to simple
mentioning that the interpretability of BCM was also verified by model. Many other medical applications such as diabetes classifi-
performing the human subject experiment in which participants cation and breast cancer used knowledge distillation approach to
need to understand the formation of clusters for the recipe dataset build the interpretation prediction models [90]
without any training. In order to evaluate the effectiveness of the
learning process, twenty-four participants were divided into two 4.1.3. Bayes Rule List (BRL)
groups and asked to complete the BCM and LDA questionnaires Mostly rule-based models like decision lists and decision trees
consisting of eight questions each. As an explanation, LDA provided are inherently interpretable and many other explainable
a cluster of top ingredients for each recipe, while BCM presented approaches used them as a part of their algorithms. In 2015, pre-
prototype ingredients without noting the recipe name or subspace. dictive models were introduced that are based on the idea of a
The number of top ingredients from LDA is set as the number of decision list and are known as ’Bayes Rule List (BRL)’ [91]. The
ingredients from BCM prototype and perform Gibbs sampling for model of the decision list consists of a series of rule statements
LDA until the ground-truth clusters become identifiable. Results such as ‘if-then’ which automatically explains many obvious rea-
show that the explanation run-through BCM achieved higher accu- sons behind every prediction. BRL is an associative classification
racy (85.9%) than LDA (71.3%) which uses the same Gibbs sampling method that starts the process by producing the posterior distribu-
inference scheme as BCM. tion over permutations of ’if-then’ large but pre-minded set of rules
say, R. If a data set has N observations fxn ; yn g and let r represent as
4.1.2. Generalised Additive Model (GAM) rð:Þ : xn 2 X?yn 2f0; 1g. Let R be a rule set define as
For many years, the Generative Additive Model (GAM) has been 
1 9 r 2 R; rðxÞ ¼ 1
used to explain many ML models however, there is a trade-off RðxÞ ¼ ð4Þ
between the accuracy and intelligibility of these models. In 2015, 0 otherwise
Caruana et al. introduced an intelligible model by integrating stan-
With Rð:Þ classifier, x is classified as positive if it obeys at least one
dard GAMs and another model called GA2 Ms. to improve accuracy rule defined in Eq. 4. It is assumed that the interpretability of rules
[88]. Let’s assume a training dataset of size N, denoted as is associated with the number of conditions or length L of rules that
D ¼ ðxk ; yk ÞN1 , where xk is the feature set with p features and yk is are derived from a set of pre-mined rules R. R = [Ll¼1 Rl as R is
the response. If G is the link function then the pairwise interaction divided into maximum length L that a user allows. In a generative
of GAM is written as BRL model, the decision list and rules are defined with words like
X X ‘if’, ‘else if’, and ‘else’, accumulation of such words gradually clarifies
GðE½yÞ ¼ c0 þ f l ðxk Þ þ f kl ðxk ; xl Þ ð3Þ
the rules which make the model understandable. An accurate deci-
l k–l
sion list can be derived when a pre-minded set of rules is suffi-
To make the model intelligible the contribution of each feature ciently expressive. MarketScan Medicaid Multi-State Database
can be determined by inspecting f l , where E½f l  ¼ 0. Eq. (3) helps to (MDCD) data of 11.1 million patients were used to explain stroke
understand the strategic flow of the above-paired model as fol- chances using the BRL method. Besides extracted features (atrial
lows: (i) build the finest GAMs model (ii) detect all possible pairs fibrillation condition, gender, age), additional information was col-
of interaction (iii) rank all top n-potential pairs. The evolution of lected such as medicines and other medical conditions. This infor-
any interpretable approach in ML can be assumed more valuable mation was used to generate binary predictor variables that
if its performance is validated on critical tasks such as healthcare. confirm the presence or absence of drugs and conditions. The priori
The performance of GA2 Ms. has been validated by discussing two distribution helps to add, edit features and rules to create a sample
171
R. Saleem, B. Yuan, F. Kurugollu et al. Neurocomputing 513 (2022) 165–180

Table 3 (NAMs) as an improved, accurate and scalable version of GAMs.


The trustworthy set of rules for predicting future stroke [82]. NAMs can train multiple DNNs and learn a linear combination
Rules and conditions Chances of Credible for each single input feature. The architecture of NAMs for binary
strokes intervals classification is explained in Fig. 5.
If and Hemiplegia Age > 60 then 58.9% 53.8%-63.8% The generalisation of the NAMs method can be achieved by
else if Cerebrovascular disorder then 47.8% 44.8%-50.7% parameterising the function f l , see Eq. 5, in the presence of various
else if Transient ischaemic attack then 23.8% 19.5%-28.4% hidden layers and neurons. Because of the failure of the ReLU acti-
else if Occlusion and stenosis of the carotid then 15.8% 12.2%-19.6%
artery without infarction
vation function, researchers introduced exp-cantered (ExU) hidden
else if The altered state of consciousness then 16.0% 12.2%-20.2% units that help NAMs to learn jagged functions with standard ini-
else if Age > 70 then 4.6% 3.9%-5.4% tialisation. This new class of model should learn jagged functions
else 8.7% 7.9%-9.6% to handle abrupt changes in the datasets relating to real-world
problems.
The unit function for ExU hidden units can be calculated with
Table 4 input (x), bias (b) and, weight (w) parameters as:
Potential merits and demerits of Global (Ante-hoc) XAI methods.
hu ðxÞ ¼ f ðew  ðx  bÞÞ ð5Þ
Ante- Merits Demerits
hoc NAM explains the contribution of each feature from every neu-
methods
ral network by calculating the average value of the shape function
BCM  Ability to capture good  Can not handle uncer- as a positive and a negative value. Positive values increase the class
information to improve tainty with prior probability while negative value reduces the chances. The visuali-
predictions. probability.
 Results are easy to explain.  No correct way to choose a
sation of shape function by a shape plot can help to understand the
 Can achieve more accuracy prior. model and allow to edit the model as well as the dataset before the
than LDA.  High computational cost. final implementation. The interpretation by the NAM method is
GAM  Able to deal with non-linear  Computational complexity beneficial for DL as they used several hidden layers and units. From
and non-monotonic rela- with a high propensity of
these hidden layers and hidden units, one can compute more com-
tionships between the overfitting.
response and the predictor  Python package is not plex but accurate shape functions and allow subnets to learn non-
variables. available. linear functions that are required to improve the accuracy of the
 Can deal categorical model’s intelligibility.
predictions.
BRL  Can handle both continuous  Rules focus on classifica-
and discrete data. tion and almost com-
4.2. Post-hoc methods
 Easy to interpret by high- pletely neglects
lighting relevant features. regression. The existing and pre-trained AI models can be more valuable if
 Fast, robust, and used to  Bad in describing linear the interpretation of their decision path is understandable along
make real-time predictions. relationships.
with the accuracy. The post hoc explanation methods required an
 Only deal categorical
features. algorithm to look inside the black box of any DNNs architecture
NAM  Can learn arbitrarily com-  Great chances of overfit- without losing its accuracy. Due to this key advantage of post
plex relationships between ting with the standard hoc methods, this approach is also known as ‘‘model-agnostic”. A
input feature and the initializer. quick view of post hoc methods are given in Table 5 and Table 6
output.  Produce inconsistent
 Flexible, scalable, and easy results with Relu activa-
that summarise information about type of data, frameworks and
to extend. tion function. methodologies, as well as their merits and demerits.
 Can explain result to larger
community.
Definition 2. Let the DNN represent as a function F : RD1 ?RDk . The
post hoc explanation, eXpðFÞ, consist of two functions G1 and G2
where G1 : Rd1 ?Rd2 represents F such that d1 6D1 and d2 6Dk and
rule list then BRL tries to optimise these rules. Table 3 highlights the
small set of trustworthy rules to predict future chances of stroke. G2 : RD1 ?Rd1 maps the original input to the valid inputs of the
The first three rules are based on other medical disorders such as function G1 .
hemiplegia, cerebrovascular, and transient ischaemic, the chances
of strokes seem remarkably high. In the last three columns, vascular
disease, and age play an important role to predict the future risk of
stroke and chances are comparatively low. 4.2.1. Global Interpretation from local interpretation methods (LIME,
The BRL method is trustworthy because it applies to real med- LRP, SHAP)
ical data where risk is too high as patients with heart disease are so Recently, some researchers extended the existing ideas of local
vulnerable. To assure BRL performance and level of accuracy, it was explanation of AI models and deployed them in a way so that they
compared with CHADS2 score system to predict chances of stroke can be used for global or model explanation. This section discusses
in a patient with atrial fibrillation condition. In 2017, the scalability such methods that have been initially introduced as the local inter-
of the BRL method was enhanced by using improved theoretical pretation methods. Originally, a novel explanation method, ‘Local
bounds and tuned language libraries [92]. Hence, the optimised, Interpretable Model Explanation (LIME)’, was introduced to explain
concise, and reliable rules list generated by the BRL method allows prediction yield by a single instance. However, to solve real-
to communicate with the domain experts and implement ML mod- world problems such as predictions about medical diagnosis, the
els in other fields such as industry, science, and engineering. explanation of a single prediction would not make these models
trustworthy. To make the user more confident while using these
4.1.4. Neural Additive Model (NAM) models, researchers introduced the extended version of LIME
A combination of intelligibility of GAMs and expressivity of called ‘Submodular Pick (SP-LIME) [40]. SP-LIME is used to under-
DNN yields a novel class of model, called ‘Neural Additive Model stand the single data instances to understand the global correla-
(NAMs)’ [93]. One can introduce the Neural Additive Model tions of models.
172
R. Saleem, B. Yuan, F. Kurugollu et al. Neurocomputing 513 (2022) 165–180

Fig. 5. The Neural Additive Model for binary classification.

Table 5
Summary of research papers published in the past ten years for the Global explanation (Post-hoc) of AI models

Model usage (type) Year Methods Data type Methodologies Explanation medium Frameworks XAI evaluation
Post-hoc (Model-Agnostic) 2016 SP-LIME[40] Any Perturbation-based Graphics Python/R Qualitative
2015 LRP [86] Image Gradient-based Graphics (heatmaps) Caffe Quantitative
2017 SHAP [52] Any Perturbation-based Multimedia Python (XGBoost) Quantitative
2019 SpRAy [53] Image Gradient-based Graphics Caffe Quantitative
2019 GAA [72] Image Perturbation-based Multimedia Multi- dimensional Quantitative
2019 ACE [94] Image Concept-based Graphics TensorFlow Qualitative

Table 6 which chooses the minimum number of inputs and extracts the
Potential merits and demerits of Global (Post-hoc) XAI method. maximum number of important features.
Post- Merits Demerits The behaviour of the SP-LIME is similar to surrogate models as
hoc it extracts useful information and independent explanation from
Methods the LIME method. SP-LIME would be preferable over LIME because
SP-LIME  Fast implementation and  Inherently generate an it provides a non-redundant and global view of the model to trust
less robust. explanation for local those models. There is another method, called ‘Layer-wise Rele-
 Easy to interpret by expert instances.
and non-expert.  Do not guarantee the opti-
vance BackPropagation (LRP)’, which was initially used to explain
mal solution. the single instance of prediction [94]. The decomposition of predic-
SHAP  Fast implementation.  Slow computation. tion helps to calculate the relevance score for every individual
 Contrastive explanations.  Shapley values can be input feature. Many deep architectures of the neural network such
 Consistent interpretation. misinterpreted.
as CNNs and RNNs use the backpropagation and update the rele-
SpRay  Detect any kind of anomaly.  Only qualitatively evalua-
 Ability to explain complex tion is available. vance scores to explain the single prediction by generating heat
DNNs by highlighting  Heatmaps are sensitive to maps. Recently, LRP utilises these heat maps as input for their glo-
important feature specific features. bal explanation algorithm. Network pruning is another way to use
(heatmaps). LRP that helps to reduce the memory cost of the AI model without
 Low computation and stor-
sacrificing accuracy [45]. The relevance score generated by LRP
age cost.
GAA  Can represent Nonlinear  Computationally expen- highlights the least important features that are eventually
relationship. sive due to large number removed from the model to prune it.
 Provides tunable subpopu- of features. In [49], the ‘SHapley Additive ExPlanation (SHAP)’ method calcu-
lation granularity.
lates Shapley values and explains the prediction on behalf of fea-
 Easy to implement.
ACE  Generate meaningful,im-  Need lot of image segmen- ture contribution towards a certain output. The calculation of
portant, and coherent con- tation processes while Shapley value is based on the concept of coalition game theory,
cept to explain DNNs. generating explanation for where a prediction is treated as ‘payoff, and the value of each fea-
image dataset. ture is assumed ‘player’. Shapley values state the fair distribution
 Can not generate explana-
of payoff (prediction) among the players (features). The mathemat-
tion for the complex
concept. ical formulation for computing contribution of each feature is
given as:

Fig. 6(a) describes the idea of the SP-LIME algorithm which pro- X
M

vides the global explanation by fetching important features from gðzÞ0 ¼ /0 þ /j z0j ð6Þ
j¼1
each instance. If the given set of instances are I, we can choose B
‘Budget’ as the required number of explanations. Firstly, we can Where g is the explanation model, /j is feature attribution for
run the LIME algorithm for the available set of instances and save t
j h feature, z0 2 ½0; 1M and M is the maximum size of the coalition.
the explanation of each instant into the ‘explanation matrix’ say L.
Different versions of SHAP like KernelSHAP, LinearSHAP, and Deep-
The explanation matrix helps to extract the important features of
SHAP were introduced to explain the individual prediction for var-
the given model. The greedy optimisation technique is applied to
ious types of datasets. In [95], a framework, called TreeExplainer,
the new matrix of size IB generated by the SP-LIME algorithm
was introduced as an extension of SHAP for trees. The algorithm
173
R. Saleem, B. Yuan, F. Kurugollu et al. Neurocomputing 513 (2022) 165–180

Fig. 6. Global explanation from Local interpretation SP-LIME and SHAP methods.

behind TreeExplainer finds the local interpretation for trees by The paper [54] explains the SpRAy algorithm by implementing
computing the Shapley values, then structures them in such a on the horse images of the PASCAL VOC dataset. Fig. 7 shows the
way so that the model can explain the features at the global level. following four different prediction strategies to classify horses. (i)
Without loss of consistency and accuracy, the extended version of spot the presence of rider and horse (ii) highlight the existence
SHAP, TreeExplainer, provides a quick local explanation for trees in of source codes on portraits-based images (iii) identify some back-
polynomial times. The global interpretations include interaction ground elements (iv) and highlight tags on landscape-oriented
and clustering values, summary and dependence plots, and feature images. On this large dataset, the SpRAy method acts as a semi-
importance (Fig. 6(b)). With SHAP, the global interpretation of the automated tool that can also detect any kind of anomaly such as
model becomes easier due to the fast computing ability of Shapley misuses of source tags in horse images. Hence, without human
values. intrusion, the combination of LRP and SpRAy methods enables
the user to identify the strategies behind prediction and visualises
them with the aid of heat maps.
4.2.2. Spectral Relevance Activation (SpRAy)
The SpRAy technique for explaining the AI models at the global 4.2.3. Global Attribution Analysis (GAA)
level was introduced by Lapuschkin in 2019 [54]. This technique is Although the discussed global interpretation techniques explain
based on the LRP method which explains the model for an instance. the decisions by summarising local attributions or providing a set
To view insight into the model and explain the decision-making of rules, these methods failed to learn about the non-linear interac-
process, a spectral clustering algorithm was applied to the local tions of features across subpopulations during the training process.
explanations of the model produced by the LRP method. The A technique called ‘Global Attribution Analysis (GAA)’ has been
results produced by the LRP method help to spot and analyse those introduced in 2019 that produces explanation even for subpopula-
attributions which appear frequently with spatial structure. This tion by generating global attribution [96]. Each global attribution
spatial analysis would help the SpRAy method to identify any explains the specific part of the model that leads to the global
anomaly in the model. explanation of the model. Fig. 8 shows the workflow of the GAA
The algorithm of the SpRAy technique can be summarized as method. Firstly, the information about the local features is col-
follows. (i) Firstly it uses the LRP method and finds local relevance lected by employing some local interpretation methods such as
maps that are used to explain every data instance. (ii) it then LIME, DeepLIFT and, Integrated Gradient. At this stage, every local
makes the definite and visible solution, scales down the relevance attribution highlights the significant features for a single predic-
map to uniform size and shape (iii) it then evaluates the LRP rele- tion and treats these attributions as weighted conjoined rankings.
vance maps by using Spectral Cluster (SC) analysis to design clus- To avoid anomalies, these local attribution vectors are normalised
ters for the local explanations (iv) and finally it uses eigen maps by
analysis to compute eigen gap among two successive clusters
1
and return relevant cluster to the user. Lastly, as an optional step, jdw j  X ð7Þ
the user can visualise these clusters by using t-Stochastic Neigh- ð jdw ðiÞj
i
bour Embedding (t-SNE).

Fig. 7. The four strategies of SpRAy method to classify horses [54].

174
R. Saleem, B. Yuan, F. Kurugollu et al. Neurocomputing 513 (2022) 165–180

Fig. 8. Algorithm for Global Attribution Analysis for global interpretation.

4.2.3. Global Attribution Analysis (GAA)
Although the global interpretation techniques discussed so far explain decisions by summarising local attributions or by providing a set of rules, they fail to capture the non-linear interactions of features across subpopulations during the training process. A technique called 'Global Attribution Analysis (GAA)' was introduced in 2019 that produces explanations even for subpopulations by generating global attributions [96]. Each global attribution explains a specific part of the model, and together these lead to the global explanation of the model. Fig. 8 shows the workflow of the GAA method. Firstly, the information about the local features is collected by employing local interpretation methods such as LIME, DeepLIFT and Integrated Gradients. At this stage, every local attribution highlights the significant features for a single prediction, and these attributions are treated as weighted conjoined rankings. To avoid anomalies, these local attribution vectors are normalised by

\[
|d_w| \odot \frac{1}{\sum_i |d_w(i)|} \tag{7}
\]

In Eq. 7, d_w is a weighted attribution vector and ⊙ is used to represent the Hadamard product. Next, the following two options are used to compare these normalised attributions and quantify the similarities among them: (i) Kendall's Tau rank distance [97]; (ii) Spearman's Rho squared rank distance [98]. After this comparison, the GAA method uses a clustering algorithm, K-medoids [99], to build clusters of similar attributions and identify global attribution patterns. Hence, GAA allows us to inspect and compare the explanations of different subpopulations. Fig. 8 depicts each step of the GAA algorithm. In addition, GAA offers a tuneable granularity to control the preferred number of subpopulations.
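As a rough illustration of Eq. 7 and the subsequent clustering stage, the sketch below normalises a set of local attribution vectors, builds a pairwise Kendall's Tau rank-distance matrix, and groups the vectors with a plain K-medoids routine. It assumes the local attributions (e.g. from LIME, DeepLIFT or Integrated Gradients) are already available as rows of a NumPy array; the helper names and the simple K-medoids loop are our own, not the authors' implementation.

```python
import numpy as np
from scipy.stats import kendalltau

def normalise(attributions):
    # Eq. 7: |d_w| (Hadamard-)scaled by 1 / sum_i |d_w(i)|, so magnitudes sum to one
    mags = np.abs(attributions)
    return mags / mags.sum(axis=1, keepdims=True)

def rank_distance_matrix(A):
    # pairwise Kendall's Tau rank distance between normalised attribution vectors
    n = len(A)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            tau, _ = kendalltau(A[i], A[j])
            D[i, j] = D[j, i] = 0.5 * (1.0 - tau)   # map tau in [-1, 1] to a distance in [0, 1]
    return D

def k_medoids(D, k, n_iter=100, seed=0):
    # plain alternating K-medoids on a precomputed distance matrix
    # (assumes no cluster becomes empty; k is the tuneable number of subpopulations)
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(D), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return labels, medoids

# usage sketch: labels, medoids = k_medoids(rank_distance_matrix(normalise(attrs)), k=3)
```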
4.2.4. Automatic Concept-based Explanation (ACE)
A concept-based method, called 'Automatic Concept-based Explanation (ACE)', has been discussed in [100]; it is used to globally explain trained classification models such as CNNs and Inception-V3. ACE suggests only those concepts for the explanations that are indispensable and coherent for a model's prediction as well as meaningful and understandable for humans.
The authors of the paper explain the ACE algorithm step by step (Fig. 9). They pick a trained classifier with a set of images as input data. In step one, the method extracts all concepts present in the images in the form of segments (groups of pixels). The method then applies different levels of resolution to fully capture the hierarchy of concepts; usually, three levels of resolution are considered enough to capture colours, textures, objects, or even their parts.
In the second step, ACE picks those segments that represent the same concept and puts them in a group. To measure the similarity among segments, the Euclidean distance (say d) can be used; this distance also helps to remove the concepts with low similarity and to maintain the coherency of the model. All these steps can be understood from the visual source provided in [101]. Lastly, the method uses any concept-importance method, such as TCAV [102], to highlight important concepts by computing concept-based importance scores. ACE is only performed on image datasets, because it is easy to group pixels in a meaningful way; this could be a big drawback of ACE.
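The segmentation and grouping steps of ACE can be sketched as follows. This is only an outline under our own assumptions: embed() stands for whichever internal activation of the trained classifier is used to compare segments, and the superpixel resolutions, cluster count and coherence threshold are illustrative values rather than those used in [100].

```python
import numpy as np
from skimage.segmentation import slic
from sklearn.cluster import KMeans

def extract_segments(image, resolutions=(15, 50, 80)):
    # step 1: multi-resolution superpixel segmentation (three levels of granularity)
    segments = []
    for n_segments in resolutions:
        labels = slic(image, n_segments=n_segments, compactness=10, start_label=0)
        for s in np.unique(labels):
            mask = labels == s
            segments.append(image * mask[..., None])   # keep only this segment's pixels
    return segments

def discover_concepts(images, embed, n_concepts=25, max_spread=None):
    # step 2: group visually similar segments into candidate concepts
    segments = [seg for img in images for seg in extract_segments(img)]
    Z = np.stack([embed(seg) for seg in segments])      # activation-space representation
    km = KMeans(n_clusters=n_concepts, n_init=10, random_state=0).fit(Z)
    concepts = []
    for c in range(n_concepts):
        members = np.where(km.labels_ == c)[0]
        # the Euclidean distance d to the cluster centre measures coherence;
        # spread-out (incoherent) clusters are discarded
        d = np.linalg.norm(Z[members] - km.cluster_centers_[c], axis=1)
        if max_spread is None or d.mean() <= max_spread:
            concepts.append([segments[i] for i in members])
    return concepts
# step 3 (not shown): score each retained concept with TCAV-style importance scores [102].
```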

Fig. 9. Step-by-step Automatic Concept-based Explanation algorithm [101].


5. Summary

We summarize the taxonomy that has been discussed in the previous sections; it is visually depicted in Fig. 10. Among the explainable methods discussed above, not all kinds of data can be processed: some methods take into account numerical, binary, and categorical data, which are tabular, while others generate explanations by highlighting data that is comprised of pixels, which are images. Fig. 10 also illustrates which objects are explained within the development process to provide global explanations: some methods focus on accessing the internal representation, such as layers, features, or vectors, while others explain how the model is trained.

Fig. 10. Explanatory Taxonomy of Data and Model Driven Global Explainable methods.

In this paper, we include only those explanators that certainly contribute to explaining the decision route generated by the above-described explainable models. (i) Saliency Map (SM): an efficient way to visually highlight and mask the causes of certain outcomes [103]. (ii) Decision Tree (DT): easily understandable, also known as single-tree approximation, and primarily used for global explanation [104]. (iii) Partial Dependence Plot (PDP): plots the relationship between the outcome of the black box and the input [105]. (iv) Decision Rule (DR): the most human-understandable explanation technique, used to transform a decision tree into a set of rules [96]. (v) Prototype Selection (PS): consists of returning the outcome together with a set of similar instances [106]. (vi) Feature Importance (FI): an effective but simple explanation solution that highlights and returns the weights and features with a high magnitude [107]. In Table 7, these methods are summarized with their advantages and disadvantages.

Table 7
Potential merits and demerits of interpretable Explanators.

Explanators               Merits                                               Demerits
Saliency Map              - Highlights important pixels.                       - Only qualitative evaluation is available.
                          - Faster computation.                                - Insensitive to model and data.
Decision Tree             - Easy to explain.                                   - Fails to deal with linear relationships.
                          - Needs less effort for data preparation.            - Difficult and expensive to interpret deeper trees.
Partial Dependence Plot   - Easy to understand and interpret.                  - Deals with a maximum of two features.
                          - Easy to implement.                                 - Hidden heterogeneous effects.
Decision Rule             - Selects only the relevant features.                - Difficult and tedious to list all the rules.
                          - Cost efficient.                                    - Fails to describe linear relationships.
Prototype Selection       - Easy detection of missing functionality.           - Expensive.
                          - Detects errors at an early stage.                  - Higher number of features or clusters.
Feature Importance        - Easy interpretation.                               - Expensive.
                          - Highly compressed, global insight into the model.  - Time consuming.
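To make two of the explanators in Table 7 concrete, the sketch below gives a permutation-based Feature Importance and a one-dimensional Partial Dependence computation for any fitted model that follows the familiar score/predict convention of scikit-learn; the function names and default values are our own illustrative choices.

```python
import numpy as np

def permutation_feature_importance(model, X, y, n_repeats=10, seed=0):
    # FI: a large drop in score after shuffling a column marks a globally important feature
    rng = np.random.default_rng(seed)
    baseline = model.score(X, y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])   # break the link between feature j and y
            drops.append(baseline - model.score(Xp, y))
        importances[j] = np.mean(drops)
    return importances

def partial_dependence(model, X, j, grid):
    # PDP: average prediction over the dataset while feature j is fixed at each grid value
    pdp = []
    for value in grid:
        Xv = X.copy()
        Xv[:, j] = value
        pdp.append(model.predict(Xv).mean())
    return np.array(pdp)
```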
Perturbation-based Global explainable methods: The summary provided in Tables 2 and 5 shows that most global explainable methods are based on the perturbation algorithm. Perturbation mainly focuses on perturbing a set of features (e.g. pixels) of the given input data by masking, occlusion, or filling operations. After finding a set of perturbations, a new set of predictions can be obtained by using the parameters of the DNN. To determine the significance of different features, these predictions are compared with those for the original data, and an explanation is generated with a predefined set of explanation rules. Generally, global methods such as BCM [77], GAM [87], SP-LIME [40], GAA [93] and SHAP [52] use only forward passes to understand the neurons' activities and the impact of each feature on the model's attributions. Perturbation methods provide a visual explanation (heatmaps, saliency maps) of the influencing features for image, video, and natural language inputs, but only a few of them have been evaluated at the qualitative human-experiment level. Perturbation- or concept-based explanations are Data Driven Explanations that rely completely on the input data to generate an explanation. As only a small change in the input data can impact a Data Driven explanation, this approach does not need to understand the inner functionality of the model [108].
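A minimal occlusion-style perturbation explanation might look like the sketch below; it assumes predict() returns class probabilities for a batch of H x W x C images, and the patch size and fill value are arbitrary illustrative choices rather than settings taken from any of the surveyed methods.

```python
import numpy as np

def occlusion_map(predict, image, target_class, patch=16, fill=0.0):
    # one forward pass on the original input, then one per occluded region
    h, w = image.shape[:2]
    base = predict(image[None])[0, target_class]
    heatmap = np.zeros((h // patch, w // patch))
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = fill        # mask / fill one region
            prob = predict(occluded[None])[0, target_class]
            heatmap[i // patch, j // patch] = base - prob    # confidence drop = importance
    return heatmap
```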
Gradient-based Global explainable methods: On the contrary, gradient-based methods understand the neurons' activities by doing more than a single forward pass: they use partial derivatives of the activations to generate attribution representations during backpropagation. Naturally, gradient-based methods such as LRP [91] and SpRAy [53] generate human-understandable visual explanations, but there is little discussion and evaluation of these methods at the qualitative level to gain trust in the AI models, especially for applications such as medical surgery and autonomous vehicles. Correlation-based algorithms compute correlation scores rather than gradients by using the backpropagation technique. Under a set of constraints, correlation-based methods such as DeepLIFT [109] generate reasonable explanations. DeepLIFT calculates the scores based on a comparison between the values of the activated neurons and reference values. In some cases, DeepLIFT may consider both negative and positive values to observe the effect of each neuron. Gradient-based explanations fall into the Model Driven Explanation category, which analyses internal components such as weights and neurons to generate an explanation [110].
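For contrast, a bare-bones gradient attribution in PyTorch is sketched below. It computes vanilla saliency (optionally gradient x input) for a single instance with one forward and one backward pass; it is meant only to illustrate the mechanism and does not reproduce LRP or DeepLIFT, which propagate relevance scores or compare activations against reference values.

```python
import torch

def gradient_attribution(model, x, target_class, times_input=True):
    # x is a single input tensor (e.g. C x H x W); model is a trained classifier
    model.eval()
    x = x.clone().detach().requires_grad_(True)
    score = model(x.unsqueeze(0))[0, target_class]   # forward pass, pick the class score
    score.backward()                                 # backward pass: d(score)/d(input)
    attribution = x.grad.detach()
    if times_input:
        attribution = attribution * x.detach()       # gradient x input attribution
    return attribution
```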


Besides the above two methodologies, some global explainable methods are based on defined rules [91], concepts [97], and clusters [72]. The major drawback of most of the discussed global explanation methods is the dearth of human-subject experiments. Arguments, reasoning, and explanations are more effective if they help the end-user to build a true picture of the entire model process. There are two ways to keep humans in the loop while evaluating explainable methods. First, a random selection of lay people (without technical or domain knowledge) interact with the explanation tools and provide their responses and feedback through a questionnaire designed by AI experts. Secondly, domain experts provide their opinions on the explanation tools and use domain knowledge to verify their consistency. Different free libraries and frameworks such as Python, R, Caffe, PyTorch, Keras and TensorFlow are available to generate textual, visual, audio, and video explanations depending on the type of dataset and the demand of the end-user.

6. Conclusion and Open Challenges with Future Directions

It can be observed that in the past decade, many researchers primarily paid attention to the development of global interpretation methods, even though they utilised many existing local or visual explanation methods for the complete model interpretation [111,112]. Although, with sufficient accuracy, the local interpretation methods of AI can improve the user's trust, these methods never reveal the complete structure of the AI model; this is considered the biggest drawback of the local interpretation methods. This paper presents a brief history of global XAI methods from the mid-20th century to the 21st century, and then a taxonomy of the global interpretation AI methods produced in the past ten years. It also answers questions such as why the complete explainability and interpretability of AI models are so important and how a vague understanding of AI models and related technologies would affect human life. There is an inadequate illustration of some terminologies that are commonly but interchangeably used in the XAI field; the explanation of these terminologies in this paper will help the readers in understanding XAI methods. This survey provides a detailed insight into the recent developments on global XAI methods, the existing challenges, and the possible path towards trustworthy XAI methods that would be understandable by a human. The central focus of this survey paper is around answering the question: how can XAI methods be completely explained in terms of their structure and decision routes? There is considerable work done in the past ten years on the global interpretation methods, which has been highlighted and summarized in this paper, and this area of research has been continuously evolving through new and novel approaches. The key findings can be summarized as follows:

1. Mostly, the existing interpretation methods explain the decision-making process of DNNs by using local or visual approaches. However, these approaches are inadequate to explain the full architecture of the DNNs, as the local methods generate an explanation just by following the decision route for one single instance at a time. By knowing the rationale for all possible outcomes, the global interpretation methods can explain the complete architecture of DNNs.
2. Existing global interpretation methods such as Global Attribution Analysis (GAA) explain the model at the global level by using some existing local interpretation methods such as LIME, Integrated Gradients, and DeepLIFT. These methods become computationally expensive as the number of features and parameters is quite high in the deep architectures of neural networks.
3. Some existing approaches such as surrogate models (LIME) approximate the black box model to explain the decisions. These interpretation models may be close to the black box model for one subset of a dataset but diverge widely for other subsets.
4. The existing global interpretation methods have been used to explain DNNs for data types such as images and tabular data; there is no global interpretation method for text datasets that can illuminate the decision rationale executed by DNNs.

While deliberating the existing research gaps in the state of the art of global XAI methods, we have a significant opportunity to discuss and establish some future research goals and directions for academic researchers.

• The DNNs have a model-free architecture, and the existing global interpretation methods produce an explanation by approximating the black box model of DNNs. To obtain a model-oriented architecture, one needs to introduce a mathematical model that explains each decision of the black box model of a DNN deterministically and represents and governs the learning evolution happening in each iteration.
• The existing approaches are expensive in terms of computational complexity because of the stochastic behaviour and performance of many DNN models. A deterministic explanation approach would explain the model with a low computational cost and make the interpretation more accurate no matter how many times we execute the model.

The proposed future directions demand developing novel deterministic models that can highlight the influencing features and mathematically figure out the contribution of each part in the decision-making process. Consequently, the proposed deterministic explainable model will reduce the computational cost, as it can provide desirable outcomes in fewer iterations. Moreover, the DNNs will become more trustworthy and reliable for risky applications, as we would have a controlled learning process within DNNs. In the end, it is worth mentioning that interpretation methods should be built under some constraints, such as data privacy and model confidentiality, because explainability may lead to revealing some sensitive information about the model unless the experimentation and execution is carried out in a protected and compliant environment.

CRediT authorship contribution statement

Rabia Saleem: Conceptualization, Investigation, Writing – original draft, Writing – review & editing. Bo Yuan: Supervision, Validation, Writing – review & editing. Fatih Kurugollu: Supervision, Validation, Writing – review & editing. Ashiq Anjum: Supervision, Validation, Visualization, Writing – review & editing. Lu Liu: Validation, Visualization, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was carried out in the Data Science Research Center (DSRC) at the University of Derby.

References expectation, journal of Computational and Graphical Statistics 24 (1) (2015)


44–65.
[28] J.H. Friedman, Greedy function approximation: a gradient boosting machine,
[1] L. Lin, K. Wang, W. Zuo, M. Wang, J. Luo, L. Zhang, A deep structured model
Annals of statistics (2001) 1189–1232.
with radius-margin bound for 3d human activity recognition, International
[29] D. Baehrens, T. Schroeter, S. Harmeling, M. Kawanabe, K. Hansen, K.-R. Müller,
Journal of Computer Vision 118 (2015) 256–273.
How to explain individual classification decisions, The, Journal of Machine
[2] N.D. Doulamis, A. Voulodimos, Fast-mdl: Fast adaptive supervised training of
Learning Research 11 (2010) 1803–1831.
multi-layered deep learning models for consistent object tracking and
[30] A.-M. Leventi-Peetz, T. Östreich, Deep learning reproducibility and
classification, in: 2016 IEEE International Conference on Imaging Systems
explainable ai (xai), arXiv preprint arXiv:2202.11452 (2022).
and Techniques (IST), 2016, pp. 318–323.
[31] K. Weitz, T. Hassan, U. Schmid, J.-U. Garbas, Deep-learned faces of pain and
[3] N. Zeng, P. Wu, Z. Wang, H. Li, W. Liu, X. Liu, A small-sized object detection
emotions: Elucidating the differences of facial expressions with the help of
oriented multi-scale feature fusion approach with application to defect
explainable ai methods, tm-Technisches Messen 86 (7–8) (2019) 404–412.
detection, IEEE Transactions on Instrumentation and Measurement 71 (2022)
[32] M.M. De Graaf, B.F. Malle, How people explain action (and autonomous
1–14, https://doi.org/10.1109/TIM.2022.3153997.
intelligent systems should too), in: 2017 AAAI Fall Symposium Series, 2017.
[4] X. Chen, B. Zhang, D. Gao, Bearing fault diagnosis base on multi-scale cnn and
[33] R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, F. Giannotti, D. Pedreschi, A
lstm model, Journal of Intelligent Manufacturing 32 (4) (2021) 971–987.
survey of methods for explaining black box models, ACM computing surveys
[5] C. You, J. Lu, D. Filev, P. Tsiotras, Advanced planning for autonomous vehicles
(CSUR) 51 (5) (2018) 1–42.
using reinforcement learning and deep inverse reinforcement learning,
[34] M.T. Ribeiro, S. Singh, C. Guestrin, Model-agnostic interpretability of machine
Robotics and Autonomous Systems 114 (2019) 1–18.
learning, arXiv preprint arXiv:1606.05386 (2016).
[6] S. Grigorescu, B. Trasnea, T. Cocias, G. Macesanu, A survey of deep learning
[35] Z.C. Lipton, The mythos of model interpretability: In machine learning, the
techniques for autonomous driving, Journal of Field Robotics 37 (3) (2020)
concept of interpretability is both important and slippery, Queue 16 (3)
362–386.
(2018) 31–57.
[7] A. Boles, P. Rad, Voice biometrics: Deep learning-based voiceprint
[36] X. Huang, D. Kroening, W. Ruan, J. Sharp, Y. Sun, E. Thamo, M. Wu, X. Yi, A
authentication system, in: 2017 12th System of Systems Engineering
survey of safety and trustworthiness of deep neural networks: Verification,
Conference (SoSE), IEEE, 2017, pp. 1–6.
testing, adversarial attack and defence, and interpretability, Computer
[8] D. Feng, C. Haase-Schütz, L. Rosenbaum, H. Hertlein, C. Glaeser, F. Timm, W.
Science Review 37 (2020) 100270.
Wiesbeck, K. Dietmayer, Deep multi-modal object detection and semantic
[37] A.B. Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S.
segmentation for autonomous driving: Datasets, methods, and challenges,
García, S. Gil-López, D. Molina, R. Benjamins, et al., Explainable artificial
IEEE Transactions on Intelligent Transportation Systems 22 (3) (2020) 1341–
intelligence (xai): Concepts, taxonomies, opportunities and challenges
1360.
toward responsible ai, Information fusion 58 (2020) 82–115.
[9] N.M. Rad, S.M. Kia, C. Zarbo, T. van Laarhoven, G. Jurman, P. Venuti, E.
[38] Z. Lipton, The mythos of model interpretability, Queue 16 (3) (2018), 30: 31–
Marchiori, C. Furlanello, Deep learning for automatic stereotypical motor
30: 57.
movement detection using wearable sensors in autism spectrum disorders,
[39] S.J. Oh, B. Schiele, M. Fritz, Towards reverse-engineering black-box neural
Signal Processing 144 (2018) 180–191.
networks, in: Explainable AI: Interpreting, Explaining and Visualizing Deep
[10] A.S. Heinsfeld, A.R. Franco, R.C. Craddock, A. Buchweitz, F. Meneguzzi,
Learning, Springer, 2019, pp. 121–144.
Identification of autism spectrum disorder using deep learning and the
[40] M.T. Ribeiro, S. Singh, C. Guestrin, , why should i trust you? explaining the
abide dataset, NeuroImage: Clinical 17 (2018) 16–23.
predictions of any classifier, in: Proceedings of the 22nd ACM SIGKDD
[11] S.H. Silva, A. Alaeddini, P. Najafirad, Temporal graph traversals using
international conference on knowledge discovery and data mining, 2016, pp.
reinforcement learning with proximal policy optimization, IEEE Access 8
1135–1144.
(2020) 63910–63922.
[41] A. Das, P. Rad, Opportunities and challenges in explainable artificial
[12] Z. Wan, R. Yang, M. Huang, W. Liu, N. Zeng, Eeg fading data classification
intelligence (xai): A survey, arXiv preprint arXiv:2006.11371 (2020).
based on improved manifold learning with adaptive neighborhood selection,
[42] D. Erhan, A. Courville, Y. Bengio, Understanding representations learned in
Neurocomputing 482 (2022) 186–196.
deep architectures, Tech. rep., Technical Report 1355, Université de Montréal/
[13] Z. Wan, R. Yang, M. Huang, N. Zeng, X. Liu, A review on transfer learning in eeg
DIRO (2010).
signal analysis, Neurocomputing 421 (2021) 1–14.
[43] K. Simonyan, A. Vedaldi, A. Zisserman, Deep inside convolutional networks:
[14] R. Sayres, A. Taly, E. Rahimy, K. Blumer, D. Coz, N. Hammel, J. Krause, A.
Visualising image classification models and saliency maps, arXiv preprint
Narayanaswamy, Z. Rastegar, D. Wu, et al., Using a deep learning algorithm
arXiv:1312.6034 (2013).
and integrated gradients explanation to assist grading for diabetic
[44] D. Smilkov, N. Thorat, B. Kim, F. Viégas, M. Wattenberg, Smoothgrad:
retinopathy, Ophthalmology 126 (4) (2019) 552–564.
removing noise by adding noise, arXiv preprint arXiv:1706.03825 (2017).
[15] A. Das, P. Rad, K.-K.R. Choo, B. Nouhi, J. Lish, J. Martel, Distributed machine
[45] S.-K. Yeom, P. Seegerer, S. Lapuschkin, A. Binder, S. Wiedemann, K.-R. Müller,
learning cloud teleophthalmology iot for predicting amd disease progression,
W. Samek, Pruning by explaining: A novel criterion for deep neural network
Future Generation Computer Systems 93 (2019) 486–498.
pruning, Pattern Recognition 115 (2021) 107899.
[16] J. Son, J.Y. Shin, H.D. Kim, K.-H. Jung, K.H. Park, S.J. Park, Development and
[46] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, A. Torralba, Learning deep features
validation of deep learning models for screening multiple abnormal findings
for discriminative localization, in: Proceedings of the IEEE conference on
in retinal fundus images, Ophthalmology 127 (1) (2020) 85–94.
computer vision and pattern recognition, 2016, pp. 2921–2929.
[17] G.D.L.T. Parra, P. Rad, K.-K.R. Choo, N. Beebe, Detecting internet of things
[47] R.R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra, Grad-
attacks using distributed deep learning, Journal of Network and Computer
cam: Visual explanations from deep networks via gradient-based
Applications 163 (2020) 102662.
localization, in: Proceedings of the IEEE international conference on
[18] A.D. Torres, H. Yan, A.H. Aboutalebi, A. Das, L. Duan, P. Rad, Patient facial
computer vision, 2017, pp. 618–626.
emotion recognition and sentiment analysis using secure cloud with
[48] A. Miller, J. Panneerselvam, L. Liu, A review of regression and classification
hardware acceleration, in: Computational Intelligence for Multimedia Big
techniques for analysis of common and rare variants and gene-environmental
Data on the Cloud with Engineering Applications, Elsevier, 2018, pp. 61–89.
factors, Neurocomputing (2021).
[19] S.M. Lee, J.B. Seo, J. Yun, Y.-H. Cho, J. Vogel-Claussen, M.L. Schiebler, W.B.
[49] S. Mishra, B.L. Sturm, S. Dixon, Local interpretable model-agnostic
Gefter, E.J. Van Beek, J.M. Goo, K.S. Lee, et al., Deep learning applications in
explanations for music content analysis., in: ISMIR, Vol. 53, 2017, pp. 537–
chest radiography and computed tomography, Journal of thoracic imaging 34
543.
(2) (2019) 75–85.
[50] T. Peltola, Local interpretable model-agnostic explanations of bayesian
[20] R. Chen, L. Yang, S. Goodison, Y. Sun, Deep-learning approach to identifying
predictive models via kullback-leibler projections, arXiv preprint
cancer subtypes using high-dimensional genomic data, Bioinformatics 36 (5)
arXiv:1810.02678 (2018).
(2020) 1476–1483.
[51] S. Shi, X. Zhang, W. Fan, A modified perturbed sampling method for local
[21] D. Castelvecchi, Can we open the black box of ai?, Nature News 538 (7623)
interpretable model-agnostic explanation, arXiv preprint arXiv:2002.07434
(2016) 20
(2020).
[22] D. Dave, H. Naik, S. Singhal, P. Patel, Explainable ai meets healthcare: A study
[52] S.M. Lundberg, S.-I. Lee, A unified approach to interpreting model predictions,
on heart disease dataset, 2020, arXiv preprint arXiv:2011.03195.
Advances in neural information processing systems 30 (2017).
[23] M. Nauta, J. Trienes, S. Pathak, E. Nguyen, M. Peters, Y. Schmitt, J. Schlötterer,
[53] M. Sundararajan, A. Taly, Q. Yan, Axiomatic attribution for deep networks, in:
M. van Keulen, C. Seifert, From anecdotal evidence to quantitative evaluation
International conference on machine learning, PMLR, 2017, pp. 3319–3328.
methods: A systematic review on evaluating explainable ai, arXiv preprint
[54] S. Lapuschkin, S. Wäldchen, A. Binder, G. Montavon, W. Samek, K.-R. Müller,
arXiv:2201.08164 (2022).
Unmasking clever hans predictors and assessing what machines really learn,
[24] H.S. Kapoor, K. Jain, S.K. Sharma, Generalized additive model for evaluation of
Nature communications 10 (1) (2019) 1–8.
premium for diabetic patients, Journal of Advances in Applied Mathematics 1
[55] C. Seifert, A. Aamir, A. Balagopalan, D. Jain, A. Sharma, S. Grottel, S. Gumhold,
(3) (2016).
Visualizations of deep neural networks in computer vision: A survey, in:
[25] M. Craven, J. Shavlik, Extracting tree-structured representations of trained
Transparent data mining for big and small data, Springer, 2017, pp. 123–144.
networks, Advances in neural information processing systems 8 (1995).
[56] J. Yosinski, J. Clune, A. Nguyen, T. Fuchs, H. Lipson, Understanding neural
[26] T. Hastie, R. Tibshirani, J. Friedman, The elements of statistical learning 2nd
networks through deep visualization, arXiv preprint arXiv:1506.06579
ed springer series in statistics (2009).
(2015).
[27] A. Goldstein, A. Kapelner, J. Bleich, E. Pitkin, Peeking inside the black box:
[57] M.D. Zeiler, R. Fergus, Visualizing and understanding convolutional networks,
Visualizing statistical learning with plots of individual conditional
in: European conference on computer vision, Springer, 2014, pp. 818–833.


[58] J.T. Springenberg, A. Dosovitskiy, T. Brox, M. Riedmiller, Striving for [87] G.A. Klein, Do decision biases explain too much, Human Factors Society
simplicity: The all convolutional net, arXiv preprint arXiv:1412.6806 (2014). Bulletin 32 (5) (1989) 1–3.
[59] A. Mahendran, A. Vedaldi, Understanding deep image representations by [88] R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, N. Elhadad, Intelligible models
inverting them, in: Proceedings of the IEEE conference on computer vision for healthcare: Predicting pneumonia risk and hospital 30-day readmission,
and pattern recognition, 2015, pp. 5188–5196. in: Proceedings of the 21th ACM SIGKDD international conference on
[60] A. Mahendran, A. Vedaldi, Visualizing deep convolutional neural networks knowledge discovery and data mining, 2015, pp. 1721–1730.
using natural pre-images, International Journal of Computer Vision 120 (3) [89] C. Xiao, T. Ma, A.B. Dieng, D.M. Blei, F. Wang, Readmission prediction via deep
(2016) 233–255. contextual embedding of clinical concepts, PloS one 13 (4) (2018) e0195024.
[61] A. Nguyen, A. Dosovitskiy, J. Yosinski, T. Brox, J. Clune, Synthesizing the [90] Y. Ming, H. Qu, E. Bertini, Rulematrix: Visualizing and understanding
preferred inputs for neurons in neural networks via deep generator networks, classifiers with rules, IEEE transactions on visualization and computer
Advances in neural information processing systems 29 (2016). graphics 25 (1) (2018) 342–352.
[62] G. Hooker, Discovering additive structure in black box functions, in: [91] B. Letham, C. Rudin, T.H. McCormick, D. Madigan, Interpretable classifiers
Proceedings of the tenth ACM SIGKDD international conference on using rules and bayesian analysis: Building a better stroke prediction model,
Knowledge discovery and data mining, 2004, pp. 575–580. The Annals of Applied Statistics 9 (3) (2015) 1350–1371.
[63] J.J. Thiagarajan, B. Kailkhura, P. Sattigeri, K.N. Ramamurthy, Treeview: [92] H. Yang, C. Rudin, M. Seltzer, Scalable bayesian rule lists, in: International
Peeking into deep neural networks via feature-space partitioning, arXiv conference on machine learning, PMLR, 2017, pp. 3921–3930.
preprint arXiv:1611.07429 (2016). [93] R. Agarwal, L. Melnick, N. Frosst, X. Zhang, B. Lengerich, R. Caruana, G.E.
[64] A. Datta, S. Sen, Y. Zick, Algorithmic transparency via quantitative input Hinton, Neural additive models: Interpretable machine learning with neural
influence: Theory and experiments with learning systems, in: 2016 IEEE nets, Advances in Neural Information Processing Systems 34 (2021).
symposium on security and privacy (SP), IEEE, 2016, pp. 598–617. [94] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, W. Samek, On
[65] J.D. Olden, D.A. Jackson, Illuminating the black box: a randomization pixel-wise explanations for non-linear classifier decisions by layer-wise
approach for understanding variable contributions in artificial neural relevance propagation, PloS one 10 (7) (2015) e0130140.
networks, Ecological modelling 154 (1–2) (2002) 135–150. [95] S.M. Lundberg, G. Erion, H. Chen, A. DeGrave, J.M. Prutkin, B. Nair, R. Katz, J.
[66] P. Cortez, J. Teixeira, A. Cerdeira, F. Almeida, T. Matos, J. Reis, Using data Himmelfarb, N. Bansal, S.-I. Lee, Explainable ai for trees: From local
mining for wine quality assessment, in: International Conference on explanations to global understanding, arXiv preprint arXiv:1905.04610
Discovery Science, Springer, 2009, pp. 66–79. (2019).
[67] L. Breimann, J.H. Friedman, R.A. Olshen, C.J. Stone, Classification and [96] M. Ibrahim, M. Louie, C. Modarres, J. Paisley, Global explanations of neural
regression trees, Wadsworth, Pacific Grove, 1984. networks: Mapping the landscape of predictions, in: Proceedings of the 2019
[68] S. Hara, K. Hayashi, Making tree ensembles interpretable, arXiv preprint AAAI/ACM Conference on AI, Ethics, and Society, 2019, pp. 279–287.
arXiv:1606.05390 (2016). [97] P.H. Lee, L. Philip, Distance-based tree models for ranking data,
[69] A.D. Arbatli, H.L. Akin, Rule extraction from trained neural networks using Computational Statistics & Data Analysis 54 (6) (2010) 1672–1682.
genetic algorithms, Nonlinear Analysis: Theory, Methods & Applications 30 [98] G.S. Shieh, Z. Bai, W.-Y. Tsai, Rank tests for independence–with a weighted
(3) (1997) 1639–1648. contamination alternative, Statistica Sinica (2000) 577–593.
[70] Z.-H. Zhou, Y. Jiang, S.-F. Chen, Extracting symbolic rules from trained neural [99] H.-S. Park, C.-H. Jun, A simple and fast algorithm for k-medoids clustering,
network ensembles, Ai Communications 16 (1) (2003) 3–15. Expert systems with applications 36 (2) (2009) 3336–3341.
[71] Y. Lou, R. Caruana, J. Gehrke, Intelligible models for classification and [100] A. Ghorbani, J. Wexler, J.Y. Zou, B. Kim, Towards automatic concept-based
regression, in: Proceedings of the 18th ACM SIGKDD international conference explanations, Advances in Neural Information Processing Systems 32 (2019).
on Knowledge discovery and data mining, 2012, pp. 150–158. [101] R. Zhang, P. Isola, A.A. Efros, E. Shechtman, O. Wang, The unreasonable
[72] Y. Lou, R. Caruana, J. Gehrke, G. Hooker, Accurate intelligible models with effectiveness of deep features as a perceptual metric, in: Proceedings of the
pairwise interactions, in: Proceedings of the 19th ACM SIGKDD international IEEE conference on computer vision and pattern recognition, 2018, pp. 586–
conference on Knowledge discovery and data mining, 2013, pp. 623–631. 595.
[73] R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, N. Elhadad, Intelligible models [102] B. Kim, M. Wattenberg, J. Gilmer, C. Cai, J. Wexler, F. Viegas, et al.,
for healthcare: Predicting pneumonia risk and hospital 30-day readmission, Interpretability beyond feature attribution: Quantitative testing with
in: Proceedings of the 21th ACM SIGKDD international conference on concept activation vectors (tcav), in: International conference on machine
knowledge discovery and data mining, 2015, pp. 1721–1730. learning, PMLR, 2018, pp. 2668–2677.
[74] A. Henelius, K. Puolamäki, H. Boström, L. Asker, P. Papapetrou, A peek into the [103] T.A. John, V.N. Balasubramanian, C.V. Jawahar, Canonical saliency maps:
black box: exploring classifiers by randomization, Data mining and Decoding deep face models, ArXiv abs/2105.01386 (2021).
knowledge discovery 28 (5) (2014) 1503–1529. [104] N. Ranjbar, R. Safabakhsh, Using decision tree as local interpretable model in
[75] A. Zien, N. Krämer, S. Sonnenburg, G. Rätsch, The feature importance ranking autoencoder-based lime, ArXiv abs/2204.03321 (2022).
measure, in: Joint European Conference on Machine Learning and Knowledge [105] J. Moosbauer, J. Herbinger, G. Casalicchio, M.T. Lindauer, B. Bischl, Explaining
Discovery in Databases, Springer, 2009, pp. 694–709. hyperparameter optimization via partial dependence plots, NeurIPS (2021).
[76] S. Sonnenburg, A. Zien, P. Philips, G. Rätsch, Poims: positional oligomer [106] D. Sisodia, D.S. Sisodia, Quad division prototype selection-based k-nearest
importance matrices–understanding support vector machine-based signal neighbor classifier for click fraud detection from highly skewed user click
detectors, Bioinformatics 24 (13) (2008) i6–i14. dataset, Engineering Science and Technology, an, International Journal
[77] M.M.-C. Vidovic, N. Görnitz, K.-R. Müller, M. Kloft, Feature importance (2021).
measure for non-linear learning algorithms, arXiv preprint arXiv:1611.07567 [107] G.K. Rajbahadur, S. Wang, Y. Kamei, A.E. Hassan, The impact of feature
(2016). importance methods on the interpretation of defect classifiers, ArXiv abs/
[78] T. Calders, S. Verwer, Three naive bayes approaches for discrimination-free 2202.02389 (2021).
classification, Data mining and knowledge discovery 21 (2) (2010) 277–292. [108] J. Park, J. Kim, A data-driven exploration of the race between human labor
[79] B. Ustun, C. Rudin, Supersparse linear integer models for optimized medical and machines in the 21<sup>st</sup> century, Commun. ACM 65 (5) (2022)
scoring systems, Machine Learning 102 (3) (2016) 349–391. 79–87, https://doi.org/10.1145/3488376, URL:https://doi.org/10.1145/
[80] D. Wei, S. Dash, T. Gao, O. Gunluk, Generalized linear rule models, 3488376.
International Conference on Machine Learning, PMLR (2019) 6687–6696. [109] Y. Liang, S. Li, C. Yan, M. Li, C. Jiang, Explaining the black-box model: A survey
[81] H. Elzayn, S. Jabbari, C. Jung, M. Kearns, S. Neel, A. Roth, Z. Schutzman, Fair of local interpretation methods for deep neural networks, Neurocomputing
algorithms for learning in allocation problems, in: Proceedings of the 419 (2021) 168–182.
Conference on Fairness, Accountability, and Transparency, 2019, pp. 170– [110] R. Wilming, C. Budding, K.-R. Müller, S. Haufe, Scrutinizing xai using linear
179. ground-truth data with suppressor variables, ArXiv abs/2111.07473 (2022).
[82] F. Calmon, D. Wei, B. Vinzamuri, K. Natesan Ramamurthy, K.R. Varshney, [111] E. Wang, P. Khosravi, G.V. d. Broeck, Probabilistic sufficient explanations,
Optimized pre-processing for discrimination prevention, Advances in neural arXiv preprint arXiv:2105.10118 (2021).
information processing systems 30 (2017). [112] J. Gao, X. Wang, Y. Wang, Y. Yan, X. Xie, Learning groupwise explanations for
[83] M.B. Zafar, I. Valera, M. Rodriguez, K. Gummadi, A. Weller, From parity to black-box models, in: IJCAI, 2021.
preference-based notions of fairness in classification, Advances in Neural
Information Processing Systems 30 (2017).
[84] N. Grgić-Hlača, M.B. Zafar, K.P. Gummadi, A. Weller, Beyond distributive
fairness in algorithmic decision making: Feature selection for procedurally Rabia Saleem Obtained her Masters Degree from the
fair learning, in: Proceedings of the AAAI Conference on Artificial Intelligence, University of Engineering and Technology, Lahore, Pak-
Vol. 32, 2018. istan. Ms. Rabia currently is a doctoral candidate at the
[85] M. Kearns, S. Neel, A. Roth, Z.S. Wu, Preventing fairness gerrymandering: University of Derby, UK. Her research is centered on
Auditing and learning for subgroup fairness, in: International Conference on Explainable Artificial Intelligence (XAI). Her research
Machine Learning, PMLR, 2018, pp. 2564–2572. interests include Deep neural networks (DNNs), and
[86] B. Kim, C. Rudin, J.A. Shah, The bayesian case model: A generative approach mathematical modelling of DNNs.
for case-based reasoning and prototype classification, Advances in neural
information processing systems 27 (2014).


Bo Yuan received the BEng and PhD degree in computer Ashiq Anjum is currently a Professor of distributed with
science from the Tongji University, Shanghai, China in the University of Leicester, Leicester, U.K. He was the
2011 and 2017, respectively. He is currently a Lecturer Director of Data Science Research Centre, University of
in Computer Science with the School of Computing and Derby, Derby, U.K. His research interests include data-
Mathematical Sciences, University of Leicester, UK. His intensive distributed systems and high-performance
research interests include Distributed Networks, Artifi- analytics platforms for continuous processing of
cial Intelligence, Internet of Things, Federated Learning, streaming data.
and Edge Computing. His Email is [email protected].
uk.

Fatih Kurugollu obtained BSc and MSc in Computer and


Control Engineering degree from Istanbul Technical
University, Turkey, in 1989 and 1994, respectively. He Lu Liu is a Professor and Head of School of Computing
was awarded with a PhD degree in Computer Engi- and Mathematical Sciences at the University of Leices-
neering from the same university in 2000. He joined ter, UK. Prof. Liu received his Ph.D. degree from the
University of Derby, UK, as a Professor of Cyber Security University of Surrey and M.Sc. degree from Brunel
in 2016. He has recently been appointed as a full Pro- University. Prof. Liu’s research interests are in the areas
fessor at University of Sharjah, UAE.His current research of data analytics, service computing, cloud computing,
interests are centred around Security and Privacy in Artificial Intelligence and the Internet of Things. He is a
Internet-of-Things, Cloud Security, Imaging for Foren- Fellow of British Computer Society (BCS).
sics and Security, Security related Multimedia Content
Analysis, Big Data in Cyber Security, Homeland Security,
Security Issues in Healthcare Systems, Biometrics,
Image and Video Analysis.

