DECISION TREES TO DETECT
ALZHEIMER’S DISEASE
Janis Pettersson
2021
Abstract
Alzheimer’s Disease is the most common cause of dementia, a syndrome that may involve the decline of memory, communication, and judgement, yet it is hard to diagnose. Decision Trees, a family of machine learning algorithms, offer a possible way to reduce the difficulty of the diagnosis process for Alzheimer’s Disease. This thesis studied the use of Decision Trees to detect Alzheimer’s Disease by investigating 27 different Decision Tree models derived from applying nine datasets concerning Alzheimer’s Disease to three Decision Tree algorithms. The datasets utilized were the OASIS: Cross-Sectional (OASIS-1), OASIS: Longitudinal (OASIS-2), and OASIS-3 datasets, each of which had three variants: Unmoderated, which contained missing values; No Missing Values, which contained no missing values; and Reduced Attributes, which contained no missing values and no attributes of low importance. The algorithms utilized were the C4.5, CART, and CHAID Decision Tree algorithms. The results showed that the C4.5 algorithm and the OASIS-2 datasets performed worse than their alternatives. Moreover, they showed that the CART algorithm, the Unmoderated OASIS-1 dataset, and the No Missing Values and Reduced Attributes OASIS-3 datasets performed better than their alternatives. The results also revealed that the worst Decision Tree model obtained in the experiment had a prediction accuracy of 76.94%, 55 nodes, and a depth of 10. In contrast, the best model had a prediction accuracy of 90.37%, 5 nodes, and a depth of 2. The results suggest that Decision Trees are suitable models for detecting Alzheimer’s Disease.
Acknowledgments
The author gratefully acknowledges the data provided by the following:
• OASIS: Cross-Sectional: Principal Investigators: D. Marcus, R. Buckner,
J. Csernansky, J. Morris; P50 AG05681, P01 AG03991, P01 AG026276,
R01 AG021910, P20 MH071616, U24 RR021382.
Contents

Abstract
Acknowledgments
Contents
1 Introduction
  1.1 Background
  1.2 Problem Statement
  1.3 Approach
  1.4 Thesis Outline
2 Related Works
3 Theoretical Background
4 Method
  4.1 Algorithm Description
  4.2 Dataset Description
  4.3 Data Collection
  4.4 Justification of Methodology
5 Results
6 Analysis
  6.1 Weak Points
  6.2 Strong Points
  6.3 Existing Literature
7 Discussion
  7.1 Limitations
  7.2 Conclusion
  7.3 Future Work
8 References
1 Introduction
1.1 Background
Dementia is a syndrome of disease in which several higher cortical functions (memory, communication, judgement, etc.) deteriorate (World Health Organization, 2016). Alzheimer’s Disease (AD) is the most common cause of dementia: according to the Alzheimer’s Association (2021), 60 to 80% of all dementia cases are AD. It can be difficult to detect AD, especially in the early stages or if the symptoms are mild. In some cases, expensive equipment or numerous tests may be required to diagnose a patient. Inspired by the current progress of machine learning, this thesis studies the use of Decision Trees, a family of machine learning algorithms, to detect AD.

In the field of machine learning, there is a consensus that deep learning models tend to produce higher prediction accuracy than other machine learning models. Despite their lower prediction accuracy, there are still applications in which non-deep learning models and algorithms are preferable. The qualities that make Decision Trees desirable are simplicity and interpretability, meaning that they are easily understood by humans. These models are especially beneficial for systems in the medical domain because, in most cases, if an error is detected, it is important to understand its cause in order to correct it. Because the models are understandable to humans, systems that utilize Decision Trees allow errors to be removed without a struggle to pinpoint where the issue occurred. This motivates further research on Decision Trees and their potential in various applications.
1.2 Problem Statement

The goal of this thesis is to compare the prediction accuracy and size of the different Decision Trees attained from nine datasets pertaining to AD, each of which was applied to three Decision Tree algorithms. The purpose is to gain insight into how different AD-related datasets and algorithms influence the prediction accuracy and the size of the Decision Trees they construct. This knowledge could help in choosing an appropriate dataset and/or Decision Tree algorithm, perhaps even beyond this context. Hopefully, this study can inspire more research into the use of machine learning algorithms in the medical domain to detect hard-to-diagnose diseases.
1.3 Approach
For this thesis, an experiment was conducted in which different datasets pertaining to AD were applied to different Decision Tree algorithms to construct different Decision Tree models. The number of nodes, the depth, and the prediction accuracy of these models were compared. The datasets were derived from the Open Access Series of Imaging Studies (OASIS) database (https://www.oasis-brains.org/): OASIS: Cross-Sectional (Marcus et al., 2007), OASIS: Longitudinal (Marcus et al., 2010), and OASIS-3 (LaMontagne et al., 2019). In this paper, OASIS: Cross-Sectional is referred to as OASIS-1 and OASIS: Longitudinal is referred to as OASIS-2. The datasets were in tabular form and contained data samples with nominal and numerical values. Three variants were created from each of these datasets, producing a total of nine datasets, all of which were utilized in the experiment. The Decision Tree algorithms used were the C4.5 (Quinlan, 1993), CART (Breiman et al., 1984), and CHAID (Kass, 1980) algorithms. The 10-fold cross-validation method was invoked during the training phase to minimize the risk of the models overfitting the training dataset. The most favourable model is the Decision Tree that does not overfit the data and has a high prediction accuracy, few nodes, and a short depth. The experiment was executed using the machine learning software Weka 3.8.5 (https://www.cs.waikato.ac.nz/ml/weka/) (Frank et al., 2016).
2 Related Works
A meta study conducted by Miah et al. (2021) compared the performance of different machine learning algorithms for detecting diseases of dementia through the results from existing literature and concluded that diseases of dementia can be detected in their early stages. The algorithms included Support Vector Machine, Logistic Regression, Artificial Neural Network, Naive Bayes, Decision Tree, Random Forest, and k-Nearest Neighbours. The results showed that Support Vector Machine and Random Forest performed best. Among the literature studied, two works used Decision Trees in their comparative research: Bansal et al. (2018) and Farhan et al. (2014). These two studies are described further below.
In a study conducted by Bansal et al. (2018), an experiment was performed to compare the prediction accuracy of two datasets applied to different machine learning algorithms before and after the application of an attribute reduction method. The algorithms were Decision Tree, Naive Bayes, Random Forest, and Multilayer Perceptron. The attribute reduction method used was the CFS Subset Evaluator. The two datasets used were OASIS-1 and OASIS-2 from the OASIS database. The results showed that J48, an implementation of the C4.5 Decision Tree algorithm, produced the most favourable results by generating a generally high prediction accuracy. The lowest prediction accuracy was on the OASIS-2 dataset after the application of the attribute reduction method (98.66%), and the highest was on the OASIS-1 dataset both before and after the attribute reduction method (99.52%). The algorithms and reduction method utilized were provided by the Weka machine learning software.
The idea of using machine learning models to detect AD is neither new nor foreign. To my knowledge, however, the use of Decision Trees to detect AD has not been thoroughly studied. In prior work, Decision Trees were used for AD detection either (1) to compare one Decision Tree algorithm, the C4.5 algorithm, with other machine learning algorithms or (2) to propose a new method that involved Decision Trees and compare that proposed method with one of the traditional Decision Tree algorithms. Therefore, to my knowledge, nobody has investigated how datasets and Decision Tree algorithms influence the structure of the Decision Trees and their ability to classify data pertaining to AD.
3 Theoretical Background
Decision Trees are interpretable machine learning models commonly used for predicting data. These models are data structures built upon a parent-child hierarchy of nodes and edges, i.e. a tree. A parent node represents a feature or condition. Branches that link a parent node to its different child nodes represent choices. A child node represents a decision, and it can also be a parent node if it is non-terminal. If a node does not have child nodes, it is considered terminal. Nodes of a Decision Tree can have two or more branches. Because of these characteristics, the structure of a Decision Tree resembles an upside-down tree and can be visualized. There are two types of Decision Trees: classification and regression. Classification Decision Trees refer to models in which the possible outcomes form a discrete set; regression Decision Trees have possible outcomes that form a continuous set. Figure 1 illustrates an example of a classification Decision Tree used to decide whether or not to bring an umbrella.
Figure 1: A binary Decision Tree used to decide whether or not to bring your umbrella.
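To make the terminology above concrete, the decision logic of Figure 1 can be sketched as a nested mapping. This is a toy illustration and not part of the thesis; only the root question and the two outcome labels are taken from the figure, and Python is used purely for exposition.

    umbrella_tree = {
        "Is it convenient to bring your umbrella?": {  # root node: the first condition
            "yes": "Take your umbrella",               # leaf node: a possible outcome
            "no": "Leave your umbrella",               # leaf node: a possible outcome
        }
    }

    def decide(tree, answers):
        """Walk from the root, following the branch chosen at each condition."""
        node = tree
        while isinstance(node, dict):
            question = next(iter(node))   # the condition stored at this node
            node = node[question][answers[question]]
        return node

    print(decide(umbrella_tree, {"Is it convenient to bring your umbrella?": "yes"}))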
Decision Tree algorithms build their trees using splitting criteria such as Information Gain, Gini Diversity Index, Chi-Square, etc. The purpose of these criteria is to determine the condition that best splits a dataset at a node. The conditions refer to the dataset's attributes.
Decision Tree models can be evaluated through their prediction accuracy, number of nodes, and depth. The prediction accuracy is the percentage of correct predictions the model makes against a test dataset. The number of nodes is the total number of terminal and non-terminal nodes, i.e. the total number of conditions and decisions. The depth is the longest path from the root node to a leaf node. The root node is the starting node, which presents the first condition. The leaf nodes are the nodes that represent the possible outcomes. In the example in Figure 1, the root node is the node that presents the question "Is it convenient to bring your umbrella?" and the leaf nodes present the outcomes "Take your umbrella" and "Leave your umbrella".
The C4.5, CART, and CHAID algorithms are among the traditional and well-established Decision Tree algorithms in use today. These algorithms adopt a top-down, greedy approach to construct a Decision Tree. Because of this greedy nature, they are not guaranteed to create globally optimal Decision Trees. However, they remain prominent, since obtaining an optimal Decision Tree is an NP-complete problem (Hyafil and Rivest, 1976).
The C4.5 algorithm produces n-ary splits. The splitting criterion of the algorithm is based on Gain Ratio (Quinlan, 1993), which uses Entropy to measure the purity of an attribute. When splitting at a node, the attribute that produces the highest Gain Ratio is chosen.
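As a rough sketch of how Gain Ratio could be computed for a nominal attribute, consider the following; the toy arrays and function names are invented for illustration and do not come from the thesis or from Weka's J48.

    import numpy as np

    def entropy(labels):
        """Shannon entropy of a label array; 0 means a perfectly pure set."""
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def gain_ratio(feature, labels):
        """Information gain of splitting on `feature`, normalized by split information."""
        values, counts = np.unique(feature, return_counts=True)
        weights = counts / counts.sum()
        conditional = sum(w * entropy(labels[feature == v])
                          for v, w in zip(values, weights))
        info_gain = entropy(labels) - conditional
        split_info = -np.sum(weights * np.log2(weights))
        return info_gain / split_info if split_info > 0 else 0.0

    # Toy example: how well does a nominal attribute separate AD from No AD?
    sex = np.array(["M", "F", "M", "F", "M"])
    label = np.array(["AD", "No AD", "AD", "AD", "No AD"])
    print(gain_ratio(sex, label))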
CART is an acronym for Classification and Regression Trees. The CART algorithm produces binary splits, and its splitting criterion is based on the Gini Diversity Index (Breiman et al., 1984), which is used to measure the impurity of an attribute. When splitting at a node, the attribute that produces the lowest Gini Index value is chosen.
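A companion sketch of the Gini criterion, under the same toy-data assumptions as the Gain Ratio example above; CART would evaluate many candidate thresholds and keep the split with the lowest weighted impurity.

    import numpy as np

    def gini(labels):
        """Gini Diversity Index: 1 - sum(p_k^2); 0 means a perfectly pure node."""
        if len(labels) == 0:
            return 0.0
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    def split_gini(feature, labels, threshold):
        """Weighted impurity of the binary split CART would evaluate at `threshold`."""
        left, right = labels[feature <= threshold], labels[feature > threshold]
        n = len(labels)
        return len(left) / n * gini(left) + len(right) / n * gini(right)

    # Toy example: a lower weighted Gini value means a better candidate split.
    age = np.array([66, 70, 81, 58, 75])
    label = np.array(["AD", "AD", "AD", "No AD", "No AD"])
    print(split_gini(age, label, threshold=68))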
CHAID is an acronym for Chi-squared Automatic Interaction Detection. The CHAID algorithm produces n-ary splits, and its splitting criterion is based on the Chi-Square test (Kass, 1980), which measures how strongly an attribute and the class are associated. When splitting at a node, the attribute with the most significant Chi-Square result is chosen.

It is possible for Decision Tree models to overfit the training dataset. This means that the model performs accurately on the training data but not on new data. In other words, the model does not generalize well and only makes accurate predictions on the training dataset. There are methods that can be deployed to avoid this; attribute reduction, pruning, and k-fold cross-validation are a few of them. Attribute reduction involves the removal of attributes, and pruning involves the removal of nodes and branches, i.e. subtrees. Both methods remove elements deemed to contribute little or nothing to an accurate model that generalizes well. The k-fold cross-validation method involves training the model on different datasets and evaluating the model on different test datasets. This is accomplished by dividing a dataset into k subsets. Thereafter, an iteration process occurs, wherein each subset acts once as the test dataset while the remaining data acts as the training dataset.
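A minimal sketch of 10-fold cross-validation, again using scikit-learn's CART-style tree on synthetic data as a stand-in for the Weka setup used in the thesis.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=400, n_features=8, random_state=0)
    model = DecisionTreeClassifier(random_state=0)

    # Each of the 10 folds acts once as the test set; the other 9 form the training set.
    scores = cross_val_score(model, X, y, cv=10)
    print(f"Mean accuracy over 10 folds: {scores.mean():.2%}")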
4 Method
4.1 Algorithm Description
The Decision Tree algorithms used in the experiment of this thesis were C4.5, CART, and CHAID. These algorithms were chosen because they are popular and well-established. Among them, C4.5 is particularly popular in recent studies that propose new approaches to obtaining a Decision Tree; it frequently acts as a benchmark for studying the performance of the Decision Tree models it constructs. The implementations of these algorithms were provided by the open-source machine learning software Weka, version 3.8.5. The C4.5, CART, and CHAID implementations used in this software were J48, SimpleCART, and JCHAIDStar, respectively. To be specific, JCHAIDStar implemented a modified version of CHAID that is able to handle continuous attributes (Ibarguren et al., 2016). All three implementations were able to handle missing values.
The parameters of the implementations were set to Weka 3.8.5's default values. The J48 and JCHAIDStar implementations used the post-pruning method Subtree Raising to prune the Decision Trees they constructed. For both implementations, the confidence factor used for pruning was set to 0.25. The SimpleCART implementation used the Minimal Cost-Complexity post-pruning algorithm to prune its constructed CART Decision Trees. This pruning algorithm utilized 5-fold internal cross-validation to determine the cost-complexity parameter.
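As a hedged sketch of what cost-complexity pruning with internal cross-validation looks like, the snippet below uses scikit-learn's CART implementation rather than Weka's SimpleCART; the parameter names (ccp_alpha, cost_complexity_pruning_path) are scikit-learn's, and the data is synthetic.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, n_features=6, random_state=0)

    # Candidate cost-complexity parameters (alphas) for pruning the full tree.
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
    alphas = [max(a, 0.0) for a in path.ccp_alphas]  # guard against tiny negative values

    # Pick the alpha that scores best under 5-fold internal cross-validation,
    # mirroring SimpleCART's 5-fold selection of the pruning parameter.
    best_alpha = max(
        alphas,
        key=lambda a: cross_val_score(
            DecisionTreeClassifier(ccp_alpha=a, random_state=0), X, y, cv=5
        ).mean(),
    )
    pruned = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X, y)
    print(pruned.tree_.node_count, pruned.get_depth())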
The 10-fold cross-validation method is commonly used to build more robust and accurate machine learning models by minimizing the risk of the model overfitting the training dataset. Therefore, in the experiment of this thesis, this method was invoked during the training phase.
4.2 Dataset Description

OASIS-1, OASIS-2, and OASIS-3 were tabular datasets that stored both nominal and numerical data. The datasets contained references to brain scanning images and the demographic, clinical, and derived brain anatomical information of the participants. The attributes utilized by the respective datasets are listed in Table 2, and the descriptions of these attributes, used to describe a participant, are presented in Table 3. The use of brain scanning images was excluded from this study to focus on textual classification rather than image classification.
Table 1: Characteristics of the OASIS datasets.

Characteristics        OASIS-1          OASIS-2       OASIS-3
Participants           416              150           1098
Type of Study          Cross-Sectional  Longitudinal  Longitudinal
Age Range              18 to 96         60 to 98      42 to 97
Age Range - with AD    62 to 96         61 to 98      49 to 97
Age Range - no AD      18 to 94         60 to 97      42 to 97
The OASIS database contained additional attributes that could extend the datasets with supplementary information regarding a participant. However, a trade-off was made between quantity and quality: the attributes chosen for each dataset were limited to a number that is not overwhelming but still comprises demographic, clinical, and derived brain anatomical information.
Table 3: Descriptions of the attributes from the OASIS datasets used in the experiment. Each attribute represents a feature of the participant it describes.

Attribute    Description
M/F          The sex: male or female.
The Clinical Dementia Rating (CDR) was used to determine the stage of AD of a participant. Therefore, for each dataset, the CDR attribute was used to classify the data samples into one of two groups, "AD" and "No AD", which depict AD as being present or absent, respectively. Data samples with a CDR value of 0 are members of the "No AD" category; data samples with a CDR value greater than 0 are members of the "AD" category.
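A minimal sketch of this labelling rule with pandas; the "CDR" column name follows the thesis, while the file name and "Class" column are hypothetical.

    import pandas as pd

    df = pd.read_csv("oasis_cross_sectional.csv")  # hypothetical file name

    # CDR == 0 -> "No AD"; CDR > 0 (e.g. 0.5, 1, 2) -> "AD".
    # Samples with a missing CDR value would need separate handling.
    df["Class"] = (df["CDR"] > 0).map({True: "AD", False: "No AD"})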
Three variants of each dataset were used in the experiment. The first variant was the Unmoderated dataset, denoted OASIS-1/2/3-U. These datasets preserved the NULL (missing) values present in the OASIS-1, OASIS-2, and OASIS-3 datasets. The second variant was the No Missing Values dataset, in which data samples that contained missing values were removed. These datasets are denoted OASIS-1/2/3-NMV. Table 4 shows the number of data samples in each OASIS dataset before and after the removal of missing values. The third variant was the Reduced Attributes dataset, in which both data samples that contained missing values and attributes of low importance were removed. These datasets are denoted OASIS-1/2/3-RA. Table 5 shows the attributes utilized in each OASIS dataset after the removal of attributes of low importance.
Missing values in a data sample denote a loss of information for one or more attributes. To prevent Decision Tree algorithms from interpreting the data incorrectly and corrupting the model, missing values should be handled. The implementations of the algorithms used in the experiment of this thesis are able to handle missing values. Another way to handle missing values is to remove the data samples that contain them. To study how these different ways of handling missing values affect the models obtained from the OASIS datasets, both cases were investigated through the formation of the Unmoderated and No Missing Values dataset variants. The formation of the Reduced Attributes dataset variant, on the other hand, was motivated by the knowledge that the removal of attributes of low importance minimizes the risk of the model overfitting the training dataset. A sketch of how the three variants could be derived follows Table 4.
Table 4: The number of data samples in each OASIS dataset variant.

Variants             OASIS-1   OASIS-2   OASIS-3
Unmoderated          436       373       2168
No Missing Values    216       354       1970
Reduced Attributes   216       354       1970
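As referenced above, the three variants could be derived with pandas roughly as follows; the file name and the list of low-importance attributes are hypothetical placeholders.

    import pandas as pd

    df = pd.read_csv("oasis_cross_sectional.csv")  # hypothetical file name
    low_importance = ["Hand"]  # hypothetical list of low-importance attributes

    unmoderated = df.copy()                            # OASIS-1-U: missing values kept
    no_missing = df.dropna()                           # OASIS-1-NMV: incomplete samples removed
    reduced = no_missing.drop(columns=low_importance)  # OASIS-1-RA: weak attributes also removed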
The attributes retained in the OASIS-1/2/3-RA datasets had a high importance of 70 to 100%.
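The surviving pages do not specify how attribute importance was measured, so the snippet below only illustrates one common option: ranking attributes by a fitted CART tree's impurity-based importances and keeping the strongest ones. The threshold and attribute names are invented for the example.

    import pandas as pd
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, n_features=6, random_state=0)
    clf = DecisionTreeClassifier(random_state=0).fit(X, y)

    # Rank attributes by impurity-based importance, then keep the strong ones.
    importances = pd.Series(clf.feature_importances_,
                            index=[f"attr{i}" for i in range(X.shape[1])])
    keep = importances[importances >= 0.70 * importances.max()].index
    print("Attributes kept:", list(keep))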
4.4 Justification of Methodology

… algorithms and dataset utilized. It was also the reason the algorithms, datasets, and methods utilized were well-established and frequently used ones, as they had previously been able to produce favourable results. Because of this, the claims from this thesis regarding the validity and robustness of Decision Tree models in detecting AD derive from inductive reasoning.
5 Results
The prediction accuracy achieved by the Decision Tree models in the experiment ranged from 76.94% to 90.37% (Table 6). Two models had the highest prediction accuracy; they were constructed from the OASIS-1-U dataset applied to the CART and CHAID algorithms, respectively. The second highest was also obtained from the OASIS-1-U dataset, but applied to the C4.5 algorithm. The model with the lowest prediction accuracy was constructed from the OASIS-2-U dataset applied to the C4.5 algorithm.

Within each dataset variant, the prediction accuracy results show that the models constructed from the OASIS-2 datasets produced the lowest prediction accuracy compared to the models from OASIS-1 and OASIS-3. In contrast, the OASIS-1-U dataset produced the models with the highest prediction accuracy compared to the other Unmoderated datasets. In the other two dataset variants, No Missing Values and Reduced Attributes, OASIS-3 produced the models with the highest prediction accuracy compared to OASIS-1 and OASIS-2.
Table 6: The prediction accuracy of the Decision Tree models obtained in the experiment, measured in percentage (%).

The Decision Tree models obtained in the experiment had numbers of nodes ranging from 3 to 55 (Table 7). The model with the most nodes was constructed from the OASIS-2-U dataset with the C4.5 algorithm. In contrast, there were 11 models that shared the least number of nodes.
Table 7: The number of nodes of the Decision Tree models obtained in the experiment.

Dataset        C4.5   CART   CHAID
OASIS-1-NMV    15     7      11
OASIS-2-NMV    43     3      13
OASIS-3-NMV    16     3      19
OASIS-1-RA     7      3      3
OASIS-2-RA     13     3      3
OASIS-3-RA     3      3      3
The depth of the Decision Tree models obtained in the experiment ranged from 1 to 10 (Table 8). Two models had the greatest depth: one was constructed from the OASIS-2-U dataset and the other from the OASIS-2-NMV dataset, both with the C4.5 algorithm. In contrast, the models with the shortest depth were the same models that had the least number of nodes.
Table 8: The depth of the Decision Tree models obtained in the experiment.

Dataset        C4.5   CART   CHAID
OASIS-1-NMV    7      3      5
OASIS-2-NMV    10     1      5
OASIS-3-NMV    6      1      5
OASIS-1-RA     3      1      1
OASIS-2-RA     5      1      1
OASIS-3-RA     1      1      1
Comparing the algorithms within each dataset, the results also show that the C4.5 algorithm predominantly constructed the models with the most nodes and the greatest depth. The exceptions were the OASIS-3-NMV and OASIS-3-RA datasets. From the OASIS-3-NMV dataset, the model constructed using the C4.5 algorithm had the greatest depth, but the model constructed using the CHAID algorithm had more nodes. From the OASIS-3-RA dataset, the models constructed across all three algorithms had equal numbers of nodes and equal depths. Moreover, comparing the datasets within each dataset variant, the OASIS-2 datasets with the C4.5 algorithm produced the models with the most nodes and the greatest depth.

On the other hand, comparing the algorithms within each dataset, the results show that the CART algorithm predominantly constructed the models with the fewest nodes and the shortest depth. The exceptions were the models constructed from the Reduced Attributes datasets: the models constructed with the CHAID algorithm from all three Reduced Attributes datasets, and the model constructed with the C4.5 algorithm from the OASIS-3-RA dataset, had the same number of nodes and depth as the CART models. Consequently, the Reduced Attributes datasets produced the most models with the fewest nodes and the shortest depth, compared to the datasets of the other variants.
6 Analysis
6.1 Weak Points
The Decision Tree model obtained from the OASIS-2-U dataset with the C4.5 algorithm had the greatest depth, the most nodes, and the lowest prediction accuracy, performing the worst compared to the other models obtained in the experiment. One reason its prediction accuracy is lower than the others' might be that the model is bigger and more complex, which suggests that the model overfits the training dataset.
The C4.5 algorithm generally constructed larger Decision Tree models compared to the CART and CHAID algorithms, especially from the OASIS-2 datasets. Moreover, the models constructed from the OASIS-2 datasets, regardless of the algorithm used, produced lower prediction accuracy than the models constructed from the OASIS-1 and OASIS-3 datasets.

One characteristic that distinguishes the C4.5 algorithm from the CART and CHAID Decision Tree algorithms is its splitting criterion, Gain Ratio. This may be why the C4.5 algorithm built larger Decision Tree models compared to the other algorithms. Moreover, the Gain Ratio splitting criterion seems especially unsuitable when used with the OASIS-2 dataset. This might be because the OASIS-2 dataset had the fewest participants involved and the age range of its participants was narrower compared to the other datasets. The small sample size of the OASIS-2 dataset may also have contributed to the poor performance. However, for the No Missing Values and Reduced Attributes dataset variants, the OASIS-1 datasets had fewer data samples. Despite this, the OASIS-1-NMV and OASIS-1-RA datasets still produced smaller models with higher prediction accuracy than their OASIS-2 counterparts. It may therefore be more suitable for datasets used with C4.5, CART, and CHAID to detect AD to include more participants covering a wider age range than the OASIS-2 datasets.
6.2 Strong Points

The OASIS-1-U dataset produced the models with the best prediction accuracy compared to the models obtained from the other datasets. From this dataset, the CART and CHAID algorithms obtained the models with the highest prediction accuracy of all the models obtained, and the C4.5 algorithm obtained the model with the second highest prediction accuracy. What makes the OASIS-1 dataset distinguishable from OASIS-2 and OASIS-3 is that the age range of its participants was the widest and that its data were acquired from a cross-sectional study. These characteristics might have contributed to the OASIS-1-U dataset producing models with better prediction accuracy than the models produced from the other Unmoderated datasets. However, looking at it from another perspective, the number of data samples in the OASIS-2-U dataset might have been too small, and the number in the OASIS-3-U dataset too large, to produce favourable results. This could be why the Unmoderated variants of the OASIS-2 and OASIS-3 datasets produced models with lower prediction accuracy than the models constructed from their dataset variant counterparts, with only one exception: the model constructed from the OASIS-2-RA dataset using the CART algorithm.
One reason the OASIS-1-U dataset produced models with better prediction accuracy compared to the other OASIS-1 datasets could lie in the way the algorithms handle missing values in the OASIS-1 dataset. Another reason, or a factor that could have played a big role in generating these results, relates to the sample size of the OASIS-1 dataset. As implicitly mentioned in the previous paragraph, it is possible that the size of the Unmoderated OASIS-1 dataset was favourable for the data it comprised. After the removal of missing values from the OASIS-1 dataset, the sample size decreased by more than half. Because of this, the OASIS-2-NMV and OASIS-2-RA datasets had more data samples than the OASIS-1-NMV and OASIS-1-RA datasets. Having less data after the removal of missing values could have had a detrimental impact on the No Missing Values and Reduced Attributes variants of the OASIS-1 dataset. It would therefore be interesting to see the size and the prediction accuracy of the Decision Tree models constructed from a dataset that was acquired from a cross-sectional study and that retains roughly the sample size of the OASIS-1-U dataset even after the removal of missing values.
However, it is also important to note that, in the OASIS-1 dataset, the participants who were below the age of 62 when the data was acquired did not show signs of AD (Table 1). This could mean that the dataset may produce models that perform poorly if tasked to predict unseen data that contains participants under 62 years old who have AD.
Furthermore, the CART algorithm, from which the most favourable Decision Tree model was constructed, produced the most models with the fewest nodes and the shortest depth. Though the models created by the CART algorithm are generally smaller, the results do not show that this was at the expense of accuracy. The same can be said for the Reduced Attributes datasets. Though the most favourable model was constructed from an Unmoderated dataset variant, the Reduced Attributes dataset variant created the smallest Decision Trees compared to the other dataset variants. In this case too, comparing with the datasets of the other variants, the results do not indicate that the small number of nodes and the short depth of these models came at the expense of accuracy.
6.3 Existing Literature

The highest prediction accuracy achieved by Bansal et al. on the OASIS-1 dataset was 99.52%, both before and after the application of the attribute reduction method. The experiment of this thesis achieved 88.30% on the OASIS-1 dataset before using the attribute reduction method and 85.65% after. For both the OASIS-1 and OASIS-2 datasets with the C4.5 algorithm, Bansal et al. achieved a higher prediction accuracy than the experiment conducted for this thesis. The reason behind the dissimilarity between the results obtained by Bansal et al. and those of this thesis, however, is unclear.
In the study conducted by Farhan et al. (2014), the ensemble model was claimed to be the best, as it achieved the highest prediction accuracy at 93.75%. The Decision Tree model, on the other hand, performed the worst: it achieved the lowest prediction accuracy in all cases but one, and in that exception the Decision Tree tied with other models for the lowest prediction accuracy. The methodology used in Farhan et al.'s study differs from the one used in the experiment of this thesis. However, the highest prediction accuracy achieved in the experiment of this thesis was 90.37%, which indicates that Decision Trees have potential in the detection of AD.
7 Discussion
7.1 Limitations
One limitation worth mentioning involves the attributes of the datasets. As mentioned in Section 4.2, the attributes in each dataset could have been expanded upon. However, the decision as to which attributes to choose, and how many, was arbitrary rather than educated. The only requirement was that the datasets comprise demographic, clinical, and derived brain anatomical information and that the number of attributes utilized not be excessive. Another limitation regarding the attributes is that, disregarding the datasets of the Reduced Attributes variant, the attributes in the OASIS-3 datasets differed from those in the OASIS-1 and OASIS-2 datasets. Studying how different attributes affect the results would be more appropriate if only one of the OASIS datasets were utilized for that purpose; having more factors that differ between the datasets complicates the process of analyzing and comparing them. Lastly, another limitation involves the pruning method, which differed between the Decision Tree algorithms. The analysis of the results does not reveal differences that could have been caused by the different pruning methods. However, it would have been more ideal had the same pruning method been used across the three Decision Tree algorithms; that way, any difference between the algorithms seen in the results could be traced more directly to the algorithm itself.
7.2 Conclusion
This thesis studied the performance of Decision Tree algorithms used to detect AD. To gain insight into the validity of the models constructed from these algorithms, different datasets that concern AD were applied to different Decision Tree algorithms. As discussed in Section 4.4, the proposed methodology is not meant to arrive at a resolved conclusion; instead, it is meant to provide knowledge from which a theory can be inferred.

The results from the experiment showed that the C4.5 algorithm and the OASIS-2 datasets performed worse than their respective counterparts. In contrast, the CART algorithm and the OASIS-1-U, OASIS-3-NMV, and OASIS-3-RA datasets performed better than their respective counterparts. These results reveal which datasets and algorithms are more appropriate than others in the context of the experiment.

The results also revealed the worst and the best Decision Tree models. The worst had a prediction accuracy of 76.94%, 55 nodes, and a depth of 10. In contrast, the best had a prediction accuracy of 90.37%, 5 nodes, and a depth of 2. Though the method of the experiment was not designed to address the full scope of the problem statement, the results obtained support the notion that Decision Trees are valid and suitable for detecting AD.
7.3 Future Work
The findings of this thesis indicate that Decision Trees are valid models for detecting AD. A future work could therefore involve detecting the different stages of AD rather than only detecting whether AD is present or not.

Another idea for future work is to examine the possibility of using images rather than textual information to detect AD. The OASIS database provides brain scan images, such as MRI and PET scans. Such work could investigate whether or not Decision Trees remain suitable for detecting AD if brain scanning images were utilized.
8 References
Alzheimer’s Association, 2021. What is Alzheimer’s Disease? https://www.alz.org/alzheimers-dementia/what-is-alzheimers (Accessed June 15, 2021).
Bansal, D., Chhikara, R., Khanna, K. and Gupta, P., 2018. Comparative analysis of various machine learning algorithms for detecting dementia. Procedia Computer Science, 132, pp.1497-1502. Elsevier.
Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J., 1984. Classification and Regression Trees. CRC Press.
Buckner, R.L., Head, D., Parker, J., Fotenos, A.F., Marcus, D., Morris, J.C. and Snyder, A.Z., 2004. A unified approach for morphometric and functional data analysis in young, old, and demented adults using automated atlas-based head size normalization: reliability and validation against manual measurement of total intracranial volume. Neuroimage, 23(2), pp.724-738.
Farhan, S., Fahiem, M.A. and Tauseef, H., 2014. An ensemble-of-classifiers based approach for early diagnosis of Alzheimer's disease: classification using structural features of brain images. Computational and Mathematical Methods in Medicine, 2014.
Folstein, M.F., Folstein, S.E. and McHugh, P.R., 1975. “Mini-mental state”: a practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12(3), pp.189-198.
Fotenos, A.F., Snyder, A.Z., Girton, L.E., Morris, J.C. and Buckner, R.L., 2005. Normative estimates of cross-sectional and longitudinal brain volume decline in aging and AD. Neurology, 64(6), pp.1032-1039.

Frank, E., Hall, M.A. and Witten, I.H., 2016. The WEKA Workbench. Online Appendix for "Data Mining: Practical Machine Learning Tools and Techniques", Morgan Kaufmann, Fourth Edition.
Hollingshead, A.B., 1957. Two Factor Index of Social Position. Yale University Press, New Haven.

Hyafil, L. and Rivest, R.L., 1976. Constructing optimal binary decision trees is NP-complete. Information Processing Letters, 5(1), pp.15-17.
Ibarguren, I., Lasarguren, A., Pérez, J.M., Muguerza, J., Gurrutxaga, I. and Arbelaitz, O., 2016. BFPART: Best-first PART. Information Sciences, 367, pp.927-952. Elsevier.

Kass, G.V., 1980. An exploratory technique for investigating large quantities of categorical data. Applied Statistics, 29(2), pp.119-127.
LaMontagne, P.J., Benzinger, T.L.S., Morris, J.C., Keefe, S., Hornbeck, R., Xiong, C., Grant, E., Hassenstab, J., Moulder, K., Vlassenko, A.G., Raichle, M.E., Cruchaga, C. and Marcus, D., 2019. OASIS-3: Longitudinal Neuroimaging, Clinical, and Cognitive Dataset for Normal Aging and Alzheimer Disease. medRxiv. Cold Spring Harbor Laboratory Press. doi: 10.1101/2019.12.13.19014902.
Marcus, D.S., Wang, T.H., Parker, J., Csernansky, J.G., Morris, J.C. and Buckner, R.L., 2007. Open Access Series of Imaging Studies (OASIS): Cross-sectional MRI Data in Young, Middle Aged, Nondemented, and Demented Older Adults. Journal of Cognitive Neuroscience, 19(9), pp.1498-1507. MIT Press. doi: 10.1162/jocn.2007.19.9.1498.

Marcus, D.S., Fotenos, A.F., Csernansky, J.G., Morris, J.C. and Buckner, R.L., 2010. Open Access Series of Imaging Studies: Longitudinal MRI Data in Nondemented and Demented Older Adults. Journal of Cognitive Neuroscience, 22(12), pp.2677-2684. MIT Press.
Miah, Y., Prima, C.N.E., Seema, S.J., Mahmud, M. and Kaiser, M.S., 2021. Performance comparison of machine learning techniques in identifying dementia from open access clinical datasets. In Advances on Smart and Soft Computing, pp.79-89. Singapore: Springer.
Morris, J.C., 1993. The Clinical Dementia Rating (CDR): Current version and scoring rules. Neurology, 43, pp.2412-2414.
Quinlan, J.R., 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers.

World Health Organization, 2016. Dementia: Fact Sheet. https://www.who.int/news-room/fact-sheets/detail/dementia.