
THE USE OF DECISION TREES TO DETECT ALZHEIMER'S DISEASE

Janis Pettersson

Bachelor's Thesis, 15 credits
Bachelor's Programme in Computing Science
2021

Abstract
Alzheimer's Disease is the most common cause of dementia, which may involve the decline of memory, communication, and judgement, yet it is hard to diagnose. Decision Trees, a family of machine learning algorithms, provide a possible way to reduce the difficulty of the diagnosis process for Alzheimer's Disease. This thesis studied the use of Decision Trees to detect Alzheimer's Disease by investigating 27 different Decision Tree models, derived from applying three Decision Tree algorithms to nine datasets concerning Alzheimer's Disease. The datasets utilized were the OASIS: Cross-Sectional, OASIS: Longitudinal, and OASIS-3 datasets, each of which had three variants: Unmoderated, which contained missing values; No Missing Values, which contained no missing values; and Reduced Attributes, which contained no missing values and no attributes of low importance. The algorithms utilized were the C4.5, CART, and CHAID Decision Tree algorithms. The results showed that the C4.5 algorithm and the OASIS-2 datasets performed worse than their alternatives, while the CART algorithm, the Unmoderated OASIS-1 dataset, and the No Missing Values and Reduced Attributes OASIS-3 datasets performed better than their alternatives. The results also reveal that the worst Decision Tree model obtained in the experiment had a prediction accuracy of 76.94%, 55 nodes, and a depth of 10, whereas the best model had a prediction accuracy of 90.37%, 5 nodes, and a depth of 2. The results suggest that Decision Trees are suitable models for detecting Alzheimer's Disease.

Acknowledgments
The author gratefully acknowledges the data provided by the following:
• OASIS: Cross-Sectional: Principal Investigators: D. Marcus, R. Buckner, J. Csernansky, J. Morris; P50 AG05681, P01 AG03991, P01 AG026276, R01 AG021910, P20 MH071616, U24 RR021382.

• OASIS: Longitudinal: Principal Investigators: D. Marcus, R. Buckner, J. Csernansky, J. Morris; P50 AG05681, P01 AG03991, P01 AG026276, R01 AG021910, P20 MH071616, U24 RR021382.

• OASIS-3: Principal Investigators: T. Benzinger, D. Marcus, J. Morris; NIH P50 AG00561, P30 NS09857781, P01 AG026276, P01 AG003991, R01 AG043434, UL1 TR000448, R01 EB009352. AV-45 doses were provided by Avid Radiopharmaceuticals, a wholly owned subsidiary of Eli Lilly.

Contents

Abstract
Acknowledgments
Contents
1 Introduction
  1.1 Background
  1.2 Problem Statement
  1.3 Approach
  1.4 Thesis Outline
2 Related Works
3 Theoretical Background
4 Method
  4.1 Algorithm Description
  4.2 Dataset Description
  4.3 Data Collection
  4.4 Justification of Methodology
5 Results
6 Analysis
  6.1 Weak Points
  6.2 Strong Points
  6.3 Existing Literature
7 Discussion
  7.1 Limitations
  7.2 Conclusion
  7.3 Future Work
8 References

1 Introduction
1.1 Background
Dementia is a syndrome, caused by disease of the brain, in which several higher cortical functions (memory, communication, judgement, etc.) deteriorate (World Health Organization, 2016). Alzheimer's Disease (AD) is the most common cause of dementia; according to the Alzheimer's Association (2021), AD accounts for 60 to 80% of all dementia cases. AD can be difficult to detect, especially in its early stages or when the symptoms are mild. In some cases, expensive equipment or numerous tests may be required to diagnose a patient. Inspired by the current progress of machine learning, this thesis studies the use of Decision Trees, a family of machine learning algorithms, to detect AD.

In the field of machine learning, there is a consensus that deep learning models generally achieve higher prediction accuracy than other machine learning models. Despite their lower prediction accuracy, there are still applications in which non-deep learning models and algorithms are preferable. The qualities that make Decision Trees desirable are their simplicity and interpretability, meaning that they are easily understood by humans. These qualities are especially beneficial for systems in the medical domain because, in most cases, when an error is detected, it is important to understand its cause in order to correct it. Because the models are understandable to humans, systems that utilize Decision Trees allow errors to be removed without a struggle to pinpoint where the issue occurred. This motivates further research on Decision Trees and their potential in various applications.

1.2 Problem Statement


The size and prediction accuracy of Decision Trees are predominantly influenced by the datasets and algorithms used to construct them. Some datasets, however, can pose a challenge to the algorithm. In datasets involving human participants, for example, outlier data samples may be more common than in other datasets, due to the individuality of humans. Also, because humans are complex, many factors can contribute to a problem when human subjects are involved. Even if an exhaustive dataset could be created, using it could be expensive, inefficient, or counterproductive, as the model may overfit the data. Taking into account only a few significant factors may be more fruitful; the question then becomes which factors are beneficial to consider. It is therefore of interest to study how datasets affect the size and prediction accuracy of a Decision Tree, and to learn which, or what kind of, dataset should be used to satisfy a specific intention.

The goal of this thesis is to compare the prediction accuracy and size of the Decision Trees obtained by applying three Decision Tree algorithms to nine datasets pertaining to AD. The purpose is to gain insight into how different datasets concerning AD, and different algorithms, influence the prediction accuracy and size of the Decision Trees they construct. This knowledge could help in choosing an appropriate dataset and/or Decision Tree algorithm, perhaps even beyond this context. Hopefully, this study can inspire more research into the use of machine learning algorithms in the medical domain to detect hard-to-diagnose diseases.

1.3 Approach
In this thesis, an experiment was conducted in which different Decision Tree algorithms were applied to different datasets pertaining to AD in order to construct different Decision Tree models. The number of nodes, the depth, and the prediction accuracy of these models were compared. The datasets were derived from the Open Access Series of Imaging Studies (OASIS) database¹: OASIS: Cross-Sectional (Marcus et al., 2007), OASIS: Longitudinal (Marcus et al., 2010), and OASIS-3 (LaMontagne et al., 2019). In this paper, OASIS: Cross-Sectional is referred to as OASIS-1 and OASIS: Longitudinal is referred to as OASIS-2. The datasets were in tabular form and contained data samples with nominal and numerical values. Three variants were created from each of these datasets, producing a total of nine datasets, all of which were utilized in the experiment. The Decision Tree algorithms used were the C4.5 (Quinlan, 1993), CART (Breiman et al., 1984), and CHAID (Kass, 1980) algorithms. The 10-fold cross-validation method was invoked during the training phase to minimize the risk of the model overfitting the training dataset. The Decision Tree that does not overfit the data and has a high prediction accuracy, a small number of nodes, and a short depth is the most favourable model. The experiment was executed using the machine learning software Weka 3.8.5² (Frank et al., 2016).

1.4 Thesis Outline


In Section 2, studies similar to this thesis are described to acknowledge the current developments in this subject. In Section 3, the theoretical background of Decision Trees is explained. Section 4 gives a more detailed description of the methodology of this study. Section 5 presents the results of the experiment, and Section 6 analyzes them. In Section 7, limitations are addressed, a conclusion is stated, and future work is suggested.

2 Related Works
A meta study conducted by Miah et al. (2021) compared the performance of different machine learning algorithms for detecting dementia, using results from the existing literature, and concluded that dementia can be detected in its early stages. The algorithms included Support Vector Machine, Logistic Regression, Artificial Neural Network, Naive Bayes, Decision Tree, Random Forest, and k-Nearest Neighbours. The results showed that Support Vector Machine and Random Forest performed best. Among the literature studied, two works used Decision Trees in their comparative research: Bansal et al. (2018) and Farhan et al. (2014). These two studies are described in more detail below.

¹ https://www.oasis-brains.org/
² https://www.cs.waikato.ac.nz/ml/weka/

Bansal et al. (2018) performed an experimental study comparing the prediction accuracy of different machine learning algorithms on two datasets, before and after the application of an attribute reduction method. The algorithms were Decision Tree, Naive Bayes, Random Forest, and Multilayer Perceptron. The attribute reduction method used was the CFS Subset Evaluator. The two datasets were OASIS-1 and OASIS-2, from the OASIS database. The results showed that J48, an implementation of the C4.5 Decision Tree algorithm, produced the most favourable results by generating a generally high prediction accuracy. The lowest prediction accuracy was on the OASIS-2 dataset after the application of the attribute reduction method (98.66%), and the highest was on the OASIS-1 dataset both before and after the attribute reduction method (99.52%). The algorithm and the reduction method utilized were provided by the Weka machine learning software.

A study conducted by Farhan et al. (2014) proposed a new approach to detect AD even in its early stages. To evaluate the approach, different machine learning algorithms were applied to data from the OASIS database using different attributes, and the results were compared. The algorithms were Support Vector Machine, Multilayer Perceptron, Decision Tree, and an ensemble of classifiers determined by majority voting. The attributes were the volumes of gray matter, white matter, and cerebrospinal fluid, the size of the hippocampus, and a combination of these attributes. For validation, the 10-fold cross-validation method was used. In this study, the Decision Tree implementation used was Weka's J48 implementation of the C4.5 algorithm. The results showed that the highest prediction accuracy produced by the proposed approach was achieved with the ensemble of classifiers using the size of the left hippocampus (93.75%). The results also showed that the proposed approach with Decision Trees produced the lowest prediction accuracy for all but one attribute type: the volume of white matter, for which the accuracy equalled that achieved by the Multilayer Perceptron, the Support Vector Machine, and the ensemble of classifiers.

Another work related to this thesis is a study conducted by Naganandhini and Shanmugavadivu (2019). The authors proposed a method that produced Decision Trees with optimal hyperparameter tuning. The method was evaluated using data, from the OASIS database, of participants that have AD, and was optimized using Entropy and Information Gain. The results showed that the method achieved an average prediction accuracy of 99.10%. It was also claimed that the method was able to detect even early stages of AD.

The idea of using machine learning models to detect AD is neither new nor foreign. To my knowledge, however, the use of Decision Trees to detect AD has not been thoroughly studied. In prior work, Decision Trees were used for AD detection either (1) to compare one Decision Tree algorithm, C4.5, with other machine learning algorithms, or (2) to propose a new method that involved Decision Trees and compare it with one of the traditional Decision Tree algorithms. Therefore, to my knowledge, nobody has investigated how datasets and Decision Tree algorithms influence the structure of the Decision Trees and their ability to classify data pertaining to AD.

3 Theoretical Background
Decision Trees are interpretable machine learning models commonly used for prediction. These models are data structures built upon a parent-child hierarchy of nodes and edges, i.e. a tree. A parent node represents a feature or condition. Branches, which link a parent node to its child nodes, represent choices. A child node represents a decision, and it can also be a parent node if it is non-terminal. If a node has no child nodes, it is considered terminal. Nodes of a Decision Tree can have two or more branches. Because of these characteristics, the structure of a Decision Tree resembles an upside-down tree and can be visualized. The two types of Decision Trees are classification and regression trees. Classification Decision Trees are models in which the set of possible outcomes is discrete; in regression trees, the set of possible outcomes is continuous. Figure 1 illustrates an example of a classification Decision Tree used to decide whether or not to bring an umbrella.

Figure 1: A binary Decision Tree used to decide whether or not to bring your
umbrella.

Similar to most machine learning algorithms, Decision Tree algorithms have a training and a testing phase. The training phase refers to the construction of a Decision Tree model from a subset of a dataset, referred to as the training dataset. The testing phase refers to the process of having the model make predictions on another subset, referred to as the testing dataset. To construct a model, Decision Tree algorithms use supervised learning. During the training phase, a process called splitting occurs, which involves creating new branches from a parent node to child nodes. The split is determined by different criteria: Entropy, Information Gain, Gini Diversity Index, Chi-Square, etc. The purpose of these criteria is to determine the condition that best splits the dataset at a node, where the conditions refer to the dataset's attributes.

Decision Tree models can be evaluated through their prediction accuracy, number of nodes, and depth. The prediction accuracy is the percentage of correct predictions the model makes on a test dataset. The number of nodes is the total number of terminal and non-terminal nodes, i.e. the total number of conditions and decisions. The depth is the length of the longest path from the root node to a leaf node. The root node is the starting node, which presents the first condition; the leaf nodes are the nodes that represent the possible outcomes. In the example in Figure 1, the root node is the node that presents the question "Is it convenient to bring your umbrella?" and the leaf nodes present the outcomes "Take your umbrella" and "Leave your umbrella".

The C4.5, CART, and CHAID algorithms are among the traditional, well-established Decision Tree algorithms in use today. These algorithms adopt a top-down, greedy approach to construct a Decision Tree. Because of their greedy nature, they are not guaranteed to create globally optimal Decision Trees. They remain prominent, however, even though obtaining an optimal Decision Tree is an NP-complete problem (Hyafil and Rivest, 1976).

The C4.5 algorithm produces n-ary splits. The splitting criterion of the algorithm is based on Gain Ratio (Quinlan, 1993), which uses Entropy to measure the purity of an attribute. When splitting at a node, the attribute that produces the highest Gain Ratio is chosen.
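
These quantities have standard definitions, stated here for reference (they are not reproduced from the thesis). For a dataset S with class proportions p_i and a candidate attribute A partitioning S into subsets S_v:

    H(S) = -\sum_i p_i \log_2 p_i
    IG(S, A) = H(S) - \sum_v \frac{|S_v|}{|S|} H(S_v)
    \mathrm{SplitInfo}(S, A) = -\sum_v \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|}
    \mathrm{GainRatio}(S, A) = \frac{IG(S, A)}{\mathrm{SplitInfo}(S, A)}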

CART is an acronym for Classification and Regression Trees. The CART algorithm produces binary splits, and its splitting criterion is based on the Gini Diversity Index (Breiman et al., 1984), which measures the impurity of an attribute. When splitting at a node, the attribute that produces the lowest Gini Index value is chosen.
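
The Gini Diversity Index has the standard form below (again stated for reference, not taken from the thesis); a split chooses the attribute minimizing the weighted impurity of the resulting subsets S_v:

    \mathrm{Gini}(S) = 1 - \sum_i p_i^2
    \mathrm{Gini}_{\mathrm{split}}(S, A) = \sum_v \frac{|S_v|}{|S|} \, \mathrm{Gini}(S_v)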

CHAID is an acronym for Chi-Square Automatic Interaction Detector. The CHAID algorithm produces n-ary splits. The splitting criterion is based on p-values, obtained from statistical significance tests with Bonferroni Correction. For building classification CHAID Decision Trees, the significance test used is the Chi-Square test (Kass, 1980). When performing a split at a node, the attribute that produced the lowest p-value is chosen, because the lower the p-value, the higher the statistical significance.
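
The Chi-Square statistic here is the standard one, computed over the contingency table of attribute values against classes, with observed counts O_{ij} and expected counts E_{ij}:

    \chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}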

Decision Tree models can overfit the training dataset. This means that the model performs accurately on the training data but not on new data; in other words, the model does not generalize well. There are methods that can be deployed to avoid this; attribute reduction, pruning, and k-fold cross-validation are a few of them. Attribute reduction involves the removal of attributes, and pruning involves the removal of nodes and branches, i.e. subtrees. Both methods remove elements deemed to contribute little or nothing to creating an accurate model that generalizes well. The k-fold cross-validation method involves training the model on different datasets and evaluating it on different test datasets. This is accomplished by dividing a dataset into k subsets; an iteration process then occurs in which each subset acts once as the test dataset while the remaining data acts as the training dataset.
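
A minimal sketch of k-fold cross-validation, written with scikit-learn on placeholder data, is shown below; the thesis itself relied on Weka's built-in cross-validation rather than this code.

    # Sketch: 10-fold cross-validation of a Decision Tree.
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import KFold
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)  # stand-in for an OASIS table
    accuracies = []
    # Each of the 10 subsets acts once as the test set, the rest as training data.
    for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
        clf = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
        accuracies.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))
    print(f"mean accuracy over 10 folds: {np.mean(accuracies):.4f}")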

4 Method
4.1 Algorithm Description
The Decision Tree algorithms used in the experiment of this thesis were C4.5, CART, and CHAID. These algorithms were chosen because they are popular and well-established. Among them, C4.5 is particularly popular in recent studies that propose new approaches to obtaining a Decision Tree; it frequently acts as a benchmark for studying the performance of the Decision Tree models such approaches construct. The implementations of these algorithms were provided by the open-source machine learning software Weka, version 3.8.5. The C4.5, CART, and CHAID implementations used were J48, SimpleCART, and JCHAIDStar, respectively. To be specific, JCHAIDStar implements a modified version of CHAID that is able to handle continuous attributes (Ibarguren et al., 2016). All three implementations were able to handle missing values.

The parameters of the implementations were set to Weka 3.8.5's default values. The J48 and JCHAIDStar implementations used the post-pruning method Subtree Raising to prune the Decision Trees they constructed; for both implementations, the confidence factor used for pruning was set to 0.25. The SimpleCART implementation used the Minimum Cost-Complexity post-pruning algorithm to prune its constructed CART Decision Trees. This pruning algorithm utilized 5-fold internal cross-validation to determine the cost-complexity parameter.

The 10-fold cross-validation method is commonly used to build more robust and accurate machine learning models by minimizing the risk of the model overfitting the training dataset. It was therefore invoked during the training phase of the experiment.
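
For readers without Weka, a hedged scikit-learn analogue of this setup might look as follows. scikit-learn's tree learner is CART-style, ccp_alpha controls minimal cost-complexity pruning (chosen here by 5-fold internal cross-validation, loosely mirroring SimpleCART's internal procedure), and the data is a placeholder; this is a sketch, not the thesis's actual tooling.

    # Sketch: a rough scikit-learn analogue of the SimpleCART setup.
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)  # placeholder data

    # Candidate pruning strengths from the cost-complexity pruning path
    # (clipped at zero to guard against floating-point round-off).
    alphas = DecisionTreeClassifier(random_state=0) \
        .cost_complexity_pruning_path(X, y).ccp_alphas.clip(min=0)

    # Pick the pruning strength by 5-fold internal cross-validation...
    pruned_cart = GridSearchCV(DecisionTreeClassifier(random_state=0),
                               {"ccp_alpha": alphas}, cv=5)
    # ...and evaluate the whole procedure with an outer 10-fold cross-validation.
    scores = cross_val_score(pruned_cart, X, y, cv=10)
    print(f"mean 10-fold accuracy: {scores.mean():.4f}")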

4.2 Dataset Description


The datasets used in the experiment were from the OASIS database: OASIS-1, OASIS-2, and OASIS-3. A summarized description of these datasets is shown in Table 1. The age ranges denoted in the table (Age Range, Age Range - with AD, and Age Range - no AD) span from the first to the last occasion on which data was collected from the participants. Age Range refers to the total age range of the participants, Age Range - with AD to the age range of the participants diagnosed with AD, and Age Range - no AD to the age range of the participants that showed no symptoms of AD. Table 1 also shows how the total number of participants varies between the datasets and the type of study each dataset's data was acquired from.

OASIS-1, OASIS-2, and OASIS-3 were tabular datasets that stored both nominal and numerical data. The datasets contained references to brain scanning images as well as the demographic, clinical, and derived brain anatomical information of the participants. The attributes utilized from the respective datasets are listed in Table 2, and descriptions of these attributes are presented in Table 3. The brain scanning images were excluded from this study to focus on textual classification rather than image classification.

Table 1: Description of the OASIS datasets

Characteristic       OASIS-1          OASIS-2       OASIS-3
Participants         416              150           1098
Type of Study        Cross-Sectional  Longitudinal  Longitudinal
Age Range            18 to 96         60 to 98      42 to 97
Age Range - with AD  62 to 96         61 to 98      49 to 97
Age Range - no AD    18 to 94         60 to 97      42 to 97

Table 2: The attributes in each dataset used in the experiment, before the application of the attribute reduction method

OASIS-1  OASIS-2  OASIS-3
M/F      M/F      M/F
Hand     Hand     Hand
Age      Age      Age
EDUC     EDUC     EDUC
MMSE     MMSE     MMSE
CDR      CDR      CDR
SES      SES      IntraCranialVol
eTIV     eTIV     CortexVol
nWBV     nWBV     TotalGrayVol
ASF      ASF      CSFVol

The OASIS database contained additional attributes that could have extended the datasets with supplementary information about the participants. However, a trade-off was made between quantity and quality: the attributes chosen for each dataset were limited to a number that was not overwhelming but still covered demographic, clinical, and derived brain anatomical information.

Table 3: Descriptions of the attributes from the OASIS datasets, used in the experiment. Each attribute represents a feature of the participant it describes.

Attribute        Description
M/F              The sex: male or female.
Hand             The dominant hand: left or right.
Age              The age in years.
EDUC             In OASIS-1, the education level, from 1 to 5, where 1 represents an education less than high school graduation and 5 represents education beyond college. In OASIS-2 and OASIS-3, the years of education.
SES              The socioeconomic status, based on the Hollingshead Two-Factor Index of Social Position (Hollingshead, 1957).
MMSE             The score from the Mini-Mental State Examination, a measurement of the participant's cognitive state (Folstein et al., 1975).
CDR              The Clinical Dementia Rating, a measurement of the participant's stage of dementia (Morris, 1993). 0, 0.5, 1, 2, and 3 represent no dementia, very mild dementia, mild dementia, moderate dementia, and severe dementia, respectively.
eTIV             The estimated total intracranial volume (Buckner et al., 2004).
nWBV             The normalized whole brain volume (Fotenos et al., 2005).
ASF              The atlas scaling factor, used to obtain the eTIV (Buckner et al., 2004).
IntraCranialVol  The intracranial volume.
CortexVol        The cortex volume.
TotalGrayVol     The total gray matter volume.
CSFVol           The cerebrospinal fluid volume.

The Clinical Dementia Rating (CDR) was used to determine the stage of AD of a participant. For each dataset, the CDR attribute was therefore used to classify the data samples into one of two groups, "AD" and "No AD", which denote AD as present or absent, respectively. Data samples with a CDR value of 0 belong to the "No AD" category; data samples with a CDR value greater than 0 belong to the "AD" category.
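
A minimal sketch of this labelling step, assuming the data has been exported to a hypothetical CSV file with a "CDR" column (the thesis performed this step within its own tooling):

    # Sketch: deriving the binary AD / No AD label from CDR.
    # "oasis.csv" is a hypothetical export of an OASIS table.
    import pandas as pd

    df = pd.read_csv("oasis.csv")
    df = df.dropna(subset=["CDR"])  # a CDR value is needed to assign a label
    df["Label"] = (df["CDR"] > 0).map({True: "AD", False: "No AD"})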

Three variants of each dataset were used in the experiment. The first variant was the Unmoderated dataset, denoted OASIS-1/2/3-U. These datasets preserved the NULL (missing) values present in the original OASIS-1, OASIS-2, and OASIS-3 datasets. The second variant was the No Missing Values dataset, in which data samples containing missing values were removed; these datasets are denoted OASIS-1/2/3-NMV. Table 4 shows the number of data samples in each OASIS dataset before and after the removal of missing values. The third variant was the Reduced Attributes dataset, in which both data samples containing missing values and attributes of low importance were removed; these datasets are denoted OASIS-1/2/3-RA. Table 5 lists the attributes utilized in each OASIS dataset after the removal of attributes of low importance.

Missing values in a data sample denote a loss of information for one or more attributes. To prevent Decision Tree algorithms from interpreting the data incorrectly and corrupting the model, missing values should be handled. The implementations of the algorithms used in the experiment of this thesis are able to handle missing values themselves. Another way to handle missing values is to remove the data samples that contain them. To study how these different ways of handling missing values affect the models obtained from the OASIS datasets, both cases were investigated through the formation of the Unmoderated and No Missing Values dataset variants (see the sketch below). The formation of the Reduced Attributes dataset variant, on the other hand, was motivated by the knowledge that removing attributes of low importance minimizes the risk of the model overfitting the training dataset.
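
A sketch of how the Unmoderated and No Missing Values variants could be derived, again assuming a hypothetical CSV export of an OASIS table:

    # Sketch: forming the Unmoderated and No Missing Values variants.
    import pandas as pd

    df_u = pd.read_csv("oasis.csv")  # Unmoderated: missing values preserved
    df_nmv = df_u.dropna()           # No Missing Values: incomplete samples removed
    print(f"{len(df_u)} samples before removal, {len(df_nmv)} after")  # cf. Table 4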

Table 4: Number of data samples in each dataset, per variant

Variant             OASIS-1  OASIS-2  OASIS-3
Unmoderated         436      373      2168
No Missing Values   216      354      1970
Reduced Attributes  216      354      1970

The Correlation-based Feature Selection (CFS) Subset Evaluator, provided by the Weka 3.8.5 software, was the attribute reduction method used to decide which attributes to remove in order to procure the OASIS-1/2/3-RA datasets. The OASIS-1/2/3-NMV datasets were fed to this method to obtain the importance value of each attribute. The importance of an attribute, in this case, refers to the extent to which the attribute can positively affect the prediction accuracy of the Decision Tree model constructed from it. The attributes retained in the OASIS-1/2/3-RA datasets had a high importance of 70 to 100%.
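
CFS Subset Evaluation is specific to Weka. As a clearly substituted stand-in that serves a similar screening purpose, attributes can be ranked by their mutual information with the class label, as sketched below; the file name and column names are hypothetical, and this is not a reimplementation of CFS.

    # Sketch: ranking attributes by mutual information with the class label.
    # A stand-in for Weka's CFS Subset Evaluator, not a reimplementation of it.
    import pandas as pd
    from sklearn.feature_selection import mutual_info_classif

    df = pd.read_csv("oasis_nmv.csv")                      # hypothetical NMV export
    X = pd.get_dummies(df.drop(columns=["CDR", "Label"]))  # encode nominal attributes
    y = (df["Label"] == "AD").astype(int)                  # label derived from CDR
    scores = mutual_info_classif(X, y, random_state=0)
    print(pd.Series(scores, index=X.columns).sort_values(ascending=False))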

Table 5: The attributes in each dataset used in the experiment, after the application of the attribute reduction method

OASIS-1  OASIS-2  OASIS-3
M/F      M/F      EDUC
Age      nWBV     Age
MMSE     MMSE     MMSE
CDR      CDR      CDR

4.3 Data Collection


A total of nine datasets were utilized in the experiment after procuring the Unmoderated, No Missing Values, and Reduced Attributes variants from the OASIS-1, OASIS-2, and OASIS-3 datasets. Each dataset was applied to the C4.5, CART, and CHAID Decision Tree algorithms, constructing a total of 27 Decision Tree models. The models were evaluated by their number of nodes, depth, and prediction accuracy. These were the only characteristics considered because these three traits alone reveal sufficient information, such as which model performed best, which performed worst, and whether or not a model shows signs of overfitting the training dataset. It is through these observations that this thesis intends to determine the validity of Decision Trees for detecting AD. The worst model is the one that overfits the data; such a model is characterized by a complex structure or a large number of nodes combined with a low prediction accuracy. The most favourable model is the one that has a small number of nodes, a short depth, and a high prediction accuracy. The small size of the model indicates that it has not been overly trained to produce accurate results only on the training dataset, and a high prediction accuracy despite the model's small size indicates that the model generalizes well and performs accurately.

4.4 Justification of Methodology


Other well-established Decision Tree algorithms not utilized in the experiment include ID3 and MARS; however, the algorithms were limited to C4.5, CART, and CHAID. The Weka 3.8.5 software provides the functionality to adjust the parameters of the algorithms, such as disabling the post-pruning method or changing the number of folds of the cross-validation method, but the default values were chosen and kept constant throughout the experiment. The datasets were limited to OASIS-1, OASIS-2, and OASIS-3, each of which produced the Unmoderated, No Missing Values, and Reduced Attributes variants. There are, however, other datasets that concern Alzheimer's Disease, such as the Alzheimer's Disease Neuroimaging Initiative (ADNI). It is challenging to test all possible combinations of these factors. Because the aim of this study is to gain more insight into the potential of Decision Tree models for detecting AD, it is more suitable to draw a comprehensive generalization from the results. This was the purpose of the variety among the algorithms and datasets utilized. It was also the reason the algorithms, datasets, and methods chosen were well-established and frequently used ones, as they had previously produced favourable results. Because of this, the claims of this thesis regarding the validity and robustness of Decision Tree models in detecting AD derive from inductive reasoning.

5 Results
The prediction accuracies achieved by the Decision Tree models in the experiment ranged from 76.94% to 90.37% (Table 6). Two models shared the highest prediction accuracy: those constructed from the OASIS-1-U dataset with the CART and CHAID algorithms, respectively. The second highest was also obtained from the OASIS-1-U dataset, but with the C4.5 algorithm. The model with the lowest prediction accuracy was constructed from the OASIS-2-U dataset with the C4.5 algorithm.

Within each dataset variant, the models constructed from the OASIS-2 datasets produced the lowest prediction accuracies compared to the models from OASIS-1 and OASIS-3. Among the Unmoderated datasets, the OASIS-1-U dataset produced the models with the highest prediction accuracy. For the other two variants, No Missing Values and Reduced Attributes, OASIS-3 produced the models with the highest prediction accuracy compared to OASIS-1 and OASIS-2 (Table 6).

Furthermore, the lowest prediction accuracy achieved by a Decision Tree model constructed using the CART algorithm was higher than the lowest achieved with either of the other algorithms. Likewise, the lowest prediction accuracy achieved by a model constructed from the Reduced Attributes datasets was higher than the lowest achieved from the datasets of the other variants. Moreover, the highest prediction accuracy achieved from the Reduced Attributes datasets was higher than the highest achieved from the No Missing Values datasets.

Table 6: The prediction accuracy of the Decision Tree models obtained in the experiment, measured in percentage (%)

             Decision Tree Algorithms
Datasets     C4.5   CART   CHAID
OASIS-1-U    88.30  90.37  90.37
OASIS-2-U    76.94  81.77  79.09
OASIS-3-U    86.67  86.90  83.21
OASIS-1-NMV  82.41  84.26  83.80
OASIS-2-NMV  79.10  82.49  81.07
OASIS-3-NMV  86.70  87.82  87.36
OASIS-1-RA   85.65  84.26  84.26
OASIS-2-RA   80.23  79.94  81.36
OASIS-3-RA   87.56  87.87  87.56

The numbers of nodes of the Decision Tree models obtained in the experiment ranged from 3 to 55 (Table 7). The model with the most nodes was constructed from the OASIS-2-U dataset with the C4.5 algorithm. In contrast, 11 models shared the smallest number of nodes.

Table 7: The number of nodes of the Decision Tree models obtained in the experiment

             Decision Tree Algorithms
Datasets     C4.5  CART  CHAID
OASIS-1-U    35    5     16
OASIS-2-U    55    3     13
OASIS-3-U    43    3     7
OASIS-1-NMV  15    7     11
OASIS-2-NMV  43    3     13
OASIS-3-NMV  16    3     19
OASIS-1-RA   7     3     3
OASIS-2-RA   13    3     3
OASIS-3-RA   3     3     3

The depths of the Decision Tree models obtained in the experiment ranged from 1 to 10 (Table 8). Two models shared the greatest depth: one constructed from the OASIS-2-U dataset and the other from the OASIS-2-NMV dataset, both with the C4.5 algorithm. In contrast, the models with the shortest depth were the same models that had the fewest nodes.

Table 8: The depth of the Decision Tree models obtained in the experiment

             Decision Tree Algorithms
Datasets     C4.5  CART  CHAID
OASIS-1-U    7     2     4
OASIS-2-U    10    1     5
OASIS-3-U    9     1     2
OASIS-1-NMV  7     3     5
OASIS-2-NMV  10    1     5
OASIS-3-NMV  6     1     5
OASIS-1-RA   3     1     1
OASIS-2-RA   5     1     1
OASIS-3-RA   1     1     1

Between the algorithms and within the datasets, the results also show that the C4.5 algorithm predominantly constructed the models with the most nodes and the greatest depth. The exceptions were the OASIS-3-NMV and OASIS-3-RA datasets. From the OASIS-3-NMV dataset, the model constructed with the C4.5 algorithm had the greatest depth, but the model constructed with the CHAID algorithm had more nodes. From the OASIS-3-RA dataset, the models constructed by all three algorithms had equal numbers of nodes and equal depths. Moreover, between the datasets and within each dataset variant, the OASIS-2 dataset with the C4.5 algorithm produced the model with the most nodes and the greatest depth.

On the other hand, between the algorithms and within the datasets, the results show that the CART algorithm predominantly constructed the models with the fewest nodes and the shortest depth. The exceptions were the models constructed from the Reduced Attributes datasets: the models constructed with the CHAID algorithm from all three Reduced Attributes datasets, and the model constructed with the C4.5 algorithm from the OASIS-3-RA dataset, had the same number of nodes and the same depth as the CART models. Consequently, the Reduced Attributes datasets produced the most models with the fewest nodes and the shortest depth, compared to the datasets of the other variants.

6 Analysis
6.1 Weak Points
The Decision Tree model obtained from the OASIS-2-U dataset with the C4.5 algorithm had the greatest depth, the most nodes, and the lowest prediction accuracy, performing the worst of all the models obtained in the experiment. One reason its prediction accuracy was lower than the others' might be that the model was larger and more complex, which suggests that it overfitted the training dataset.

The C4.5 algorithm generally constructed larger Decision Tree models than the CART and CHAID algorithms did, especially from the OASIS-2 dataset. And the models constructed from the OASIS-2 dataset, regardless of the algorithm used, produced lower prediction accuracies than the models constructed from the OASIS-1 and OASIS-3 datasets.

One characteristic that distinguishes the C4.5 algorithm from the CART and CHAID Decision Tree algorithms is its splitting criterion, Gain Ratio. This may be why the C4.5 algorithm built larger Decision Tree models than the other algorithms. Moreover, the Gain Ratio splitting criterion seems especially unsuitable when used with the OASIS-2 dataset. This might be because the OASIS-2 dataset involved the fewest participants, and the age range of its participants was narrower than in the other datasets. The small sample size of the OASIS-2 dataset may also have contributed to the poor performance. However, for the No Missing Values and Reduced Attributes variants, the OASIS-1 datasets had fewer data samples; despite this, the OASIS-1-NMV and OASIS-1-RA datasets still produced smaller models with higher prediction accuracies than their OASIS-2 counterparts. It may therefore be preferable that datasets used with C4.5, CART, and CHAID to detect AD include more participants, covering a wider age range than the OASIS-2 datasets.

Moreover, as mentioned previously, using an attribute selection method to remove attributes of low importance decreases the risk that a model overfits the training dataset. The models constructed from the Reduced Attributes datasets, overall, had fewer nodes and shorter depths than those constructed from the datasets of the other variants. Despite this, the models obtained from the OASIS-2 dataset of this variant still did not perform better than those from the other datasets. However, with the C4.5 algorithm, the Reduced Attributes variants generally produced higher prediction accuracies than their Unmoderated and No Missing Values counterparts; the OASIS-2-RA dataset, for instance, outperformed the other OASIS-2 datasets. The one exception was the OASIS-1-U dataset, whose C4.5 model achieved the second highest prediction accuracy of all the models obtained in the experiment. These results indicate that it is beneficial for datasets applied to the C4.5 algorithm to include only attributes that positively influence the size and prediction accuracy of the Decision Tree model constructed from them.

6.2 Strong Points


Small models minimize the risk of overfitting the data, which is why a small model with the highest accuracy is the most favourable Decision Tree model. In the experiment, this model was constructed from the OASIS-1-U dataset using the CART algorithm.

The OASIS-1-U dataset produced the models with the best prediction accuracy compared to the models obtained from the other datasets. From this dataset, the CART and CHAID algorithms produced the models with the highest prediction accuracy of all the models obtained, and the C4.5 algorithm produced the model with the second highest prediction accuracy. What distinguished the OASIS-1 dataset from OASIS-2 and OASIS-3 was that the age range of its participants was the largest and its data was acquired from a cross-sectional study. These characteristics might have contributed to the OASIS-1-U dataset producing models with better prediction accuracy than the other Unmoderated datasets. From another perspective, however, the number of data samples in the OASIS-2-U dataset might have been too small, and the number in the OASIS-3-U dataset too large, to produce favourable results. This could be why the Unmoderated variants of the OASIS-2 and OASIS-3 datasets produced models with lower prediction accuracy than those constructed from their dataset variant counterparts, with only one exception: with the CART algorithm, the OASIS-2-U model outperformed the OASIS-2-RA model.

One reason the OASIS-1-U dataset produced models with better prediction accuracy than the other OASIS-1 datasets could lie in the way the algorithms handle missing values in the OASIS-1 dataset. Another reason, or at least a factor that could have played a large role in these results, relates to the sample size of the OASIS-1 dataset. As implicitly mentioned in the previous paragraph, it is possible that the size of the Unmoderated OASIS-1 dataset was favourable for the data it comprised. After the removal of missing values from the OASIS-1 dataset, the sample size decreased by more than half; because of this, the OASIS-2-NMV and OASIS-2-RA datasets had more data samples than the OASIS-1-NMV and OASIS-1-RA datasets. Having less data after the removal of missing values could have had a detrimental impact on the No Missing Values and Reduced Attributes variants of the OASIS-1 dataset. It would therefore be interesting to see the size and prediction accuracy of Decision Tree models constructed from a dataset that was acquired from a cross-sectional study and whose sample size remains around that of the OASIS-1-U dataset even after the removal of missing values.

However, it is also important to note that none of the OASIS-1 participants below the age of 62 at the time of data acquisition showed signs of AD (Table 1). This could mean that the dataset produces models that perform poorly when tasked with predicting unseen data containing participants under 62 years old who have AD.

Looking at the other datasets, the OASIS-3-NMV and OASIS-3-RA datasets produced the Decision Tree models with the highest accuracy within their respective dataset variants. It seems that the OASIS-3 dataset is favourable when it contains no missing values or when the number of data samples is reduced from 2168. The OASIS-3 dataset had the largest sample of the datasets in the experiment, even after the removal of data samples containing missing values, and it also had the most participants. Though these datasets did not produce higher accuracy than the OASIS-1-U dataset, the OASIS-3 dataset includes participants under 62 years old who have AD. This could mean that the Decision Tree models built from the OASIS-3 dataset generalize better, as they most likely can classify participants below the age of 62 as having AD, too.

Furthermore, the CART algorithm, which constructed the most favourable Decision Tree model, produced the most models with the fewest nodes and the shortest depth. Though the models created by the CART algorithm are generally smaller, the results do not show that this came at the expense of accuracy. The same can be said for the Reduced Attributes datasets: though the most favourable model was constructed from an Unmoderated dataset, the Reduced Attributes variant created the smallest Decision Trees of the dataset variants. In this case too, compared with the datasets of the other variants, the results do not indicate that the small number of nodes and short depth of these models came at the expense of accuracy.

6.3 Existing Literature


The results obtained from the experiment conducted for this thesis deviate from the results obtained by Bansal et al. (2018). The authors applied the OASIS-1 and OASIS-2 datasets to the J48 implementation of the C4.5 algorithm, and the attribute reduction method they used was the CFS Subset Evaluator available in the Weka software. The datasets, C4.5 implementation, and attribute reduction method utilized by Bansal et al. were thus the same ones utilized in the experiment of this thesis. However, the prediction accuracy the authors achieved on the OASIS-2 dataset after the attribute reduction method was 98.66%, whereas the prediction accuracy achieved with the same tools in the experiment of this thesis was 80.23%.

The highest prediction accuracy achieved by Bansal et al. on the OASIS-1 dataset was 99.52%, both before and after the application of the attribute reduction method. The experiment of this thesis achieved 88.30% on the OASIS-1 dataset before the attribute reduction method and 85.65% after it. On both the OASIS-1 and OASIS-2 datasets with the C4.5 algorithm, Bansal et al. thus achieved higher prediction accuracies than the experiment conducted for this thesis did. The reason behind the dissimilarity between the results obtained by Bansal et al. and those of this thesis, however, is unclear.

In the study conducted by Farhan et al. (2014), the ensemble model was deemed the best, as it achieved the highest prediction accuracy (93.75%). The Decision Tree model, on the other hand, performed the worst: it achieved the lowest prediction accuracy in all cases but one, the exception being the case in which the Decision Tree obtained the same prediction accuracy as the other models. The methodology used in Farhan et al.'s study is different from the one proposed for the experiment of this thesis. Nevertheless, the highest prediction accuracy achieved in the experiment of this thesis was 90.37%, which indicates that Decision Trees have potential in the detection of AD.

7 Discussion
7.1 Limitations
One limitation worth mentioning involves the attributes of the datasets. As mentioned in Section 4.2, the attributes in each dataset could have been expanded upon. However, the decision as to which attributes to choose, and how many, was arbitrary rather than informed. The only requirement was that the datasets comprised demographic, clinical, and derived brain anatomical information and that the number of attributes was not excessive. Another limitation regarding the attributes is that, disregarding the Reduced Attributes variant, the attributes in the OASIS-3 datasets differed from those in the OASIS-1 and OASIS-2 datasets. Studying how different attributes affect the results would be more appropriate if only one of the OASIS datasets were utilized for this, as having more factors that differ between the datasets complicates the process of analyzing and comparing them.

Lastly, another limitation involves the pruning method, which differed between the Decision Tree algorithms. The analysis of the results does not reveal differences that could have been caused by the different pruning methods. Still, it would have been preferable to use the same pruning method across the three Decision Tree algorithms; that way, any difference between the algorithms seen in the results could be traced more directly to the algorithms themselves.

7.2 Conclusion
This thesis set out to study the performance of Decision Tree algorithms used to detect AD. To gain insight into the validity of the models constructed by these algorithms, different Decision Tree algorithms were applied to different datasets concerning AD. As discussed in Section 4.4, the proposed methodology is not meant to arrive at a definitive conclusion; instead, it is meant to provide knowledge from which a theory can be inferred.

The results from the experiment showed that the C4.5 algorithm and the OASIS-2 datasets performed worse than their respective counterparts. In contrast, the CART algorithm and the OASIS-1-U, OASIS-3-NMV, and OASIS-3-RA datasets performed better than their respective counterparts. These results reveal which datasets and algorithms are more appropriate than others in the context of the experiment.

The results also revealed the worst and the best Decision Tree models. The worst had a prediction accuracy of 76.94%, 55 nodes, and a depth of 10; the best had a prediction accuracy of 90.37%, 5 nodes, and a depth of 2. Though the method of the experiment was not designed to address the full scope of the problem statement, the results obtained support the view that Decision Trees are valid and suitable for detecting AD.

7.3 Future Work
The findings of this thesis indicate that Decision Trees are valid models for detecting AD. Future work could therefore involve detecting different stages of AD rather than only detecting whether AD is present.

Another idea for future work is to examine the possibility of using images rather than textual information to detect AD. The OASIS database provides brain scan images, such as MRI and PET scans. Such work could investigate whether or not Decision Trees remain suitable for detecting AD when brain scanning images are utilized.

8 References
Alzheimer's Association, 2021. What is Alzheimer's Disease?. https://www.alz.org/alzheimers-dementia/what-is-alzheimers (Accessed June 15, 2021).

Bansal, D., Chhikara, R., Khanna, K. and Gupta, P., 2018. Comparative analysis of various machine learning algorithms for detecting dementia. Procedia Computer Science, 132, pp. 1497-1502. Elsevier.

Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J., 1984. Classification and Regression Trees. CRC Press.

Buckner, R.L., Head, D., Parker, J., Fotenos, A.F., Marcus, D., Morris, J.C. and Snyder, A.Z., 2004. A unified approach for morphometric and functional data analysis in young, old, and demented adults using automated atlas-based head size normalization: reliability and validation against manual measurement of total intracranial volume. NeuroImage, 23(2), pp. 724-738.

Farhan, S., Fahiem, M.A. and Tauseef, H., 2014. An ensemble-of-classifiers based approach for early diagnosis of Alzheimer's disease: classification using structural features of brain images. Computational and Mathematical Methods in Medicine, 2014. Hindawi.

Folstein, M.F., Folstein, S.E. and McHugh, P.R., 1975. "Mini-mental state": a practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12(3), pp. 189-198.

Fotenos, A.F., Snyder, A.Z., Girton, L.E., Morris, J.C. and Buckner, R.L., 2005. Normative estimates of cross-sectional and longitudinal brain volume decline in aging and AD. Neurology, 64(6), pp. 1032-1039.

Frank, E., Hall, M.A. and Witten, I.H., 2016. The WEKA Workbench. Online Appendix for "Data Mining: Practical Machine Learning Tools and Techniques", 4th ed. Morgan Kaufmann.

Hollingshead, A.B., 1957. Two Factor Index of Social Position. Yale University Press, New Haven.

Hyafil, L. and Rivest, R.L., 1976. Constructing optimal binary decision trees is NP-complete. Information Processing Letters, 5(1), pp. 15-17.

Ibarguren, I., Lasarguren, A., Pérez, J.M., Muguerza, J., Gurrutxaga, I. and Arbelaitz, O., 2016. BFPART: Best-First PART. Information Sciences, 367, pp. 927-952. Elsevier.

Kass, G.V., 1980. An exploratory technique for investigating large quantities of categorical data. Journal of the Royal Statistical Society: Series C (Applied Statistics), 29(2), pp. 119-127.

LaMontagne, P.J., Benzinger, T.L.S., Morris, J.C., Keefe, S., Hornbeck, R., Xiong, C., Grant, E., Hassenstab, J., Moulder, K., Vlassenko, A.G., Raichle, M.E., Cruchaga, C. and Marcus, D., 2019. OASIS-3: Longitudinal Neuroimaging, Clinical, and Cognitive Dataset for Normal Aging and Alzheimer Disease. medRxiv. Cold Spring Harbor Laboratory Press. doi: 10.1101/2019.12.13.19014902.

Marcus, D.S., Fotenos, A.F., Csernansky, J.G., Morris, J.C. and Buckner, R.L., 2010. Open Access Series of Imaging Studies: Longitudinal MRI Data in Nondemented and Demented Older Adults. Journal of Cognitive Neuroscience, 22(12), pp. 2677-2684. MIT Press. doi: 10.1162/jocn.2009.21407.

Marcus, D.S., Wang, T.H., Parker, J., Csernansky, J.G., Morris, J.C. and Buckner, R.L., 2007. Open Access Series of Imaging Studies (OASIS): Cross-sectional MRI Data in Young, Middle Aged, Nondemented, and Demented Older Adults. Journal of Cognitive Neuroscience, 19(9), pp. 1498-1507. MIT Press. doi: 10.1162/jocn.2007.19.9.1498.

Miah, Y., Prima, C.N.E., Seema, S.J., Mahmud, M. and Kaiser, M.S., 2021. Performance comparison of machine learning techniques in identifying dementia from open access clinical datasets. In Advances on Smart and Soft Computing, pp. 79-89. Singapore: Springer.

Morris, J.C., 1993. The Clinical Dementia Rating (CDR): Current version and scoring rules. Neurology, 43, pp. 2412-2414.

Naganandhini, S. and Shanmugavadivu, P., 2019. Effective diagnosis of Alzheimer's disease using modified decision tree classifier. Procedia Computer Science, 165, pp. 548-555. Elsevier.

Quinlan, J.R., 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers.

World Health Organization (WHO), 2016. The ICD-10 Classification of Mental and Behavioural Disorders. Geneva, Switzerland: World Health Organization.