
Parkinson's Disease Detection System Using Machine Learning Techniques

Harshit Mishra, Himanshu Yadav, Kundan Verma, Ajay Kumar
Department of IT, KIET Group of Institutions, Delhi NCR, India

Abstract: Parkinson's disease is a chronic neurological disorder that significantly affects a person's motor functions and, through its debilitating symptoms, their quality of life; its etiology is complex and involves both genetic and environmental factors. This paper considers the application of machine learning techniques to predicting Parkinson's disease from a voice dataset. We evaluated eight machine learning models: Linear Regression, Logistic Regression, Decision Tree, Support Vector Machine (SVM), Random Forest, XGBoost, Neural Networks, and AdaBoost. The challenge of class imbalance was addressed with the Self-Organizing Map Undersampling (SOM-US) technique. The models were evaluated on Recall, Precision, and F1 score. Our results indicate that the XGBoost and Random Forest classifiers performed best, with XGBoost attaining an F1 score of 0.64, which reflects the best balance between precision and recall among the models tested. The study indicates that machine learning can be applied to improve early-stage diagnosis of and intervention in Parkinson's disease, and suggests that further research involving expanded datasets and other types of data is required.

I. INTRODUCTION

Parkinson's disease is a long-term degenerative neurological disorder that mainly disturbs the control of movement. It is chiefly characterized by hand tremors, slowed movement, and muscle rigidity. These symptoms are caused by damage to the substantia nigra, a region of the brain that acts as a motor center and depends on dopamine. In addition to its motor effects, Parkinson's disease can bring a number of non-motor symptoms, including reduced cognitive function, mood changes, and altered autonomic nervous system function, which negatively affect the quality of life of diagnosed individuals [1]. Although its mechanisms are not fully understood, scientists think the underlying cause is a complex interplay of genetics and the environment [2]. Although it is one of the most widespread neurodegenerative disorders, Parkinson's disease remains incurable. Therapies such as drugs, surgical interventions, and supportive care are mainly used to manage many of its symptoms and to improve patients' quality of life.

Parkinson's disease is a well-known neurological disease that affects movement; a patient may experience poor posture, shaky hands, muscle stiffness, hallucinations, and a host of other symptoms. It is a widespread misconception that early detection is not relevant for young people because the disease occurs mostly in the elderly. The volume of medical data has increased manifold, with much of it supplied by clinical areas such as health care services. Handling these data and extracting insights from them calls for Big Data analysis through machine learning approaches suited to various medical and clinical problems. Earlier research shows that several popular ML algorithms achieve high overall classification performance.

Furthermore, machine learning techniques have been applied in a number of prior studies to predict not just Parkinson's disease but also other diseases, including diabetes [3] and heart disease [4]. The goal of this research is to apply eight machine learning approaches to predict Parkinson's disease. The study's contributions are as follows:

• It developed eight machine learning models to predict Parkinson's disease.
• It assessed each model's performance using three performance metrics (Recall, Precision, and F1 score) on a dataset of Parkinson's disease cases.
• To address the issue of class imbalance in the Parkinson's disease data set, the study employs a recent method called SOM-US [5].

The remainder of the paper is organised into Sections II to VII. The relevant work is presented in Section II; the proposed method, machine learning techniques, data set, and performance measures are described in Section III; the experimental design is given in Section IV; the results and discussion are presented in Section V; the paper is concluded in Section VI; and future work is listed in Section VII.

II. RELEVANT WORKS

A number of noteworthy studies have made substantial contributions to the field of Parkinson's disease (PD) diagnosis using machine learning (ML) methods.

• Mei et al. [6] present the first comprehensive review of research on the application of ML to the diagnosis of PD. It provides a high-level overview of the studies included: ML methods and their outcomes in PD diagnosis; clinical, behavioral, and biometric data types that accurately pinpoint the disease; potential biomarkers for clinical decision-making; and relevant databases that can be used to expand smaller sets. The review depicts the considerable potential of ML-driven PD diagnosis in advancing a systematic clinical decision-making process. Adoption of novel biomarkers can lead to earlier and more accessible PD diagnoses. In general, ML approaches show promising potential to provide clinicians with additional tools for screening, detection, and diagnosis of PD, potentially resulting in improved patient outcomes through earlier intervention and more accurate diagnoses.

• Aditi and Sushila [7] examine the classification of Parkinson's disease using vowel phonation data, reporting an accuracy of 91.835% and a sensitivity of 0.95 with a Random Forest classifier. The Random Forest model is optimal since there is almost balanced importance across the 22 attributes in the MDVP dataset. Similarly, the SVM model with PCA applied achieves 91.836% accuracy and a sensitivity of 0.94. All the best models are robust against outliers and predict no false positives. The KNN model also performs well given the balanced data. The study recommends the Random Forest model for classifying PD on the grounds of simplicity, accuracy, and non-invasive nature. Future enhancements will involve adding more audio data and REM sleep data, as audio data alone is not sufficient for PD classification. These results advocate the use of mobile-recorded audio data for PD diagnosis via telemedicine, with the aim of long-term patient relief.

• Alalayah et al. [8] note that Parkinson's disease, caused by a deficiency of dopamine, is hard to diagnose because of its unclear symptoms, and that machine learning has helped with early diagnosis of the disease from voice disorders. Their study contributes a dataset of 22 voice features, which were preprocessed to remove outliers and then ranked using the Recursive Feature Elimination algorithm. Dimensionality reduction was done using t-SNE and PCA, and the resulting features were fed into SVM, KNN, DT, RF, and MLP classifiers. Better results were obtained, with RF combined with t-SNE reaching 97% accuracy and MLP combined with PCA reaching 98% accuracy. The results show that acoustic signals can be used to detect PD effectively and automatically, which is likely to allow early diagnosis and treatment that improve patients' lives. The study adds to current knowledge with its approach to early detection of PD through voice analysis.

• Shaban [9] reviews the machine and deep learning techniques applied to research on Parkinson's disease from 2016 to 2022, emphasizing neural networks because of their high performance and automated feature extraction. Key data modalities include sensory, handwriting, and EEG data. Most studies used binary classification (to differentiate PD patients from controls), and a few addressed PD staging or early biomarker identification. An important limitation is that datasets are small, diagnoses vary among clinicians, and the data come mostly from patients with advanced PD, which hampers early diagnosis efforts. Future studies should be performed on clinically relevant datasets, including DaTscan, and should try other modalities, such as sleep EEG, to help in early biomarker identification and better management of PD. The paper points out the need for larger and more diverse datasets, and more advanced methods, to achieve early diagnosis and management of PD.

• Chatterjee et al. [10] state that early detection of Parkinson's disease is of prime importance for understanding its etiology and enabling early treatment. They propose an ensemble-based model, PDD-ET, that uses premotor features to discriminate between healthy individuals and those with PD. The PDD-ET model reached an accuracy of 95.325%, far better than the 14 machine learning and deep learning models it was compared with. This performance is credited to the combination of various machine learning and deep learning techniques, which enhanced its ability to diagnose PD. The experimental results show the effectiveness of the PDD-ET model in early diagnosis of PD and point to its potential for improving patient outcomes by enabling treatment at an early stage of the disease.

III. RESEARCH METHODS

A. Proposed Method
This study proposes a model for Parkinson's disease prediction and validates the proposed method with machine learning techniques. The study makes use of eight machine learning approaches on a Parkinson's disease data set, three performance measures (F1 score, recall, and precision), and SOM-US, a
recent technique that addresses the issue of class imbalance in Parkinson's disease data sets. Figure 1 depicts the general overview of the proposed technique.

Fig. 1. Proposed Methodology

B. Machine Learning Techniques
In our research on identifying Parkinson's disease, we used eight different machine learning techniques to create a strong predictive model: Linear Regression, Logistic Regression, Decision Tree, Support Vector Machine (SVM), Random Forest, XGBoost, Neural Networks, and AdaBoost. Each method has its own strengths in dealing with complex medical data, contributing to the overall accuracy and reliability of the prediction model.

C. Data Set
1) Brief Description: The study uses speech data from thirty-one subjects, twenty-three of whom are PD patients. The data set was obtained from Kaggle [11]. It contains 195 voice recordings; each recording is a row in the table, identified by the name of the individual, with columns holding various voice metrics. Our primary goal in using this data is to distinguish between individuals with Parkinson's disease (PD) and those who are healthy, as indicated by the "status" column, which is 1 if the individual has PD and 0 if the individual is healthy.

2) Pre-processing: One of the biggest difficulties we encountered was handling class imbalance, a common issue in medical datasets, where the number of healthy samples is usually higher than the number of disease samples. To address this, we used a recent technique called Self-Organizing Map Undersampling (SOM-US) [5]. This method balances the dataset by reducing the dominance of the majority class while maintaining the integrity of the minority class, which improved the models' ability to identify Parkinson's disease accurately.

D. Performance Measures
This study evaluates eight machine learning methods for Parkinson's disease prediction using three performance metrics: recall, precision, and F1 score. We go over these performance measures below.

Recall (sensitivity) shows how good the model is at finding the real cases of Parkinson's disease; it is vital for making sure we do not miss anyone who needs help.

\[ \text{Recall} = \frac{\text{True Positive (TP)}}{\text{True Positive (TP)} + \text{False Negative (FN)}} \]

Precision tells us how accurate the model is at predicting the disease, by showing how many of its positive predictions were actually right.

\[ \text{Precision} = \frac{\text{True Positive (TP)}}{\text{True Positive (TP)} + \text{False Positive (FP)}} \]

F1 score gives us a single number that balances recall and precision, helping us see the overall performance of the model.

\[ \text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]

IV. EXPERIMENTAL DESIGN

To validate the proposed approach, an experimental study was conducted using eight machine learning techniques, three performance measures, and one Parkinson's disease data set. An overview of the experimental design is shown in Fig. 2.

Fig. 2. Experimental Design

The experimental design also takes into consideration the problem of imbalanced data in the detection of Parkinson's disease. We used the Self-Organizing Map Undersampling technique to pre-process the data in order to obtain a balanced number of samples in both classes. This step is important to reduce bias, given the imbalanced distribution of positive and negative instances in the dataset. These
processed data were then thoroughly tested with several machine learning algorithms to determine the most accurate model for Parkinson's disease detection. We selected the best model on the basis of three performance measures: Recall, Precision, and F1 score.

V. RESULTS & DISCUSSION

In our comparative analysis of machine learning models for predicting Parkinson's disease, we found that the XGBoost and Random Forest classifiers provided the best performance. Table I displays the outcomes of the eight machine learning models according to the three performance measures.

TABLE I
PERFORMANCE MEASURES

S.No  Algorithm               Recall  Precision  F1-score
1     Linear Regression       0.55    0.30       0.39
2     Logistic Regression     0.02    0.2        0.05
3     Decision Tree           0.5     0.48       0.49
4     Support Vector Machine  0.02    0.2        0.05
5     Random Forest           0.5     0.89       0.64
6     XGBoost                 0.52    0.81       0.64
7     Neural Network          0.41    0.77       0.52
8     AdaBoost                0.28.0  0.48       0.15

The comparison provides useful insight into how well each model detects Parkinson's disease. XGBoost and Random Forest both reach an F1 score of 0.64 as reported in Table I, the best balance between precision and recall among the models tested. XGBoost has slightly higher recall than Random Forest (0.52 versus 0.5) but slightly lower precision (0.81 versus 0.89); that is, it is better at capturing true positives, although less precise at the same time. Random Forest was better in precision but weaker in recall.

VI. CONCLUSION

This study applied eight models (Linear Regression, Logistic Regression, Decision Tree, Support Vector Machine (SVM), Random Forest, XGBoost, Neural Networks, and AdaBoost) to find patterns in the voice recordings of individuals with and without Parkinson's disease. One of the critical problems with the studied dataset was its class imbalance. This issue was dealt with by applying the Self-Organizing Map Undersampling (SOM-US) technique.

Recall, Precision, and F1 score served as the evaluation measures for the Parkinson's disease prediction performance of these models. Our results show that XGBoost and Random Forest presented the best performance, both with an F1 score of 0.64; XGBoost yielded slightly higher recall, while Random Forest had better precision.

VII. FUTURE WORKS

• Future studies can be directed toward the integration of other data types, such as genetic and imaging data, to improve model accuracy and robustness.

• Further, increasing the size of the dataset and integrating longitudinal data will help to elucidate the nature of disease progression and improve early diagnosis capabilities. Continued development in machine learning and data processing techniques will be central to the development of better diagnostic tools for Parkinson's disease and other intractable medical conditions.

• Additionally, evaluating on more data sets can improve the accuracy of Parkinson's disease prediction.

REFERENCES

[1] Ali, L., Javeed, A., Noor, A., et al. "Parkinson's disease detection based on features refinement through L1 regularized SVM and deep neural network," Sci Rep, vol. 14, no. 1, pp. 1333, 2024. https://doi.org/10.1038/s41598-024-51600-y

[2] Alshammri, R., Alharbi, G., Alharbi, E., and Almubark, I. "Machine learning approaches to identify Parkinson's disease using voice signal features," Frontiers in Artificial Intelligence, vol. 6, pp. 4001, 2023. https://doi.org/10.3389/frai.2023.1084001

[3] A. Kumar and K. Kaur, "A novel MCDM-based framework to recommend machine learning techniques for diabetes prediction," Int. J. Eng. Technol. Innov., vol. 14, no. 1, pp. 29-43, 2024.

[4] A. Kumar, A. K. Singh, and A. Garg, "Evaluation of machine learning techniques for heart disease prediction using multi-criteria decision making," J. Intell. Fuzzy Syst., vol. 46, no. 1, pp. 1259-1273, 2024.

[5] A. Kumar, "SOM-US: A novel under-sampling technique for handling class imbalance problem," J. Commun. Softw. Syst., vol. 20, no. 1, pp. 69-75, 2024.

[6] Mei, J., Desrosiers, C., and Frasnelli, J. "Machine Learning for the Diagnosis of Parkinson's Disease: A Review of Literature," Front Aging Neurosci, vol. 13, pp. 633752. doi: 10.3389/fnagi.2021.633752. PMID: 34025389; PMCID: PMC8134676.

[7] Aditi and Sushila. "Explored the early detection of Parkinson's disease using ML techniques in telemedicine, achieving a detection accuracy of 91.83% with the Random Forest classifier," vol. 218, pp. 249-261, 2023.

[8] Alalayah, K. M., Senan, E. M., Atlam, H. F., Ahmed, I. A., and Shatnawi, H. S. A. "Automatic and early detection of Parkinson's disease by analyzing acoustic signals using classification algorithms based on recursive feature elimination method," Diagnostics (Basel, Switzerland), vol. 13, no. 11, pp. 1924, 2023. https://doi.org/10.3390/diagnostics13111924

[9] Shaban, M. "Deep learning for Parkinson's disease diagnosis: A short survey," Computers, vol. 12, no. 3, pp. 58, 2023. https://doi.org/10.3390/computers12030058

[10] Chatterjee, Kalyan, Ramagiri Praveen Kumar, Anjan Bandyopadhyay, Sujata Swain, Saurav Mallik, Aimin Li, and Kanad Ray. "PDD-ET: Parkinson's Disease Detection Using ML Ensemble Techniques and Customized Big Dataset," Information, vol. 14, no. 9, pp. 502, 2023. https://doi.org/10.3390/info14090502

[11] Ukani, V. (2020). Parkinson's Disease Data Set [Data set].
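
The recall, precision, and F1 score definitions given under Performance Measures (Section III-D) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' evaluation code: the function names and the confusion counts are hypothetical, and only the final line uses figures reported in Table I (Random Forest, precision 0.89 and recall 0.5).

```python
def recall(tp: int, fn: int) -> float:
    """Share of actual PD cases the model found: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp: int, fp: int) -> float:
    """Share of predicted PD cases that were correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall: 2 * P * R / (P + R)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical confusion counts, chosen only to exercise the formulas.
tp, fp, fn = 8, 2, 8
r = recall(tp, fn)      # 8 / 16 = 0.5
p = precision(tp, fp)   # 8 / 10 = 0.8
print(round(f1_score(p, r), 2))  # 0.62

# Precision and recall reported for Random Forest in Table I.
print(round(f1_score(0.89, 0.5), 2))  # 0.64
```

Recomputing F1 from a reported precision and recall in this way is a quick consistency check on a results table; small mismatches are expected when the inputs have already been rounded to two decimals.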

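
The pre-processing step (Sections III-C and IV) balances the two classes by undersampling the majority class. SOM-US itself is defined in [5] and is not reproduced here; the sketch below swaps in plain random undersampling, a different and simpler technique, purely to illustrate what balancing by the "status" label looks like. The function name, the seed, and the toy rows are all hypothetical.

```python
import random
from collections import Counter


def random_undersample(rows, label_of, seed=0):
    """Cut every class down to the size of the smallest class.

    Stand-in for SOM-US: rows are dropped at random rather than
    selected with a self-organizing map.
    """
    if not rows:
        return []
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    target = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# Toy rows of (voice_feature, status): nine with status 1, three with status 0.
toy = [(0.1 * i, 1) for i in range(9)] + [(0.1 * i, 0) for i in range(3)]
balanced = random_undersample(toy, label_of=lambda row: row[1])
print(Counter(status for _, status in balanced))  # three rows of each status
```

Random selection discards majority-class rows without regard to how representative they are, which is the weakness that structure-aware undersampling methods aim to avoid.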