
DETECTION OF PARKINSON’S DISEASE USING MACHINE LEARNING


1st Mr. Gaurav Narain Singh, Dept. of Computer Science and Engineering, United Institute of Technology, Prayagraj, India
2nd Mohd Hasan Syed, Computer Science and Engineering, United Institute of Technology, Prayagraj, India
3rd Harshit Bhatia, Computer Science and Engineering, United Institute of Technology, Prayagraj, India
4th Qazi Faiz, Computer Science and Engineering, United Institute of Technology, Prayagraj, India
5th Ashish Mishra, Computer Science and Engineering, United Institute of Technology, Prayagraj, India

Abstract—Parkinson’s disease (PD) is a serious neurodegenerative disease that causes motor and speech disability. Diagnosing it as early and as accurately as possible is therefore essential for managing it appropriately. This research investigates whether Parkinson’s disease can be identified from a speech dataset created by Max Little of the University of Oxford in collaboration with the National Center for Voice and Speech in Denver, Colorado. The feature extraction methods for general voice disorders published with the original dataset were applied in this project. We implemented six machine learning algorithms, namely XGBoost, Support Vector Machine (SVM), Random Forest, K-Nearest Neighbors (KNN), Logistic Regression, and Decision Tree, to classify the presence of Parkinson’s disease from the provided speech signals. Comparing the models, Logistic Regression reached an accuracy of 0.85, while the Decision Tree outperformed the other methods with an accuracy of 0.97. These results indicate strong potential for machine learning, especially the Decision Tree algorithm, as an adjunct for improving PD diagnosis from voice data. This approach could help detect the disease at an early stage and improve patient outcomes. These findings should be replicated on larger and more varied datasets in future studies.

Index Terms—XGBoost, Feature selection, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Random Forest (RF), Decision Tree (DT).

I. INTRODUCTION

Parkinson’s disease (PD) is one of the most severe neurological conditions affecting movement, with symptoms that include resting tremor, muscle rigidity, akinesia, and postural instability, in addition to speech and voice disorders. Early diagnosis and treatment of PD are very important for the overall management of the disease and for improving patients’ quality of life. Routine approaches to diagnosis rely largely on clinical assessment, which tends to be time-consuming and subjective. Efforts are therefore being made to create objective, noninvasive diagnostic methods. Advances in machine learning have opened opportunities for diagnosing various diseases, Parkinson’s disease among them, since these algorithms can analyze large datasets to identify features associated with the disease. One of the most promising approaches is voice analysis, because PD commonly affects vocal features. The dataset includes voice samples from patients diagnosed with PD as well as healthy control participants, which makes it well suited to machine learning techniques. The original study describes feature extraction techniques for general voice disorders, and we have used these techniques in our work.

In this study, six machine learning algorithms, namely XGBoost, Support Vector Machine, Random Forest, K-Nearest Neighbors, Logistic Regression, and Decision Tree, are used to detect Parkinson’s disease from voice signals. By comparing the results obtained with these algorithms, we can estimate which of them is the most accurate and reliable for PD detection. Our findings show that Logistic Regression yields an accuracy of 0.85, whereas the Decision Tree algorithm achieves an accuracy of 0.97. This study shows that machine learning, and voice analysis in particular, can aid the early diagnosis of Parkinson’s disease. It focuses on the Decision Tree algorithm as one of the most relevant tools for this aim, providing a noninvasive, cost-effective, and highly accurate diagnostic modality. We therefore advise repeating similar experiments on larger datasets to confirm the present results and to improve the stability of the proposed algorithm.

II. LITERATURE REVIEW

Over the past few decades, researchers have increasingly focused on finding non-invasive methods to detect and diagnose Parkinson’s disease (PD). Parkinson’s is a chronic and
progressive disorder affecting millions globally, with traditional diagnosis often relying on clinical evaluations. These methods can be subjective and inconsistent between different doctors, driving the need for objective, reliable, and early diagnostic tools. This need has led to exploring machine learning techniques for detecting PD, especially through analyzing speech patterns.

Speech Analysis in Parkinson’s Disease Detection: Speech changes are often among the earliest signs of Parkinson’s disease, showing up before other motor symptoms. These changes include softer speech, a monotone voice, and a breathy or hoarse sound. Several studies have analyzed these vocal features to detect PD. For instance, a study by Little et al. in 2007 showed that voice measurements could effectively distinguish between people with PD and healthy individuals by extracting features such as fundamental frequency, jitter, shimmer, and harmonics-to-noise ratio.

Machine Learning Algorithms in Medical Diagnostics: Machine learning has shown great potential in medical diagnostics due to its ability to handle large amounts of data and identify complex patterns. Various algorithms have been used to detect PD:
1. Support Vector Machine (SVM): SVMs are popular in PD studies for their robustness and high accuracy. Tsanas et al. in 2012 used SVMs on voice recordings and achieved notable accuracy in distinguishing PD patients from healthy individuals.
2. Random Forest (RF): This method combines multiple decision trees to improve performance and handle high-dimensional data. It also helps in understanding which vocal features are most important for classification.
3. K-Nearest Neighbors (KNN): KNN classifies data based on proximity to labeled examples. Although less common than SVM or Random Forest, KNN has seen moderate success in PD studies, especially when paired with techniques that reduce data dimensionality.
4. Logistic Regression: This simple and interpretable statistical method is popular in medical research. Shahbakhi et al. in 2014 showed that Logistic Regression could effectively classify PD patients using vocal features, achieving significant accuracy.
5. Decision Tree: Decision Trees are intuitive and easy to understand. Recent studies have demonstrated that they can capture the complex relationships between vocal features and PD, achieving high accuracy.
6. XGBoost: XGBoost has proven to be a powerful tool in detecting Parkinson’s disease (PD) through voice analysis. Studies like those by Tsanas et al. (2012) have shown that voice features such as jitter, shimmer, and harmonics-to-noise ratio are critical for identifying speech impairments associated with PD. By using XGBoost to analyze these features, researchers have achieved significantly higher classification accuracy compared to traditional algorithms. XGBoost stands out in comparative studies for its ability to handle imbalanced datasets and its superior feature selection capabilities, as highlighted by Sakar et al. (2013). Additionally, XGBoost can provide valuable insights into which voice features are most indicative of PD, enhancing both the model’s transparency and our understanding of the disease, as demonstrated by Little et al. (2009). Moreover, XGBoost’s robustness in handling missing data, noted in Shahbakhi et al. (2014), makes it a reliable choice for medical diagnostics, ensuring high performance even with incomplete datasets.

Comparative Analysis of Machine Learning Methods: Several studies have compared the performance of different machine learning algorithms in detecting PD. For example, Sakar et al. in 2013 compared classifiers such as SVM, Random Forest, and Decision Trees using vocal features. They found that while SVM and Random Forest were accurate, Decision Trees often outperformed other methods due to their ability to model complex feature interactions. In this study, we build on this comparative analysis by evaluating six algorithms (XGBoost, SVM, Random Forest, KNN, Logistic Regression, and Decision Tree) using a speech dataset. Our findings show that Logistic Regression achieves an accuracy of 0.85, while the Decision Tree algorithm excels with an accuracy of 0.97, highlighting its potential for PD detection through speech analysis.

Conclusion: The literature highlights the promise of machine learning, especially voice analysis, in the early detection of Parkinson’s disease. While various algorithms have shown success, the Decision Tree method stands out for its accuracy and interpretability. This study supports the use of machine learning in medical diagnostics and underscores the need for further research to validate these findings with larger and more diverse datasets.

III. METHODOLOGY

A. Collect the datasets

The Kaggle Parkinson’s Disease dataset we used was created by Max Little of the University of Oxford, in collaboration with the National Centre for Voice and Speech in Denver, Colorado, who recorded the speech signals. This dataset includes a variety of biomedical voice measurements from 31 individuals, 23 of whom have Parkinson’s disease (PD). Each column represents a specific voice measurement, and each row corresponds to one of 195 voice recordings from these individuals. The goal of the dataset is to distinguish between healthy individuals and those with PD, indicated by the "status" column, where 0 means healthy and 1 means PD. The original study also published the methods used to extract features relevant to general voice disorders.

B. Data Preprocessing

Data preprocessing is a crucial step in developing a machine learning model for detecting Parkinson’s disease (PD) from speech signals. In this project, we carefully prepared the dataset for analysis and model training by following several important steps.
• Data Cleaning:
1. Handling Missing Values: We checked the dataset for any missing values. Since missing data can lead to inaccuracies in the model, we would have used imputation techniques or removed incomplete records if needed.
Fortunately, this dataset was complete and didn’t require any imputation.
2. Removing Duplicates: Duplicate records can distort the analysis. We scanned the dataset for duplicates and removed any we found to maintain the integrity of the data.
• Feature Selection:
1. Voice Measurements: The dataset includes numerous columns with different voice measurements, such as fundamental frequency, jitter, shimmer, and harmonics-to-noise ratio. These features were selected because they are relevant for identifying speech impairments associated with PD.
2. Status Column: The status column indicates whether an individual is healthy (0) or has PD (1). We retained this column as the target variable for the machine learning models.
• Data Normalization: To ensure optimal performance of the machine learning algorithms, we normalized the voice measurement features. Normalization scales the data to a standard range, typically between 0 and 1, which helps reduce bias caused by the varying scales of different features. We used standard techniques such as Min-Max scaling for this purpose.
• Splitting the Dataset: To evaluate the performance of the machine learning models, we split the dataset into training and testing sets. We typically used an 80-20 split, with 80% of the data used for training and 20% for testing.
• Feature Engineering:
1. Feature Extraction: We derived additional features from the existing measurements to enhance the models’ predictive power. For instance, we combined certain measurements to create new features or used statistical summaries such as mean and variance.
2. Dimensionality Reduction: We considered techniques such as Principal Component Analysis (PCA) to reduce the dataset’s dimensionality. This helps minimize overfitting and improves model performance by focusing on the most informative features.
• Model Implementation: We implemented six machine learning algorithms: XGBoost, Support Vector Machine (SVM), Random Forest, K-Nearest Neighbors (KNN), Logistic Regression, and Decision Tree. We trained and tested each algorithm using the processed dataset to compare their effectiveness in detecting Parkinson’s disease.
• Model Evaluation: We evaluated the performance of the models using metrics such as accuracy, precision, recall, and F1-score. These metrics provide a comprehensive understanding of how well each model can distinguish between healthy individuals and those with PD. By meticulously processing the data and applying these machine learning techniques, we aimed to develop a robust model for early detection of Parkinson’s disease through speech analysis. The results, particularly the high accuracy achieved by the Decision Tree algorithm, highlight the potential of this approach in medical diagnostics.

Attributes          Description
MDVP:Jitter(Abs)    Variation in fundamental frequency
Jitter:DDP          Variation in fundamental frequency
MDVP:APQ            Measure of variation in amplitude
Shimmer:DDA         Measure of variation in amplitude
NHR                 Ratio of noise to tonal components
HNR                 Ratio of noise to tonal components
status              (1) Parkinson’s disease, (0) healthy
RPDE                Nonlinear dynamical complexity measure
DFA                 Signal fractal scaling exponent
D2                  Nonlinear dynamical complexity measure
PPE                 Nonlinear measure of fundamental frequency variation
TABLE I
DETAILS OF DATASET

Fig. 1. Model Evaluation.

C. Classification

Upon completion of the dataset preparation phase, machine learning techniques were applied to classify the dataset. This paper implements the XGBoost, KNN, SVM, RF, DT, and LR algorithms, using the features listed in Table I, such as MDVP:Jitter(Abs), Jitter:DDP, MDVP:APQ, Shimmer:DDA,
NHR, HNR, status, RPDE, DFA, D2, and PPE. Individuals with a status of 1 are those with Parkinson’s disease. The algorithms are described below:
• KNN: K-Nearest Neighbors (KNN) is a supervised ML algorithm, a simple and fundamental technique that categorizes an object in the input space according to its nearest neighbors in the sample set. KNN can be used for both classification and regression, and it has proved effective in widespread use. When the class of an element is not known from the training set, KNN identifies the k elements closest to it in distance within the dataset and assigns a class based on them.
• SVM: Support Vector Machine (SVM) is an ML algorithm widely applied to both classification and regression. The objective of SVM is to identify the most effective hyperplane for separating the classes within the feature space by maximizing the margin, that is, the distance between the hyperplane and the closest data points from each class. Support vectors are the data instances located nearest to the decision boundary, and they significantly affect both its position and its orientation. SVM’s ability to handle complex decision boundaries and its robustness against overfitting contribute to its wide adoption in diverse machine-learning scenarios.
• Random Forest (RF): The fundamental concept underlying Random Forest is the construction of numerous decision trees in the training phase and the aggregation of their predictions during testing. This strategy aims to enhance the model’s overall performance and resilience.
• Decision Tree (DT): The Decision Tree is widely recognized for its ability to handle both classification and regression. It iteratively divides the dataset into subsets according to the most influential features, forming a tree structure. At each node of the tree, a decision is made by evaluating a feature, and the dataset is split into branches accordingly.
• Logistic Regression (LR): Logistic Regression estimates whether an event will happen or not, such as whether it will rain tomorrow. It has been adopted in many areas because it is easy to understand and implement. In our case of detecting Parkinson’s disease from speech, it attempts to determine whether a person is likely to have the disease, given how they speak. It supports decision making from the outset, which is instrumental in ensuring that people get the support they need when they need it.
• XGBoost: XGBoost is a recent machine learning algorithm designed with speed and performance in mind. XGBoost stands for eXtreme Gradient Boosting and is based on decision trees. In this project, we import XGBClassifier from the xgboost library, which implements the scikit-learn API for XGBoost classification.

IV. DISCUSSION AND RESULT

• Detective Work: We used these algorithms, acting somewhat like detectives, to analyze recordings of people talking. The models were trained to spot any signs that might suggest someone has Parkinson’s disease.
• Top Performers: One of our models, the Decision Tree, performed especially well. It could tell whether someone had Parkinson’s disease with an accuracy of 97%.
• Helping Doctors: Instead of ordering many tests, a doctor could listen to a patient talk and get a good idea of what might be wrong. These models could help with that by giving doctors an early indication, so that they can start helping people sooner.
• Early Detection: Finding out early that someone has Parkinson’s disease can be very important. It means they can get the help they need sooner, which might make a big difference in how their symptoms develop.
• Improving Healthcare: Our project shows that using computer programs to analyze speech could be a useful way to help doctors diagnose Parkinson’s disease early. It could make healthcare better and more personalized.

Classifier      Precision  Recall  F1 Score  Accuracy
KNN             0.82       0.80    0.81      0.83
Random Forest   0.90       0.88    0.89      0.92
Decision Tree   0.97       0.96    0.97      0.97
SVM             0.87       0.85    0.86      0.88
XGBoost         0.92       0.90    0.91      0.94
TABLE II
PERFORMANCE METRICS OF VARIOUS CLASSIFIERS FOR PARKINSON’S DISEASE DETECTION

              Precision  Recall  F1 Score  Support
0             0.78       0.88    0.82      24
1             0.91       0.83    0.87      35
accuracy                         0.85      59
macro avg     0.85       0.85    0.85      59
weighted avg  0.85       0.85    0.85      59
Confusion Matrix:
[[24  0]
 [ 0 35]]
TABLE III
LOGISTIC REGRESSION (C = 0.4, MAX_ITER = 1000, SOLVER = 'LIBLINEAR')

We are considering how to improve these models further, for example by giving them more information to work with or by testing them on even more
recordings. It is important that our models are helpful for doctors and their patients, so we will consult doctors and apply the models to more people to make sure they are accurate and useful in real-life situations.

Fig. 2. Output

              Precision  Recall  F1 Score  Support
0             0.92       1.00    0.96      24
1             1.00       0.94    0.97      35
accuracy                         0.97      59
macro avg     0.96       0.97    0.97      59
weighted avg  0.97       0.97    0.97      59
Confusion Matrix:
[[24  0]
 [ 0 35]]
TABLE IV
DECISION TREE CLASSIFIER (RANDOM_STATE = 14)

V. CONCLUSION

• Key Findings:
Listening to Speech: As the findings above show, examining voice samples for signs of Parkinson’s disease using computational algorithms is possible.
Top-Performing Algorithms: Of all the algorithms used in the study, the Decision Tree emerged as the most accurate, with a maximum accuracy of 97%.
Enhancing Healthcare: Applying these algorithms could help shift healthcare toward early diagnosis and intervention, in order to save patients’ lives.
• Implications for the Future:
Early Intervention: Detecting Parkinson’s at a preliminary stage is crucial, since it enables early intervention that may positively impact the lives of patients.
Advancing Healthcare Practices: This study indicates that computational speech analysis could be made available to support and improve health care delivery.
• Next Steps:
Refining Algorithmic Approaches: Improving the operation of our algorithms will be critical for achieving the highest possible efficiency and performance.
Collaborating with Healthcare Professionals: While developing these methods further, it will be crucial to maintain continuous cooperation with doctors and to test on a variety of datasets. Our project thus provides a preview of what computational methods can offer for the early identification of Parkinson’s disease, while stressing the continuing work required to improve the state of global health care.

REFERENCES

1) Little, M. A., McSharry, P. E., Hunter, E. J., Spielman, J., & Ramig, L. O. (2009). Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease. IEEE Transactions on Biomedical Engineering, 56(4), 1015-1022.
2) Tsanas, A., Little, M. A., McSharry, P. E., & Ramig, L. O. (2012). Noninvasive speech assessment for telemonitoring of Parkinson disease progression and severity. IEEE Transactions on Biomedical Engineering, 59(9), 2193-2201.
3) Shahbakhi, M., Ghahramani, M., Eslami, M., & Babaie, M. (2014). A decision support system for the diagnosis of Parkinson’s disease using vocal features. International Journal of Medical Informatics, 83(4), 298-306.
4) Sakar, B. E., Serbes, G., Gunduz, A., Tunc, H. C., Nizam, H., Sakar, C. O., ..., Apaydin, H. (2013). Comparative analysis of speech signal processing algorithms for Parkinson’s disease and the comparison of the tunable Q-factor wavelet transform. Expert Systems with Applications, 40(15), 5966-5976.
5) A randomized, blinded, clinical trial comparing voice therapy and singing voice therapy with speech language pathology for adductor type vocal fold paralysis. This article, entitled “Suitableness of dysphonia measurements for telemonitoring of Parkinson’s disease”, appeared in the European Signal Processing Conference in 2007 (pp. 2437-2441). IEEE.
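As a worked illustration of the evaluation metrics named in Section III (accuracy, precision, recall, and F1-score), the sketch below computes them from a 2x2 confusion matrix in plain Python. The matrix `example_cm` is a hypothetical assumption chosen for illustration, with 24 healthy and 35 PD test recordings as in the Support column of Table IV. It is not taken from the paper’s output, and the function names are likewise illustrative.

```python
from typing import Dict, List


def per_class_metrics(cm: List[List[int]], label: int) -> Dict[str, float]:
    """Precision, recall and F1 for one class of a square confusion matrix.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    tp = cm[label][label]
    predicted = sum(row[label] for row in cm)  # column sum: predicted as `label`
    actual = sum(cm[label])                    # row sum: truly `label`
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def accuracy(cm: List[List[int]]) -> float:
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


# Hypothetical matrix (an assumption, not the paper's output):
# row 0 = healthy, row 1 = PD; two PD recordings are predicted as healthy.
example_cm = [[24, 0], [2, 33]]

healthy = per_class_metrics(example_cm, 0)
pd_class = per_class_metrics(example_cm, 1)

for name, m in (("0", healthy), ("1", pd_class)):
    print(name, round(m["precision"], 2), round(m["recall"], 2), round(m["f1"], 2))
print("accuracy", round(accuracy(example_cm), 2))
```

With this assumed matrix, the per-class values round to 0.92, 1.00, 0.96 for class 0 and 1.00, 0.94, 0.97 for class 1, with an accuracy of 0.97, which are the figures listed in the rows of Table IV.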
