A MACHINE LEARNING METHODOLOGY FOR DETECTING CHRONIC KIDNEY DISEASE
A PROJECT REPORT
Submitted by
KAVINKUMAR P 1807028
MANOJ G 1807030
SURYA P 1807052
In partial fulfillment for the award of the degree
of
BACHELOR OF TECHNOLOGY
in
INFORMATION TECHNOLOGY
COIMBATORE INSTITUTE OF TECHNOLOGY, COIMBATORE-641014
(Government Aided Autonomous Institution Affiliated to Anna University)
ANNA UNIVERSITY, CHENNAI 600025
ABSTRACT
Chronic kidney disease (CKD) is a global health problem with high morbidity
and mortality rates, and it can induce other diseases. Because CKD has no obvious
symptoms in its early stages, patients often fail to notice the disease. Early
detection of CKD enables patients to receive timely treatment that slows the
progression of the disease. Machine learning models can help clinicians
achieve this goal because of their fast and accurate recognition performance. In this study,
we propose a machine learning methodology for diagnosing CKD. The CKD data set
was obtained from the University of California Irvine (UCI) machine learning
repository and contains a large number of missing values. KNN imputation was used to
fill in the missing values: for each incomplete sample, it selects several complete
samples with the most similar measurements and uses them to estimate the missing data.
Missing values are common in real-life medical situations because patients may
miss some measurements for various reasons. After filling in the
incomplete data set, five machine learning algorithms (logistic regression, random
forest, support vector machine, k-nearest neighbor, and naive Bayes classifier) were used
to establish models.
INTRODUCTION
Chronic kidney disease (CKD) is a global public health problem affecting
approximately 10% of the world’s population. The prevalence of CKD
in China is 10.8%, and it ranges from 10% to 15% in the United States.
According to another study, the prevalence has reached 14.7% in the Mexican adult
general population. The disease is characterised by a slow deterioration in renal
function, which eventually causes a complete loss of renal function. CKD does not
show obvious symptoms in its early stages. Therefore, the disease may not be detected
until the kidney loses about 25% of its function. In addition, CKD has high morbidity
and mortality, with a global impact on the human body.
Machine learning refers to computer programs that calculate and deduce
information related to a task and learn the characteristics of the corresponding
patterns. This technology can achieve accurate and economical diagnoses of diseases;
hence, it might be a promising method for diagnosing CKD.
We used KNN imputation to fill in the missing values in the data set; this method
can also be applied to data sets in which the diagnostic categories are unknown. Logistic
regression (LOG), random forest (RF), support vector machine (SVM), k-nearest
neighbor (KNN), and naive Bayes classifier (NB) were used to establish
CKD diagnostic models on the complete CKD data sets. The models with better
performance were selected for misjudgment analysis.
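The imputation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this project: the function name `knn_impute`, the choice of Euclidean distance over the observed features, averaging the neighbours' values, and the toy numbers are all assumptions made for the example. It handles numeric features only; categorical attributes would need to be encoded first, and features on different scales would normally be normalised before computing distances.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries in each row using the k most similar complete rows.

    rows: list of equal-length lists of numbers, with None marking a missing value.
    Returns a new list of rows; the input is not modified.
    """
    # Only rows without any missing value are used as donors.
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one complete row to impute from")

    imputed = []
    for row in rows:
        if None not in row:
            imputed.append(list(row))
            continue

        # Indices of the features that were actually measured for this sample.
        observed = [j for j, v in enumerate(row) if v is not None]

        def dist(other):
            # Euclidean distance computed over the observed features only.
            return math.sqrt(sum((row[j] - other[j]) ** 2 for j in observed))

        neighbours = sorted(complete, key=dist)[:k]

        # Replace each missing entry with the mean of the neighbours' values.
        filled = [
            v if v is not None else sum(n[j] for n in neighbours) / len(neighbours)
            for j, v in enumerate(row)
        ]
        imputed.append(filled)
    return imputed


# Toy data (hypothetical values, not taken from the CKD data set).
samples = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.2],
    [9.0, 8.0, 7.0],
    [1.0, None, 3.1],  # incomplete sample
]
completed = knn_impute(samples, k=2)
```

In this toy example the incomplete sample is closest to the first two rows, so its missing value is filled with the mean of their second feature.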
To our knowledge, this is
the first time that KNN imputation has been used for the diagnosis of CKD. In
addition, building an integrated model is a good way to improve on the performance
of the individual models. The proposed methodology might effectively handle
scenarios where patients are missing certain measurements before being diagnosed.
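As a rough illustration of what an integrated model could look like, the sketch below averages the CKD probabilities predicted by several individual models and thresholds the mean. The text above does not specify how the models are combined, so the averaging rule, the function name `integrate_predictions`, the 0.5 threshold, and the example probabilities are all assumptions made for this example.

```python
def integrate_predictions(prob_lists, threshold=0.5):
    """Combine per-model CKD probabilities by simple averaging.

    prob_lists: one list of probabilities per model, all scoring the same samples.
    Returns (mean_probs, labels), where label 1 means the sample is predicted as CKD.
    """
    if not prob_lists:
        raise ValueError("need at least one model's probabilities")
    n_samples = len(prob_lists[0])
    if any(len(p) != n_samples for p in prob_lists):
        raise ValueError("all models must score the same samples")

    # zip(*prob_lists) groups the probabilities that refer to the same sample.
    mean_probs = [sum(col) / len(prob_lists) for col in zip(*prob_lists)]
    labels = [1 if p >= threshold else 0 for p in mean_probs]
    return mean_probs, labels


# Hypothetical outputs of two individual models for three patients.
model_a_probs = [0.9, 0.4, 0.2]
model_b_probs = [0.8, 0.7, 0.1]
mean_probs, labels = integrate_predictions([model_a_probs, model_b_probs])
```

For the second patient the two models disagree (0.4 versus 0.7); the averaged probability is above the threshold, so the combined prediction is CKD.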
LITERATURE SURVEY
PAPER 1]
AUTHOR AND JOURNAL :
GUOZHEN CHEN, CHENGUANG DING, YANG LI, XIAOJUN HU, XIAO LI, LI REN
IEEE
TITLE :
Prediction of Chronic Kidney Disease Using Adaptive Hybridized Deep
Convolutional Neural Network on the Internet of Medical Things Platform.
METHODOLOGY :
Adaptive hybridized Deep Convolutional Neural Network (AHDCNN)
CNN
Internet of medical things platform (IoMT)
DRAWBACKS :
MISSING VALUES ARE FILLED USING THE MEDIAN, WHICH IS NOT A PROMISING APPROACH
PAPER 2]
AUTHOR AND JOURNAL :
ALVARO SOBRINHO, ANDRESSA C. M. DA S. QUEIROZ, MARIA ELIETE
PINHEIRO, AND ANGELO PERKUSICH
IEEE
TITLE :
Computer-Aided Diagnosis of Chronic Kidney Disease in Developing Countries:
A Comparative Analysis of Machine Learning Techniques
METHODOLOGY :
k-fold cross-validation method based on the Weka software
J48 DECISION TREE
DRAWBACKS :
ACCURACY IS ONLY ABOUT 95%
LIMITED VALUES IN DATASET
PAPER 3]
AUTHOR AND JOURNAL :
ERLEND HODNELAND, EIRIK KEILEGAVLEN
IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING
TITLE :
Detection of CKD using Tissue deformation fields from dynamic MR imaging
METHODOLOGY :
Image registration method
USES DYNAMIC MR IMAGING
DRAWBACKS :
DATASET IS VERY SMALL
PAPER 4]
AUTHOR AND JOURNAL :
AHMED AMIJOHARI, MOHD HELMY ABD WAHAB
IEEE International Conference on Information Systems and Computer
Networks
TITLE :
Two Class Classification comparative experiments for CKD
METHODOLOGY :
Two-class Decision Forest and Two-class Neural Network
DRAWBACKS :
NUMBER OF SAMPLES IS SMALL
STAGE OF THE DISEASE CANNOT BE DETERMINED
PAPER 5]
AUTHOR AND JOURNAL :
BILAL KHAN, RASHID NASEEM, FAZAL MUHAMMAD, GHULAM ABBAS,
SUNGHWAN KIM
IEEE
TITLE :
An Empirical Evaluation of Machine Learning Techniques for Chronic Kidney
Disease Prophecy
METHODOLOGY :
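Seven ML techniques, including NBTree and J48, are used.
Evaluated using mean absolute error (MAE), root mean squared error (RMSE),
relative absolute error (RAE), root relative squared error (RRSE), and recall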
IMPROVEMENTS :
Severity of the disease can’t be predicted
PAPER 6]
AUTHOR AND JOURNAL :
Navaneeth Bhaskar and Suchetha M
IEEE
TITLE :
A Deep Learning-based System for Automated Sensing of Chronic Kidney
Disease
METHODOLOGY :
CNN-SVM integrated network
1-D Deep Learning convolution network
Used Saliva Samples
IMPROVEMENTS :
Only one parameter is used for prediction, which limits the predictive
capacity
PAPER 7]
AUTHOR AND JOURNAL :
Shubham Vashisth, Ishika Dhall
International Conference on Cloud Computing
TITLE :
Chronic Kidney Disease (CKD) Diagnosis using Multi-Layer Perceptron Classifier
METHODOLOGY :
Multi-Layer Perceptron Classifier
CMS data set is used
IMPROVEMENTS :
The dataset size can be increased
Accuracy can be increased
PAPER 8]
AUTHOR AND JOURNAL :
N V Ganapathi Raju, K Prasanna Lakshmi, K. Gayathri Praharshitha,
Chittampalli Likhitha
2019 ICICCS
TITLE :
Prediction of chronic kidney disease (CKD) using Data Science
METHODOLOGY :
SVM, Random Forest, XGBoost, Logistic Regression, Neural networks,
Naive Bayes Classifier.
IMPROVEMENTS :
The missing values can be filled using KNN.
Accuracy can be increased
PAPER 9]
AUTHOR AND JOURNAL :
Yedilkhan Amirgaliyev, Shahriar Shamiluulu, Azamat Serek
IEEE
TITLE :
Analysis of CKD using Machine Learning Techniques
METHODOLOGY :
Support Vector Machine (SVM)
IMPROVEMENTS :
Accuracy is only about 93%
An Integrated Model can be proposed
PAPER 10]
AUTHOR AND JOURNAL :
Gunarathne W.H.S.D, Perera K.D.M, Kahandawaarachchi K.A.D.C.P
IEEE International Conference on Bioinformatics and Bioengineering
TITLE :
Performance Evaluation on Machine Learning Classification Techniques and
Forecasting through Data Analytics for Chronic Kidney Disease (CKD)
METHODOLOGY :
Uses a multiclass Decision Tree Classifier
IMPROVEMENTS :
Accuracy is about 99.1%
They reduced the dataset size to 15 attributes
PAPER 11]
AUTHOR AND JOURNAL :
Ahmed J. Aljaaf, Dhiya Al-Jumeily, Hussein M. Haglan
IEEE Congress on Evolutionary Computation (CEC)
TITLE :
Early Prediction of Chronic Kidney Disease Using Machine Learning Supported
by Predictive Analytics
METHODOLOGY :
Classification and Regression Tree, i.e., RPART
Two black-box models: SVM and MLP
IMPROVEMENTS :
No information about any kind of medications has been collected with this
data
PAPER 12]
AUTHOR AND JOURNAL :
Abdullah Al Imran, Md Nur Amin, Fatema Tuj Johora
International Conference on Innovation in Engineering and Technology (ICIET)
TITLE :
Classification of Chronic Kidney Disease using Logistic Regression, Feedforward
Neural Network and Wide & Deep Learning
METHODOLOGY :
Logistic regression, feedforward neural networks and wide & deep learning to
diagnose CKD
IMPROVEMENTS :
They simply removed the samples containing missing values (KNN imputation can be used instead)
PAPER 13]
AUTHOR AND JOURNAL :
Hanyu Zhang, Che-Lun Hung, William Cheng-Chung Chu, Ping-Fang Chiu and
Chuan Yi Tang
IEEE International Conference on Bioinformatics and Biomedicine
TITLE :
Chronic Kidney Disease Survival Prediction with Artificial Neural Networks
METHODOLOGY :
Artificial Neural Network (ANN) models applied to survival
prediction for Chronic Kidney Disease (CKD) patients.
IMPROVEMENTS :
The authors noted that the dataset is highly imbalanced (KNN can be used)
PAPER 14]
AUTHOR AND JOURNAL :
K. Shankar, P. Manickam, G. Devika, M. Ilayaraja
IEEE International conference on computational and computing research
TITLE :
Optimal Feature Selection for Chronic Kidney Disease Classification using Deep
Learning Classifier
METHODOLOGY :
Ant Lion Optimization (ALO) technique to choose optimal features for the
classification process.
Deep Neural Network (DNN)
IMPROVEMENTS :
Data mining procedures can be used during training to enhance the
performance of classifiers, and the datasets can be expanded