Prediction of Renal Illness using Machine Learning Models

CHANDRA SEKHAR SANABOINA 1
SRI SATYA PREM CHARAN KURELLA 2

University College of Engineering, Kakinada
Department of Computer Science and Engineering
Kakinada, Andhra Pradesh, India
1 [email protected]
2 [email protected]

Abstract. Chronic kidney disease (CKD) is one of the most serious health problems facing the world today.
Its early stages are symptomless, and the disease only becomes apparent when kidney function has been
reduced by up to 25%. It is therefore necessary to predict and detect chronic renal disease early. Because
of their rapid and precise detection capabilities, machine learning models are widely employed in clinical
and medical settings to identify a variety of chronic conditions. Here, the Chronic Kidney Disease dataset
from the UCI repository is used with several machine-learning algorithms to predict CKD. The proposed
system uses a Stochastic Gradient Descent algorithm to make the model learn faster. The results are
presented as a comparative table of the machine learning algorithms with respect to the performance
metrics Precision, F1-score, Recall, and Accuracy.

Keywords: Chronic kidney disease, Stochastic Gradient Descent, Feed Forward Neural Networks, Machine
Learning, Random Forest, Naive Bayes, Logistic Regression, Support Vector Machine, K Nearest Neighbour.

(Received May 12th, 2022 / Accepted June 1st, 2023)

INFOCOMP, v. 22, no. 1, p. pp-pp, June, 2023.
Saboina et al. Prediction of Renal Illness using Machine Learning Models

1 Introduction

A successful life is closely tied to good health. The body's organs interact with one another, and each must be healthy for the body to function at its best. Good health matters because it encompasses social, psychological, and physical well-being; hence every organ in the body must be cared for to lead a healthy life. Unlike many other diseases, chronic kidney disease (CKD) may take a long time to show its consequences in a patient. CKD is asymptomatic in its early stages, with few or no symptoms and sometimes no disease-specific symptoms, which makes it difficult to predict, recognize, and prevent. The sooner a disease is detected, the sooner treatment can begin; if undetected, CKD can lead to permanent health problems and even death. Machine learning is expected to provide a low-cost detection and prediction solution for this medical problem (12; 11). It may help doctors diagnose more patients more rapidly, allowing them to begin treating CKD patients at an early stage. Screening symptomless individuals for CKD enables earlier therapeutic intervention and prevents inappropriate exposure to nephrotoxic substances, both of which can considerably slow the progression of CKD to end-stage renal disease. Machine learning can predict the occurrence, course, and determinants of individual chronic diseases in many contexts (13; 2), and such results are relevant to improving clinical decision-making and the organization of healthcare facilities. Machine learning also accelerates data processing and analysis (10). With machine learning, predictive analytics algorithms can be trained on even larger datasets and easily modified at deployment time to perform deeper analysis and prediction of various chronic diseases. This study employs machine learning (ML) methods to categorize and predict CKD. As a result, healthcare facilities, stakeholders, and experts will find it simpler to identify and categorize patients as having CKD or not.

2 Literature Review

The kidneys eliminate excess water and waste from the blood. As the kidneys deteriorate, waste builds up, resulting in a symptomless disease. Laboratory testing can still diagnose a patient even when there are no symptoms at all. Symptoms can be treated with medication; later stages may call for dialysis (mechanical hemofiltration) or transplantation.

A. A. Johari et al. (7) compared two classification algorithms, two-class decision trees and two-class neural networks, for the prediction of chronic renal disease; the neural network outperformed the decision tree, with an accuracy of 99.56%. J. D. De Guia et al. (8) employed six machine learning classifiers to predict chronic kidney disease: Random Forest, ANN, Naive Bayes, Decision Trees, SVM, and MLP. Of the six, ANN had the highest F1 score (0.992248) and the quickest training time (46.999 ms). R. Gupta et al. (5) analyzed the performance of several machine-learning methods for chronic renal disease prediction: decision trees, logistic regression, and random forest. Logistic regression, decision tree, and random forest attained accuracies of 98.48%, 94.16%, and 99.24%, respectively. I. U. Ekanayake et al. (1) used 11 machine learning classifiers to detect chronic kidney disease in its early stages and so support early detection and life-saving treatment, including RF, XGBoost, LR, SVM, AdaBoost, KNN, neural network, GNB, and decision tree. Random forest outperformed the others, with an accuracy of 99.85%. B. Gudeti et al. (4) compared SVM, KNN, and logistic regression using several performance criteria; the Support Vector Machine outperformed the others with an accuracy of 99.25%. The advantage of this strategy is that prediction takes far less time, allowing clinicians to start treating CKD patients as soon as possible. S. Y. Yashfi et al. (15) proposed a technique for estimating the probability of CKD using data from 455 patients, drawn from the real-time dataset of Khulna City Medical College and from the UCI Machine Learning Repository. After training with 10-fold cross-validation, random forest and ANN were employed: random forest reached an accuracy of 97.12% and ANN 94.5%. This approach supports the early diagnosis of chronic renal disease. P. Ghosh et al. (3) aimed at highly accurate CKD prediction using AdaBoost, Gradient Boosting, Linear Discriminant Analysis, and Support Vector Machine on the UCI machine learning repository dataset; the Gradient Boosting (GB) classifier achieved the highest accuracy, about 99.80%. A. Vijayalakshmi et al. (14) surveyed ML classification methods for identifying the presence or absence of CKD in a patient, since several classification systems can predict a patient's CKD or non-CKD status. The survey covered different ML algorithms used to identify renal disease and briefly addressed the key problems. The random forest algorithm showed the best performance, with a score of 99.75%. M. A. Islam et al. (6) used six algorithms to predict the risk factor for CKD: Decision Tree, Random Forest, Simple Logistic Regression, Naive Bayes, Simple Linear Regression Model, and Linear Regression. According to their analysis, the Random Forest method yields a high accuracy of 98.8858%. G. Nandhini et al. (9) used various machine learning classifiers to support successful treatment through early disease prediction. Ensemble classifiers, which combine the predictions of several classifiers, help the model perform even better. They used four ensemble algorithms, AdaBoost, Gradient Boosting, Random Forest, and bagging, and measured their effectiveness with a variety of metrics. In terms of accuracy, AdaBoost and Random Forest performed best, with 100% accuracy.

3 Preliminaries

This section describes the preparation carried out before building the model, including a description of the dataset, the operating environment, and the metrics used to compare the performance of the models.

3.1 Details of the data and the working environment

The hospital data collection by Soundarapandian et
al. in the UCI machine learning repository provided the CKD dataset for this study. The dataset consists of 400 samples with 24 predictors (features), of which 11 are categorical and 13 numeric. The attributes include sugar, blood pressure, etc. The output variable has two classes: CKD (positive) and not CKD (negative). Of the 400 samples, 250 are labeled as having CKD and 150 as not having CKD. Some values in the data are missing; they must be filled in for better analysis.

3.2 Data processing

For ease of processing, each nominal (categorical) variable is encoded numerically. Medical attributes such as pcc, ba, htn, and dm are categorical and are encoded as 0 or 1. Although the three variables sg, al, and sc are categorical by definition, their values are already numbers, so these variables were treated as numeric.
The samples are numbered from 1 to 400. When patients did not consult the diagnostic center, the dataset may lack some medical diagnosis values. Since the number of affected samples is uncertain, an appropriate imputation is required. The missing values of the original CKD dataset were handled once the categorical variables had been encoded. KNN imputation was used to fill in the missing values; it works by choosing the K nearest samples, ranked by Euclidean distance. Missing numeric values are filled with the median.

3.3 Performance Measure

CKD and non-CKD were chosen as the positive and negative classes, respectively. A true positive (TP) is a CKD sample correctly diagnosed as CKD. A false negative (FN) is a CKD sample incorrectly diagnosed as not CKD. A false positive (FP) is a non-CKD sample incorrectly diagnosed as CKD. A true negative (TN) is a non-CKD sample correctly diagnosed as not CKD. Precision, accuracy, F1-score, and recall were used to assess the models' performance. They are calculated using the following formulas.

3.3.1 Accuracy

Accuracy is one of the most straightforward metrics to evaluate: it is the ratio of correct predictions to all predictions, as shown in Eq. 1.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

3.3.2 Precision

Precision is the proportion of correctly predicted positive observations among all predicted positive observations, as shown in Eq. 2.

$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$

3.3.3 Recall

Recall (sensitivity) is the proportion of actual positive observations that are correctly predicted as positive, as shown in Eq. 3.

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$

3.3.4 F1-Score

The F1 score is the harmonic mean of precision and recall. Because it accounts for both false positives and false negatives, it is often more informative than accuracy, particularly when the classes are imbalanced. It is shown in Eq. 4.

$$\text{F1-Score} = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}} \tag{4}$$

4 Proposed Model

In this section, several machine-learning algorithms were used to diagnose the data samples. The best-performing models were chosen as candidate components; their misclassifications were analyzed, and the component models were identified. Next, a stochastic gradient descent model was applied to produce better results.

4.1 Setting up and evaluating initial individual models

The corresponding subset of features (predictors) was applied to the entire CKD dataset, and the following machine learning models were used to identify CKD:
1) Logistic regression model
2) Model based on trees: RF
3) SVM, a decision-plane-based model
4) KNN, a distance-based model
5) Model based on probabilities: NB
6) Feed Forward Neural Network

4.2 Establishing the stochastic gradient descent

Machine learning applications frequently use the stochastic gradient descent (SGD) optimization procedure to find model parameters that give the best fit between expected and actual outputs. The approach works, although its updates are noisy, and it is widely used in machine learning. In gradient descent, the "batch" is the number of samples from the dataset used to compute the gradient in each iteration. In typical gradient descent optimization such as batch gradient descent, the batch is the entire dataset. Using the entire dataset keeps noise and randomness to a minimum, but problems arise as the dataset grows. For example, if a dataset has 1 million samples, batch gradient descent must process all million samples in every single iteration and repeat this until the minimum is reached, so the implementation requires substantial computing resources. Stochastic gradient descent resolves this issue by choosing a random subset of the dataset in each iteration; in its basic form, SGD uses only one sample per iteration. The samples are shuffled, and iterations proceed by selecting them at random. The architecture of the proposed system is depicted in Fig. 1.

4.3 Description of Algorithms

4.3.1 Logistic Regression: Logistic regression is one of the most widely used supervised learning algorithms. It predicts a categorical dependent variable from a given set of independent variables, so its output must be discrete or categorical.

4.3.2 Random Forest: A random forest (RF) consists of many decision trees. Its use of random sampling and ensemble procedures enables accurate prediction and improved generalization, and accuracy generally increases with the number of decorrelated trees. A random forest classifier can also fill in some of the missing data.

4.3.3 Support vector machine: The Support Vector Machine (SVM), developed by Cortes and Vapnik, is a supervised machine learning technique. Its goal is to find the best decision boundary, a maximum-margin hyperplane, between samples of different classes. When the classes cannot be separated in the original space, the SVM maps the input data from a low-dimensional space to a higher-dimensional space, where the samples can be divided by an optimal boundary.

4.3.4 K-Nearest Neighbour: The K-Nearest Neighbours (KNN) method, developed by Thomas Cover, is a supervised technique for classification and regression problems. It uses feature similarity to predict labels for new points: a test point is classified according to the majority vote of its K nearest neighbours in the training set, where K is the number of neighbours.
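The majority-vote rule described above can be sketched in a few lines of standard-library Python. This is a minimal illustration assuming Euclidean distance and purely numeric feature vectors; the function names `euclidean` and `knn_predict` and the toy data are hypothetical, not the authors' implementation.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training indices by their distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two features per sample, labels "ckd" / "notckd".
X = [[1.0, 1.1], [1.2, 0.9], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]
y = ["ckd", "ckd", "ckd", "notckd", "notckd", "notckd"]

print(knn_predict(X, y, [1.05, 1.0], k=3))  # ckd
print(knn_predict(X, y, [5.0, 5.0], k=3))   # notckd
```

An odd K avoids ties in a two-class vote; in practice the features would also be scaled first, since attributes such as blood pressure and serum creatinine have very different ranges.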

4.3.5 Feed Forward Neural Network: The feedforward neural network was the first and simplest artificial neural network design. Information in this network travels in only one direction, forward, from the input nodes through any hidden nodes to the output nodes. The network contains no loops or cycles.
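To make the one-directional flow concrete, the following sketch computes a forward pass through a single hidden layer. The layer sizes, the sigmoid activation, and all weight values are arbitrary assumptions chosen for illustration; the paper does not specify the FFNN architecture, and training by backpropagation is omitted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer: each output unit is sigmoid(w . x + b)."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input -> hidden -> output; values only move forward, with no cycles."""
    hidden = dense(x, hidden_w, hidden_b)
    return dense(hidden, out_w, out_b)[0]  # single output unit read as P(CKD)


# Hypothetical weights for a 2-input, 2-hidden-unit, 1-output network.
hidden_w = [[0.5, -0.4], [0.3, 0.8]]
hidden_b = [0.0, -0.1]
out_w = [[1.2, -0.7]]
out_b = [0.05]

p = forward([1.0, 2.0], hidden_w, hidden_b, out_w, out_b)
print(f"P(CKD) = {p:.3f}")
```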

4.3.6 Naive Bayes: A naive Bayes classifier is a classification algorithm based on Bayes' theorem. Naive Bayes is a family of related algorithms rather than a single algorithm; its members share the assumption that features are conditionally independent given the class.
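Gaussian naive Bayes is one common member of this family for numeric features. The sketch below assumes that each feature is normally distributed within each class and independent of the others given the class; it is illustrative only, since the paper does not state which naive Bayes variant was used. The function names and toy data are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-feature mean/variance for each class."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small constant keeps variances positive if a feature is constant.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log prior + summed log likelihoods."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            # Log of the normal density N(v; m, var); features treated as independent.
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data with two numeric features.
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [4.0, 6.0], [4.2, 5.8], [3.8, 6.2]]
y = ["ckd", "ckd", "ckd", "notckd", "notckd", "notckd"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.1]))  # ckd
```

Working in log space avoids the numerical underflow that multiplying many small densities would cause.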
Figure 1: Architecture of Proposed system
4.3.7 Integrated Model: This model combines Logistic Regression and Random Forest using ensemble techniques.

The comparison graphs of the various metrics for the various classifiers are shown in Figures 2, 3, 4, and 5.
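The paper does not say which ensemble technique joins logistic regression and random forest in the integrated model of Section 4.3.7. One common choice is soft voting, which averages the models' predicted class probabilities. The sketch below assumes that scheme with equal weights and takes the two models' probability outputs as given (hypothetical numbers), so that it stays self-contained; `soft_vote` is a hypothetical helper, not the authors' code.

```python
def soft_vote(prob_lists, weights=None):
    """Weighted average of per-sample P(CKD) from several models, thresholded at 0.5."""
    n_models = len(prob_lists)
    weights = weights or [1.0 / n_models] * n_models  # equal weights by default
    combined = [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists))
        for i in range(len(prob_lists[0]))
    ]
    labels = [1 if p >= 0.5 else 0 for p in combined]  # 1 = CKD, 0 = not CKD
    return combined, labels


# Hypothetical P(CKD) outputs for four patients.
lr_probs = [0.92, 0.40, 0.55, 0.10]  # logistic regression
rf_probs = [0.88, 0.70, 0.35, 0.05]  # random forest

combined, labels = soft_vote([lr_probs, rf_probs])
print(combined)
print(labels)
```

Patients 2 and 3 show why averaging matters: the two models disagree, and the combined probability decides the label.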
4.4 Encoding, Missing values, and Outlier Treatment:
After the categorical columns have been imputed with KNN imputation, a label encoder converts their values from categorical to numeric. Once all columns of the data frame are numeric, the remaining missing values are imputed using the multiple imputation by chained equations (MICE) package. The interquartile range (IQR) is then used to detect outliers, which are excluded to produce the final working dataset.
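The KNN imputation and IQR outlier steps can be sketched in standard-library Python as follows. This is a simplified illustration, not the authors' pipeline: missing entries are represented as `None`, distances use only features observed in both rows, each missing value is filled with the median of its K nearest donors (following the median rule in Section 3.2), the MICE step is omitted, and the 1.5 × IQR fence is the conventional choice, which the paper does not state explicitly. The data and helper names are hypothetical.

```python
import math
import statistics


def knn_impute(rows, k=2):
    """Fill each None with the median of that column over the k nearest donor rows."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue  # a donor must have column j observed
                shared = [(a, b) for c, (a, b) in enumerate(zip(row, other))
                          if c != j and a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda t: t[0])
            neighbours = [v for _, v in candidates[:k]]
            if neighbours:
                filled[i][j] = statistics.median(neighbours)
    return filled


def iqr_filter(rows, col, factor=1.5):
    """Drop rows whose value in `col` lies outside [Q1 - f*IQR, Q3 + f*IQR]."""
    values = [r[col] for r in rows]
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [r for r in rows if low <= r[col] <= high]


# Hypothetical rows: [blood pressure, serum creatinine]; None marks a missing value.
data = [
    [80, 1.2],
    [70, None],
    [72, 1.0],
    [90, 4.0],
    [76, 1.1],
    [78, 1.3],
    [300, 1.2],  # implausible blood pressure, expected to fall outside the IQR fence
]

complete = knn_impute(data, k=2)
print(complete[1])
clean = iqr_filter(complete, col=0)
print(len(clean))
```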
Figure 2: Accuracy graph of various ML models for CKD

4.5 Training and Testing: To train and test the CKD prediction model, we split the dataset into a training set (75%) and a test set (25%).
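A 75/25 split can be sketched as below. Stratifying by class, so that the CKD/not-CKD ratio from Section 3.1 is roughly preserved in both sets, and the fixed random seed are assumptions added for illustration; the paper does not say whether its split was stratified. `stratified_split` is a hypothetical helper.

```python
import random


def stratified_split(labels, test_fraction=0.25, seed=42):
    """Return (train_idx, test_idx), drawing the test fraction separately per class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        idxs = idxs[:]  # copy before shuffling
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Class counts from Section 3.1: 250 CKD and 150 not-CKD samples.
labels = ["ckd"] * 250 + ["notckd"] * 150
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 300 100
```

Note that Python's `round` rounds halves to even, so the CKD class contributes 62 test samples (from 62.5) and the not-CKD class 38 (from 37.5).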

4.6 Model Prediction:
Several machine-learning approaches were employed to create a prediction model. The eight techniques we employed are Random Forest, KNN, Logistic Regression, SVM, FFNN, Naive Bayes, SGD, and the Integrated Classifier.
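The SGD classifier listed above applies the one-sample-per-iteration updates of Section 4.2 to a linear model. As a hedged illustration, the sketch below trains logistic regression with plain SGD on the log-loss. The learning rate, epoch count, and toy data are assumptions; the paper does not report which loss or hyperparameters its SGD classifier used.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg_sgd(X, y, lr=0.1, epochs=200, seed=0):
    """Logistic regression fitted by SGD: one sample per parameter update."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for i in order:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            err = p - y[i]  # gradient of the log-loss w.r.t. the linear score
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Label 1 (CKD) when the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical, linearly separable toy data (1 = CKD, 0 = not CKD).
X = [[0.2, 0.1], [0.4, 0.3], [0.3, 0.2], [2.0, 1.8], [1.8, 2.2], [2.2, 2.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logreg_sgd(X, y)
print([predict(w, b, x) for x in X])
```

Each update costs one pass over a single sample's features, which is what makes SGD attractive when the dataset is large.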

4.7 Model Comparison:
The model that performs best in terms of recall, F1-score, accuracy, and precision must then be chosen.
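The four metrics of Section 3.3 (Eqs. 1-4) can be computed directly from confusion-matrix counts, with CKD as the positive class and recall computed as TP / (TP + FN). The counts in the usage line are hypothetical, not results from the paper; the helper names are likewise illustrative.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def confusion_counts(y_true, y_pred, positive="ckd"):
    """Count TP, TN, FP, FN for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


# Hypothetical counts for a 100-sample test set.
m = classification_metrics(tp=60, tn=35, fp=3, fn=2)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

With these counts, F1 equals 2TP / (2TP + FP + FN) = 120 / 125, which is a handy cross-check on Eq. 4.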
Figure 3: F1 score of various ML models for CKD
A comparison of the ML models on these performance metrics is shown in Table 1.

Table 1: Comparative table of various ML models (values in %; ING denotes the Integrated Model)

Model   Accuracy   Recall   F1-Score   Precision
RF      98.5       99       98         97
KNN     98.5       96       97         94
LR      99.5       99       99         99
NB      95.5       95       94         93
SGD     92.5       83       83         80
FFNN    98.5       98       98         97
ING     99         98       99         97
SVM     99.5       99       99         99
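As a small illustration of the selection step in Section 4.7, the snippet below ranks the rows of Table 1 by accuracy, breaking ties by F1-score, then recall, then precision. This tie-breaking order is an assumption for illustration, not the authors' stated procedure; with Table 1's values, LR and SVM remain tied on every metric, and Python's stable sort simply keeps them in table order.

```python
# Values transcribed from Table 1: (accuracy, recall, F1-score, precision), in %.
TABLE_1 = {
    "RF": (98.5, 99, 98, 97),
    "KNN": (98.5, 96, 97, 94),
    "LR": (99.5, 99, 99, 99),
    "NB": (95.5, 95, 94, 93),
    "SGD": (92.5, 83, 83, 80),
    "FFNN": (98.5, 98, 98, 97),
    "ING": (99, 98, 99, 97),
    "SVM": (99.5, 99, 99, 99),
}


def rank_models(table):
    """Sort models by accuracy, then F1, recall, precision (all descending)."""
    def key(item):
        acc, rec, f1, prec = item[1]
        return (acc, f1, rec, prec)
    # reverse=True keeps tied entries in their original (table) order.
    return [name for name, _ in sorted(table.items(), key=key, reverse=True)]


ranking = rank_models(TABLE_1)
print(ranking)
```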

Figure 4: Precision graph of various ML models for CKD


5 Experimental Results
Positive and negative characteristics are portrayed as being more crucial for reaching high accuracy. The most effective classifiers in this research are SVM and Logistic Regression, with an accuracy of 99.5%. The Integrated Model achieved the second-highest accuracy, 99%. The Random Forest, KNN, and Feed Forward Neural Network classifiers have the third-highest accuracy, 98.5%. The naive
Bayes classifier achieved an accuracy of 95.5%. The Stochastic Gradient Descent (SGD) classifier reached 92.5% accuracy, the lowest of all the models. This section reports the best result obtained by each classifier.

Figure 5: Recall graph of various ML models for CKD

6 Conclusion

Renal failure is the main cause of death in people with CKD, and chronic renal disease in general is a serious issue for human health. Everyone should look after their health so that the disease can be detected at an early stage. We handled the missing data, trained the models, and built logistic regression and support vector machine classifiers, both implemented in Python. The Support Vector Machine and Logistic Regression algorithms each achieve an accuracy of 99.5%, a comparatively high level of accuracy.

7 Future Scope

By applying machine learning techniques to large amounts of patient data, future renal disease prediction systems can be made faster and more accurate.

References

[1] Ekanayake, I. U. and Herath, D. Chronic kidney disease prediction using machine learning methods. In 2020 Moratuwa Engineering Research Conference (MERCon), pages 260–265. IEEE, 2020.

[2] Ferreira, D. D., Santos, L. O., Alvarenga, T. A., Rodríguez, D. Z., Barbosa, B. H. G., Ferreira, A. C. B. H., dos Santos Alves, D. F., Carmona, E. V., Duran, E. C. M., and de Moraes Lopes, M. H. B. Applications of digital and smart technologies to control SARS-CoV-2 transmission, rapid diagnosis, and monitoring. In Omics Approaches and Technologies in COVID-19, pages 405–425. Elsevier, 2023.

[3] Ghosh, P., Shamrat, F. J. M., Shultana, S., Afrin, S., Anjum, A. A., and Khan, A. A. Optimization of prediction method of chronic kidney disease using machine learning algorithm. In 2020 15th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), pages 1–6. IEEE, 2020.

[4] Gudeti, B., Mishra, S., Malik, S., Fernandez, T. F., Tyagi, A. K., and Kumari, S. A novel approach to predict chronic kidney disease using machine learning algorithms. In 2020 4th International Conference on Electronics, Communication and Aerospace Technology (ICECA), pages 1630–1635. IEEE, 2020.

[5] Gupta, R., Koli, N., Mahor, N., and Tejashri, N. Performance analysis of machine learning classifier for predicting chronic kidney disease. In 2020 International Conference for Emerging Technology (INCET), pages 1–4. IEEE, 2020.

[6] Islam, M. A., Akter, S., Hossen, M. S., Keya, S. A., Tisha, S. A., and Hossain, S. Risk factor prediction of chronic kidney disease based on machine learning algorithms. In 2020 3rd International Conference on Intelligent Sustainable Systems (ICISS), pages 952–957. IEEE, 2020.

[7] Johari, A. A., Abd Wahab, M. H., and Mustapha, A. Two-class classification: Comparative experiments for chronic kidney disease. In 2019 4th International Conference on Information Systems and Computer Networks (ISCON), pages 789–792. IEEE, 2019.

[8] Justin, D., Concepcion, R. S., Bandala, A. A., and Dadios, E. P. Performance comparison of classification algorithms for diagnosing chronic kidney disease. In 2019 IEEE 11th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management (HNICEM), pages 1–7. IEEE, 2019.

[9] Nandhini, G. and Aravinth, J. Chronic kidney disease prediction using machine learning techniques. In 2021 International Conference on Recent Trends on Electronics, Information, Communication & Technology (RTEICT), pages 227–232. IEEE, 2021.

[10] Okey, O. D., Maidin, S. S., Lopes Rosa, R., Toor, W. T., Carrillo Melgarejo, D., Wuttisittikulkij, L., Saadi, M., and Zegarra Rodríguez, D. Quantum key distribution protocol selector based on machine learning for next-generation networks. Sustainability, 14(23):15901, 2022.

[11] Okey, O. D., Melgarejo, D. C., Saadi, M., Rosa, R. L., Kleinschmidt, J. H., and Rodríguez, D. Z. Transfer learning approach to IDS on cloud IoT devices using optimized CNN. IEEE Access, 11:1023–1038, 2023.

[12] Ribeiro, D. A., Melgarejo, D. C., Saadi, M., Rosa, R. L., and Rodríguez, D. Z. A novel deep deterministic policy gradient model applied to intelligent transportation system security problems in 5G and 6G network scenarios. Physical Communication, 56:101938, 2023.

[13] Teodoro, A. A., Silva, D. H., Rosa, R. L., Saadi, M., Wuttisittikulkij, L., Mumtaz, R. A., and Rodriguez, D. Z. A skin cancer classification approach using GAN and ROI-based attention mechanism. Journal of Signal Processing Systems, 95(2-3):211–224, 2023.

[14] Vijayalakshmi, A. and Sumalatha, V. Survey on diagnosis of chronic kidney disease using machine learning algorithms. In 2020 3rd International Conference on Intelligent Sustainable Systems (ICISS), pages 590–595. IEEE, 2020.

[15] Yashfi, S. Y., Islam, M. A., Sakib, N., Islam, T., Shahbaaz, M., Pantho, S. S., et al. Risk prediction of chronic kidney disease using machine learning algorithms. In 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pages 1–5. IEEE, 2020.
