Chronic Kidney Disease Prediction using
Machine Learning
Kottam Poojitha, Methuku Divyasri, Garlapad Akshara
Department of Information Technology, G. Narayanamma Institute of Technology and Science (for
Women), Hyderabad, India
Abstract
Chronic Kidney Disease (CKD) is a life-threatening condition that often goes undiagnosed in its early
stages due to the absence of noticeable symptoms. Early prediction and intervention are crucial to prevent
progression and improve patient outcomes. This study develops a predictive model that uses ensemble
machine learning methods to classify individuals as CKD-positive or CKD-negative based on clinical
features. The ensemble approach, which combines Random Forest, XGBoost, and CatBoost, achieved high
accuracy, with CatBoost reaching 98% and Random Forest 97%.
Index Terms
Chronic Kidney Disease, Machine Learning, Ensemble Learning, Classification, Health Prediction
I. INTRODUCTION
Chronic Kidney Disease (CKD) is a major global health problem. Often asymptomatic in early stages,
CKD is linked to hypertension, diabetes, and obesity. Machine Learning (ML) offers a promising solution
for early CKD detection, enabling timely interventions. This project proposes a predictive model based on
ensemble techniques to improve prediction accuracy.
II. LITERATURE SURVEY
Prior research on CKD prediction has used both individual classifiers and ensemble models. Notably,
studies that use XGBoost and CatBoost report high accuracy. This project builds on these approaches by
combining several ML models and interpreting their results for practical deployment.
III. METHODOLOGY
The model was trained on the UCI CKD dataset. Preprocessing consisted of handling missing values,
encoding categorical features, and applying feature selection. Three ensemble learners (Random Forest,
XGBoost, and CatBoost) were then combined in a voting classifier to improve prediction performance.
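The preprocessing and voting steps can be illustrated with a minimal, dependency-free sketch. It rests on several assumptions that the paper does not state: median imputation for numeric features, mode imputation for categorical features, and soft voting (averaging the per-model CKD probabilities). The column names follow the UCI CKD dataset, but the records and probabilities are made-up values, and feature selection is omitted.

```python
from statistics import median, mode

# Toy records with UCI CKD column names; all values are illustrative only.
records = [
    {"age": 48, "bp": 80, "htn": "yes", "class": "ckd"},
    {"age": None, "bp": 70, "htn": "no", "class": "notckd"},
    {"age": 62, "bp": None, "htn": None, "class": "ckd"},
    {"age": 35, "bp": 70, "htn": "no", "class": "notckd"},
]
NUMERIC = ["age", "bp"]
CATEGORICAL = {"htn": {"no": 0, "yes": 1}}
LABELS = {"notckd": 0, "ckd": 1}


def preprocess(rows):
    """Impute missing feature values, then encode categories as integers."""
    fill = {}
    for col in NUMERIC:
        # Assumption: numeric gaps are filled with the column median.
        fill[col] = median([r[col] for r in rows if r[col] is not None])
    for col in CATEGORICAL:
        # Assumption: categorical gaps are filled with the most frequent value.
        fill[col] = mode([r[col] for r in rows if r[col] is not None])

    cleaned = []
    for r in rows:
        row = {}
        for col in NUMERIC:
            row[col] = fill[col] if r[col] is None else r[col]
        for col, mapping in CATEGORICAL.items():
            value = fill[col] if r[col] is None else r[col]
            row[col] = mapping[value]
        row["class"] = LABELS[r["class"]]
        cleaned.append(row)
    return cleaned


def soft_vote(probabilities, threshold=0.5):
    """Average the per-model CKD probabilities and threshold the mean."""
    mean_p = sum(probabilities) / len(probabilities)
    return mean_p, int(mean_p >= threshold)


cleaned = preprocess(records)
# Hypothetical probabilities from Random Forest, XGBoost, and CatBoost.
mean_p, label = soft_vote([0.9, 0.8, 0.4])
```

With real models, scikit-learn's `VotingClassifier` with `voting="soft"` performs the same averaging over fitted estimators; hard (majority) voting is the other common choice.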
IV. SYSTEM DESIGN
The system is a web-based Streamlit application that takes user inputs and returns the predicted
probability of CKD. The architecture comprises user authentication, a model backend, and integration with
a language model that provides personalized health advice.
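One way to organize such an application is to keep the model backend separate from the Streamlit front end, as in the sketch below. All names here (`predict_ckd`, `build_advice_prompt`, `render_app`, `stub_model`) and the three input fields are hypothetical and not taken from the authors' system. Authentication and the actual language model call are left out, and the stub stands in for the trained ensemble.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Features = Dict[str, float]


@dataclass
class Prediction:
    probability: float  # estimated probability of CKD, in [0, 1]
    label: str          # "CKD-positive" or "CKD-negative"


def predict_ckd(features: Features,
                predict_proba: Callable[[Features], float],
                threshold: float = 0.5) -> Prediction:
    """Model backend: wrap any probability function behind one interface."""
    p = float(predict_proba(features))
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")
    label = "CKD-positive" if p >= threshold else "CKD-negative"
    return Prediction(probability=p, label=label)


def build_advice_prompt(result: Prediction) -> str:
    """Compose the text that would be passed to a language model (hypothetical)."""
    return (f"The screening model returned {result.label} with an estimated "
            f"probability of {result.probability:.2f}. Give general lifestyle "
            "advice and recommend consulting a physician.")


def stub_model(features: Features) -> float:
    """Stand-in for the trained ensemble; NOT a clinical rule."""
    return 0.8 if features["htn"] else 0.2


def render_app(predict_proba: Callable[[Features], float]) -> None:
    """Streamlit front end; imported lazily so the backend runs without it."""
    import streamlit as st

    st.title("CKD Risk Prediction")
    age = st.number_input("Age (years)", min_value=1, max_value=120, value=50)
    bp = st.number_input("Blood pressure (mm Hg)", min_value=40, max_value=200, value=80)
    htn = st.selectbox("Hypertension", ["no", "yes"])
    if st.button("Predict"):
        features = {"age": float(age), "bp": float(bp), "htn": float(htn == "yes")}
        result = predict_ckd(features, predict_proba)
        st.write(f"{result.label} (estimated probability {result.probability:.2f})")


result = predict_ckd({"age": 50.0, "bp": 80.0, "htn": 1.0}, stub_model)
prompt = build_advice_prompt(result)
```

To serve the page, the script would call `render_app(stub_model)` at module level and be started with `streamlit run`. Keeping `predict_ckd` free of UI code lets the backend be tested without a browser session.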
V. RESULTS AND DISCUSSION
The CatBoost classifier achieved the highest accuracy (98%), followed by Random Forest (97%).
Ensemble methods outperformed single classifiers, which supports their use for CKD prediction.
Precision and recall were assessed from the confusion matrices.
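For reference, accuracy, precision, and recall follow directly from the four cells of a binary confusion matrix, as the sketch below shows. The counts are hypothetical and are not the results reported in this paper. CKD is treated as the positive class.

```python
def metrics_from_confusion(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive standard metrics from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of predicted CKD cases that are truly CKD.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall (sensitivity): share of true CKD cases that were detected.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for an 80-sample test split (illustration only).
scores = metrics_from_confusion(tp=48, fp=1, fn=2, tn=29)
```

In a screening setting, recall deserves particular attention because a false negative corresponds to a missed CKD case.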
VI. CONCLUSION AND FUTURE WORK
The results of this project indicate that ensemble learning improves CKD prediction on the UCI CKD
dataset. Future work includes adding explainable AI (XAI), mobile deployment, and extending the system
to predict multiple chronic conditions.
REFERENCES
[1] D.A. Debal, T.M. Sitote, 'Chronic kidney disease prediction using machine learning techniques', J
Big Data, 2022.
[2] G.M. Ifraz et al., 'Comparative analysis for prediction of kidney disease using intelligent machine
learning methods', Comput Math Methods Med, 2021.
[3] B. Divya et al., 'CKD prediction using machine learning algorithms', IJEDR, 2022.
[4] A.J. Aljaaf et al., 'Early prediction of CKD using ML', IEEE CEC, 2018.
[5] M. Almasoud, T.E. Ward, 'Detection of CKD using ML with few predictors', Int J Soft Comput
Appl, 2019.