Predicting Diabetes Onset Using Machine Learning

Diabetes prediction

Uploaded by

priyapaul8078

1. Introduction

Diabetes is a chronic and often debilitating disease affecting millions of people worldwide. The
condition is characterized by persistently elevated blood glucose levels, which over time can lead to
severe complications such as heart disease, kidney failure, and vision loss. Early detection and
intervention are crucial to managing diabetes effectively, potentially preventing complications and
improving patient quality of life.

The goal of this project is to develop a machine-learning model that can predict the onset of diabetes
using a dataset of patient health markers. By leveraging statistical and machine-learning techniques,
this model can analyze relevant factors such as age, blood pressure, body mass index (BMI), and
glucose levels to identify individuals at risk of developing diabetes.

2. Objectives

• Primary Objective: Build a predictive model to estimate the likelihood of diabetes onset
using patient health data.

• Secondary Objectives:

    ◦ Identify the health markers that are most strongly associated with the risk of
      diabetes.

    ◦ Evaluate and compare the effectiveness of different machine-learning algorithms.

    ◦ Validate the model’s performance using appropriate evaluation metrics.

3. Data Collection and Preparation

3.1 Data Source

For this project, we used the Pima Indians Diabetes Dataset, a publicly available dataset containing
medical records of female patients aged 21 and above of Pima Indian heritage. The dataset includes
both diabetic and non-diabetic patients, with health features typically linked to diabetes risk.

3.2 Features in the Dataset

The dataset comprises several key features associated with diabetes, including:

• Pregnancies: Number of times pregnant

• Glucose: Plasma glucose concentration in an oral glucose tolerance test

• Blood Pressure: Diastolic blood pressure (mm Hg)

• Skin Thickness: Triceps skinfold thickness (mm)

• Insulin: 2-hour serum insulin (μU/mL)

• BMI: Body Mass Index (weight in kg/(height in m)^2)

• Diabetes Pedigree Function: Likelihood of diabetes based on family history

• Age: Age of the patient

3.3 Data Preprocessing

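1. Handling Missing Values: Missing values in features like glucose, BMI, and blood pressure
were replaced using mean imputation for simplicity. In this dataset, missing measurements
are commonly recorded as zeros, so physiologically implausible zeros in these columns need
to be treated as missing before imputing. More sophisticated methods, such as K-Nearest
Neighbors (KNN) imputation, may give better estimates.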

2. Feature Scaling: Because the features are measured on different scales, we standardized
each one to zero mean and unit variance, so that features like BMI and age contribute on a
comparable scale and no feature dominates scale-sensitive models such as SVM.

3. Splitting the Data: The dataset was divided into training and testing sets with a 70-30
split, so that the models could be evaluated on unseen data.
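
The three preprocessing steps above can be sketched with scikit-learn as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, the three columns merely stand in for glucose, blood pressure, and BMI, and the stratified split and the choice to fit the imputer and scaler on the training set only are assumptions added here to avoid leaking test-set statistics.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for three columns (glucose, blood pressure, BMI); not real data.
n = 200
X = np.column_stack([
    rng.normal(120, 30, n),
    rng.normal(70, 12, n),
    rng.normal(32, 7, n),
])
y = (X[:, 0] + rng.normal(0, 20, n) > 125).astype(int)

# Assumption: zeros in these columns encode missing measurements.
X[rng.random(X.shape) < 0.05] = 0.0
X[X == 0] = np.nan

# 70-30 split; stratifying keeps the class balance similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)

# Mean imputation followed by standardization, fitted on the training set only.
preprocess = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler())
X_train_prepared = preprocess.fit_transform(X_train)
X_test_prepared = preprocess.transform(X_test)

print(X_train_prepared.shape, X_test_prepared.shape)
```

Swapping `SimpleImputer` for scikit-learn's `KNNImputer` inside the same pipeline would give the KNN-based alternative mentioned in step 1.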

4. Model Selection and Development

Several machine-learning algorithms were tested to determine which would yield the best results for
diabetes prediction:

1. Logistic Regression: Suitable for binary classification problems, logistic regression provides a
probabilistic approach to predicting diabetes risk.

2. Decision Tree: A tree-based model that classifies data points based on the most predictive
features, offering high interpretability.

3. Random Forest: An ensemble method that builds multiple decision trees and aggregates their
predictions, which typically improves accuracy and reduces overfitting relative to a single tree.

4. Support Vector Machine (SVM): A powerful classifier that finds an optimal boundary to
separate data points into classes. A radial basis function (RBF) kernel was used for non-linear
classification.

5. Gradient Boosting Classifier: An ensemble method that adds weak learners sequentially, each
one correcting the errors of those before it, often yielding high performance for classification tasks.

4.1 Model Training

Each model was trained on the training set, with hyperparameters tuned through Grid Search
Cross-Validation. This allowed us to identify the best-performing parameters among those searched
for each model, balancing bias and variance for improved generalizability.
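
A sketch of how the five models could be tuned with grid search is shown below. The parameter grids, the 5-fold setting, the AUC scoring choice, and the synthetic data from `make_classification` are all assumptions made for illustration; the report does not state which values were searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic 8-feature data standing in for the real dataset.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)

# Hypothetical, deliberately small grids; "clf__" targets the pipeline step below.
candidates = {
    "Logistic Regression": (
        LogisticRegression(max_iter=1000),
        {"clf__C": [0.1, 1.0, 10.0]},
    ),
    "Decision Tree": (
        DecisionTreeClassifier(random_state=0),
        {"clf__max_depth": [3, 5, None]},
    ),
    "Random Forest": (
        RandomForestClassifier(random_state=0),
        {"clf__n_estimators": [50, 100]},
    ),
    "SVM (RBF)": (
        SVC(kernel="rbf"),
        {"clf__C": [0.1, 1.0, 10.0], "clf__gamma": ["scale", 0.1]},
    ),
    "Gradient Boosting": (
        GradientBoostingClassifier(random_state=0),
        {"clf__learning_rate": [0.05, 0.1], "clf__n_estimators": [50, 100]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Scaling lives inside the pipeline so each CV fold is scaled independently.
    pipe = Pipeline([("scale", StandardScaler()), ("clf", estimator)])
    search = GridSearchCV(pipe, grid, cv=5, scoring="roc_auc")
    search.fit(X_train, y_train)
    # search.score uses the same scorer (AUC) on the held-out test set.
    results[name] = (
        search.best_params_,
        search.best_score_,
        search.score(X_test, y_test),
    )

for name, (params, cv_auc, test_auc) in results.items():
    print(f"{name:<20} CV AUC={cv_auc:.3f} test AUC={test_auc:.3f} {params}")
```

Keeping the scaler inside the pipeline is a design choice: the cross-validation folds then never see statistics computed from their own validation portion.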

5. Model Evaluation

To evaluate model performance, we used several metrics suitable for binary classification:

• Accuracy: Proportion of correct predictions among total predictions.

• Precision: Proportion of true positives among all positive predictions.

• Recall (Sensitivity): Proportion of actual positive cases correctly identified by the model.

• F1 Score: The harmonic mean of precision and recall, providing a balanced measure of the
model’s performance.

• Area Under the Curve (AUC-ROC): Measures the model’s ability to distinguish between
classes. Higher AUC values indicate better performance.
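
To make the metric definitions concrete, the following sketch computes them in plain Python from a toy set of labels and scores. The labels, scores, 0.5 threshold, and helper names are hypothetical and are not results from this project; AUC is computed here as the fraction of positive/negative pairs in which the positive case receives the higher score.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy labels and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Note that accuracy, precision, recall, and F1 depend on the chosen threshold, whereas AUC is computed from the scores alone.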

Model                    Accuracy   Precision   Recall   F1 Score   AUC-ROC
Logistic Regression      78.2%      76.5%       73.0%    74.7%      0.81
Decision Tree            75.8%      73.1%       71.5%    72.3%      0.75
Random Forest            83.4%      80.5%       78.2%    79.3%      0.87
Support Vector Machine   81.0%      78.3%       76.0%    77.1%      0.84
Gradient Boosting        85.2%      82.0%       79.8%    80.9%      0.89
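
As a quick sanity check, the reported F1 scores can be recomputed from the reported precision and recall using the harmonic-mean definition. The values below are copied from the table; the helper name `f1_from` is introduced here only for illustration.

```python
# (precision, recall, reported F1) as fractions, copied from the results table.
reported = {
    "Logistic Regression": (0.765, 0.730, 0.747),
    "Decision Tree": (0.731, 0.715, 0.723),
    "Random Forest": (0.805, 0.782, 0.793),
    "Support Vector Machine": (0.783, 0.760, 0.771),
    "Gradient Boosting": (0.820, 0.798, 0.809),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


recomputed = {name: f1_from(p, r) for name, (p, r, _) in reported.items()}

for name, (_, _, f1) in reported.items():
    print(f"{name:<24} reported={f1:.3f} recomputed={recomputed[name]:.3f}")
```

Each recomputed value agrees with the table to within rounding, so the F1 column is internally consistent with the precision and recall columns.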

Best Model Selection

The Gradient Boosting Classifier achieved the highest score on every reported metric, including
accuracy (85.2%), F1 score (80.9%), and AUC-ROC (0.89), making it the most effective of the five
models for predicting diabetes onset on this dataset.

6. Feature Importance

For interpretability, we analyzed the features contributing most to diabetes prediction. Features with
the highest importance scores included:

1. Glucose Levels: Strongest predictor, as high blood glucose levels are directly related to
diabetes risk.

2. BMI: Obesity is a known risk factor for diabetes, making BMI a crucial indicator.

3. Age: Risk of diabetes increases with age.

4. Diabetes Pedigree Function: A family history of diabetes elevates risk, reflected in this score.
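
One way such a ranking could be obtained is from the impurity-based `feature_importances_` attribute of a fitted gradient boosting model, sketched below. The data is synthetic, with the label deliberately driven by the "Glucose" and "BMI" columns, so the output illustrates the mechanics only and says nothing about the real dataset; the column names are assumed spellings of the features listed in Section 3.2.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Assumed column names for the eight features described in Section 3.2.
feature_names = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]

rng = np.random.default_rng(0)
X = rng.normal(size=(400, len(feature_names)))

# Synthetic label: mostly "Glucose" (column 1), partly "BMI" (column 5), plus noise.
signal = 2.0 * X[:, 1] + 1.0 * X[:, 5]
y = (signal + rng.normal(scale=0.5, size=400) > 0).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Impurity-based importances are normalized to sum to 1 across features.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name:<26} {score:.3f}")
```

Impurity-based importances can favor high-cardinality features, so permutation importance on held-out data (`sklearn.inspection.permutation_importance`) is a common cross-check.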

7. Conclusion and Future Work

The results indicate that machine-learning models can effectively predict diabetes onset using
patient health data, with the Gradient Boosting Classifier demonstrating superior performance. This
predictive capability can enable healthcare providers to identify at-risk individuals early, offering a
proactive approach to diabetes management.

Future Enhancements

• Expanding Feature Set: Incorporate additional relevant health markers like cholesterol levels,
dietary habits, and physical activity levels.

• Real-World Validation: Test the model on more diverse datasets to improve generalizability
across populations.

• Integration in Clinical Settings: Develop a user-friendly tool or app for clinicians to use,
integrating this model into electronic health records (EHR) for real-time predictions.

