Diabetes Project MuskanAltaf

The project focuses on predicting diabetes using a machine learning model built from the Pima Indians Diabetes Dataset. The model achieved an accuracy of approximately 75%, with specific performance metrics indicating 50% correct predictions for diabetes and 82% for no diabetes. Future improvements include feature engineering, exploring different algorithms, hyperparameter tuning, data augmentation, and external validation.

Uploaded by

Mudasir Bashir

Project File
PREDICTING DIABETES USING
MACHINE LEARNING

Submitted To:                Submitted By:

Ms. Gurpreet Kaur            Muskan Altaf
(Assistant Professor)        Univ. Roll No. 2105116
                             Clg. Roll No. 2122515
                             BTech CSE / 7th Sem

Computer Science Engineering

ACKNOWLEDGEMENT

A good project involves the hard work of many people,
including the students and the teacher. While working on this
project, I received unconditional support and guidance
from my teachers.

I would like to express my sincere gratitude to my teacher,
Ms. Gurpreet Kaur, for giving me the opportunity
to work on this project. Her valuable words and
advice have truly motivated me. Preparing this project in
collaboration with my teacher was a refreshing experience.

I have learned many useful things from this project. Her
guidance and constant support pushed me to complete it
successfully.

Sincerely,
Muskan Altaf


INTRODUCTION

Diabetes is a chronic disease that affects millions of people
worldwide. It is characterized by high levels of glucose
(sugar) in the blood, which result from the body's inability
to produce or properly use insulin. Early detection and
management of diabetes can help prevent complications such as
heart disease, kidney failure, and blindness.

In this project, we aim to build a machine learning model that
can predict whether a person has diabetes based on
their medical information. We will use the Pima Indians
Diabetes Dataset, which contains information about female
patients of Pima Indian heritage and whether they have
diabetes. By analysing this dataset and building a
model based on it, we hope to contribute to the field of
medical diagnosis and improve the accuracy of diabetes
diagnosis.


Aim of the Project

The aim of this project is to build a machine learning model
that can predict whether a person has diabetes based on
their medical information. By analysing the Pima Indians
Diabetes Dataset, which contains information about female
patients of Pima Indian heritage and whether they have
diabetes, we hope to contribute to the field of medical
diagnosis and improve the accuracy of diabetes diagnosis.
Early detection and management of diabetes can help prevent
complications such as heart disease, kidney failure, and
blindness, so accurately predicting diabetes can have a
significant impact on public health.

STEPS INVOLVED:

Data Exploration:
We loaded the dataset into our Python program using Pandas
and printed the first few rows and some summary statistics.
We found that the dataset contains 768 rows and 9
columns, with some missing values.
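
A minimal sketch of this exploration step follows. It assumes the column names of the commonly distributed `diabetes.csv` (for example `Glucose`, `BloodPressure`, `BMI`), and the tiny table is made-up illustration data, not rows from the real dataset. In that file, missing measurements usually appear as zeros rather than empty cells, so counting zeros in columns where zero is not physiologically plausible is one way to find them.

```python
import numpy as np
import pandas as pd

# Made-up illustration rows (NOT taken from the real dataset).
sample = pd.DataFrame({
    "Glucose":       [148, 85, 0, 89, 137],
    "BloodPressure": [72, 66, 64, 0, 40],
    "BMI":           [33.6, 26.6, 23.3, 28.1, 0.0],
    "Outcome":       [1, 0, 1, 0, 1],
})

print(sample.shape)         # (rows, columns)
print(sample.isna().sum())  # explicit NaN count per column

# Assumption: a zero in these columns stands for a missing measurement.
zero_as_missing = ["Glucose", "BloodPressure", "BMI"]
zero_counts = (sample[zero_as_missing] == 0).sum()
print(zero_counts)

# Replace those zeros with NaN so later steps can impute or drop them.
cleaned = sample.copy()
cleaned[zero_as_missing] = cleaned[zero_as_missing].replace(0, np.nan)
print(cleaned.isna().sum())
```

Whether to impute the resulting gaps (for example with the column median) or drop the affected rows is a separate modelling choice that the report does not specify.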

Data Preparation:
We split the dataset into a training set and a testing set using
the `train_test_split()` function from Scikit-learn. We also
separated the input variables from the target variable. We
used logistic regression to build our machine learning model
and trained it using the training set.
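
The split can be sketched as below on synthetic stand-in data with the same shape as the dataset (768 rows, 8 input columns). The `stratify=y` and `max_iter=1000` arguments are assumptions added for illustration and are not part of the report's code: stratifying keeps the share of positive cases nearly equal in both subsets, which matters when one class is less common.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 768 rows, 8 features, roughly 35% positives.
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, n_redundant=1,
    weights=[0.65, 0.35], random_state=0,
)

# 70/30 split; stratify=y (an addition) preserves the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# max_iter is raised (an addition) to give the solver room to converge.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

print(X_train.shape, X_test.shape)
print(f"positive share: train={y_train.mean():.3f}, test={y_test.mean():.3f}")
```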

Model Evaluation:
We evaluated the performance of our model by predicting
whether a person has diabetes or not in the testing set and
comparing the predictions with the actual values. We
calculated the accuracy of our model using the
`accuracy_score()` function from Scikit-learn and found it
to be around 75%.
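
Accuracy is simply the fraction of test cases whose predicted label matches the actual label. The short example below uses made-up labels (not the project's predictions) to show that a manual count and `accuracy_score()` agree.

```python
from sklearn.metrics import accuracy_score

# Made-up labels for illustration: 1 = diabetes, 0 = no diabetes.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]

# Manual computation: matching predictions divided by total predictions.
manual = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(manual)                          # 0.75 (6 of 8 correct)
print(accuracy_score(y_true, y_pred))  # 0.75
```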


We also created a confusion matrix to see how many true
positives, true negatives, false positives, and false negatives
our model produced. From the confusion matrix, we can see that
our model correctly predicted diabetes in 50% of cases and
correctly predicted no diabetes in 82% of cases.
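
Percentages of this kind are most naturally read as per-class rates: the share of actual diabetic cases predicted as diabetic (sensitivity, or recall) and the share of actual non-diabetic cases predicted as non-diabetic (specificity). That reading is an assumption, since the report does not state how the figures were computed. The sketch below derives both rates from a confusion matrix built on made-up labels.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration: 1 = diabetes, 0 = no diabetes.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]

# Rows are actual classes, columns are predicted classes (order: 0, 1).
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

sensitivity = tp / (tp + fn)  # correctly predicted diabetes
specificity = tn / (tn + fp)  # correctly predicted no diabetes

print(cm)
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

A model can reach a reasonable overall accuracy while still missing many positive cases, which is why these two rates are worth reporting next to accuracy.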

Results Visualization:
We visualized the results of our model using Matplotlib by
creating a bar chart of the confusion matrix. The bar chart
shows the number of true positives, true negatives, false
positives, and false negatives our model has.


ALGORITHM:

STEP 1. Import the necessary libraries for data analysis and
machine learning.

STEP 2. Load the Pima Indians Diabetes Dataset into your
Python program using Pandas:
`data = pd.read_csv('diabetes.csv')`.

STEP 3. Explore the dataset by printing the first few rows and
some statistics about the data.

STEP 4. Prepare the data by separating the input variables
(features) from the target variable (outcome), and splitting the
data into a training set and a testing set using the
`train_test_split()` function from Scikit-learn.

STEP 5. Build the machine learning model using logistic
regression and train it using the training set.


STEP 6. Evaluate the performance of the model by predicting
whether a person has diabetes or not in the testing set and
comparing the predictions with the actual values. Calculate
the accuracy of the model using the `accuracy_score()`
function from Scikit-learn.

STEP 7. Create a confusion matrix to see how many true
positives, true negatives, false positives, and false negatives
the model has.

STEP 8. Visualize the results of the model using Matplotlib
by creating a bar chart of the confusion matrix.


FULL SOURCE CODE:
# Step 1: Importing the Required Libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
import matplotlib.pyplot as plt
# Jupyter magic: render plots inline (remove when running as a plain script)
%matplotlib inline

# Step 2: Loading the Dataset

data = pd.read_csv('diabetes.csv')

# Step 3: Exploring the Dataset

print(data.head())
print(data.describe())


# Step 4: Preparing the Data
X = data.drop('Outcome', axis=1)
y = data['Outcome']
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.3,
random_state=42)

# Step 5: Building the Model

logreg = LogisticRegression()
logreg.fit(X_train, y_train)

# Step 6: Evaluating the Model

y_pred = logreg.predict(X_test)
print('Accuracy score:', accuracy_score(y_test, y_pred))
confusion_mat = confusion_matrix(y_test, y_pred)
print('Confusion matrix:', confusion_mat)

# Step 7: Visualizing the Results

plt.bar(['True Negative', 'False Positive', 'False Negative',
'True Positive'],
confusion_mat.ravel())
plt.title('Confusion Matrix')


plt.xlabel('Prediction')
plt.ylabel('Count')
plt.show()
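
On unscaled features, `LogisticRegression()` with default settings may stop at its iteration limit and emit a convergence warning. One common remedy, sketched here on synthetic stand-in data, is to standardize the features inside a pipeline before fitting. This is an illustrative variant under that assumption, not the code used for the reported results.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with the dataset's shape (768 rows, 8 features).
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# The scaler is fitted on the training data only, then applied to both sets.
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)

acc = accuracy_score(y_test, clf.predict(X_test))
print(f"Accuracy on synthetic data: {acc:.2f}")
```

Keeping the scaler inside the pipeline avoids leaking test-set statistics into training, because `fit()` only ever sees `X_train`.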

SOFTWARE USED:
Jupyter Notebook:
Jupyter notebooks are used for all sorts of data science tasks
such as exploratory data analysis (EDA), data cleaning and
transformation, data visualization, statistical modeling,
machine learning, and deep learning.
A Jupyter Notebook is an open-source web application that
allows data scientists to create and share documents that
include live code, equations, and other multimedia resources.
How does a Jupyter Notebook work?
A Jupyter notebook has two components: a front-end web page
and a back-end kernel. The front-end web page allows data
scientists to enter programming code or text in rectangular
"cells." The browser then passes the code to the back-end
kernel, which runs the code and returns the results.

LANGUAGE USED:
PYTHON

FUTURE SCOPE:
The future scope of this project includes several potential
areas for improvement and further research. Some of these
include:
1. Feature engineering: The current model uses the original
features of the dataset, but there may be opportunities to
create new features that better capture the relationship
between the input variables and the target variable. For
example, creating a new feature that combines BMI and age
may better capture the risk of diabetes.
2. Algorithm selection: While logistic regression is a common
and effective algorithm for binary classification problems like
this one, there may be other machine learning algorithms that
can achieve better accuracy on this dataset. Experimenting
with different algorithms such as decision trees, random
forests, or support vector machines may be worth exploring.
3. Hyperparameter tuning: The current model was built using
default hyperparameters for logistic regression. However,
there may be opportunities to optimize the hyperparameters to
achieve better accuracy. Techniques like grid search or random
search can be used to find the optimal hyperparameters for the
model.
4. Data augmentation: The dataset used in this project is
relatively small, which can limit the performance of machine
learning models. Techniques such as oversampling or synthetic
data generation may be used to increase the size of the
dataset, and undersampling may be used to balance the classes,
which may improve the performance of the model.
5. External validation: While the model performed reasonably
well on the testing set, it is important to validate the model
on external datasets to ensure its generalizability. Future
research may involve testing the model on other datasets with
similar characteristics to the Pima Indians Diabetes Dataset.
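
The algorithm comparison mentioned in item 2 could be sketched as follows, using 5-fold cross-validation on synthetic stand-in data. The model settings (tree depth, number of trees, scaling for the linear and SVM models) are illustrative assumptions, and the printed scores say nothing about how these models would rank on the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with the dataset's shape (768 rows, 8 features).
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, random_state=0
)

# Candidate models; all settings here are illustrative assumptions.
candidates = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "support vector machine": make_pipeline(StandardScaler(), SVC()),
}

# Mean accuracy over 5 folds gives a steadier estimate than a single split.
mean_scores = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    mean_scores[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```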
Overall, there is significant potential for further development
and improvement of the machine learning model for diabetes
prediction using the Pima Indians Diabetes Dataset.
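
Hyperparameter tuning by grid search could look like the sketch below, which tries several values of the regularization strength `C` for logistic regression with 5-fold cross-validation. The data is synthetic and the candidate values for `C` are arbitrary assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with the dataset's shape (768 rows, 8 features).
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, random_state=0
)

pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# make_pipeline names each step after its lowercased class name, so the
# logistic regression parameter C is addressed as "logisticregression__C".
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.3f}")
```

`RandomizedSearchCV` from the same module follows the same pattern when the grid becomes too large to try exhaustively.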

Conclusion:

In this project, we built a machine learning model to
predict whether a person has diabetes using the Pima
Indians Diabetes Dataset. Our model achieved an accuracy of
around 75%; it correctly predicted diabetes in 50% of cases
and correctly predicted no diabetes in 82% of cases. This
project demonstrates how machine learning can be used to make
predictions based on medical data. However, further
improvements can be made to the model to increase its
accuracy, in particular its ability to identify diabetic cases.

