DIABETES DETECTION USING MACHINE LEARNING
A Project Report
Submitted by:
in partial fulfilment of the requirements for the degree of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
(June 2025)
CERTIFICATE
This is to certify that the project report titled “DIABETES DETECTION USING MACHINE
LEARNING” being submitted by Ipsita Sahoo (2141019099), Brajesh Mohanty (2141013219) and
Reetuparna Baral (2141002025) of CSE-S to the Institute of Technical Education and Research, Siksha
‘O’ Anusandhan (Deemed to be University), Bhubaneswar, in partial fulfilment of the requirements for the
degree of Bachelor of Technology in Computer Science and Engineering is a record of original bona fide
work carried out by them under our supervision and guidance. The project work, in our opinion, has
reached the requisite standard, fulfilling the requirements for the degree of Bachelor of Technology.
The results contained in this project work have not been submitted in part or full to any other University
or Institute for the award of any degree or diploma.
ACKNOWLEDGEMENT
We would like to express our sincere thanks to our guide Dr. Brajesh Kumar Umrao for his expert
guidance and support in completing our project. We would also like to extend our gratitude and
respect to our SDP teacher Dr. Farida A Ali for providing us with the facilities and knowledge
about so many new things.
Date: 20/06/2025
DECLARATION
We declare that this written submission represents our ideas in our own words and that, where others'
ideas or words have been included, we have adequately cited and referenced the original sources.
We also declare that we have adhered to all principles of academic honesty and integrity and
have not misrepresented, fabricated, or falsified any idea/fact/source in our submission. We
understand that any violation of the above will be cause for disciplinary action by the University
and can also evoke penal action from the sources which have not been properly cited or from
whom proper permission has not been taken when needed.
Date: 20/06/2025
REPORT APPROVAL
Examiner(s)
________________________
________________________
________________________
Supervisor
________________________
Project Coordinator
________________________
PREFACE
One of the most common and debilitating chronic diseases, diabetes affects millions of individuals
worldwide. It is brought on by either the pancreas producing insufficient amounts of insulin or the body
using insulin inefficiently. The incidence of diabetes cases has increased dramatically in recent years,
especially in developing countries, as a result of dietary changes, lifestyle alterations, and a lack of early
identification. This delay in diagnosis often results in serious health problems, such as cardiovascular
illnesses, brain impairment, and kidney failure. Therefore, early and accurate detection is crucial for
managing diabetes and preventing complications.
Traditionally, diabetes has been diagnosed through standard clinical testing, which can be expensive, time-
consuming, and not always accessible. Many people do not receive routine tests because of a lack of
awareness, financial constraints, or geographic limits. To tackle these problems, researchers have been
exploring the potential of machine learning (ML) techniques to develop predictive models that, using
common medical data, can aid in early diagnosis.
XGBoost, Random Forest, Decision Trees, and Logistic Regression are just a few of the machine learning
methods that have shown promising results in identifying diabetes-related patterns in patient data. These
models can be trained on datasets containing attributes such as blood pressure, insulin and glucose levels,
age, BMI, and the diabetes pedigree function to identify whether a person has diabetes. In our work, we
examined a number of machine learning models and evaluated their predictive power for diabetes using
performance metrics such as accuracy, precision, and F1-score.
The main objectives of our project are both the use of these models and the idea of integrating them into
wearable technologies, such as smartwatches. By making diabetes monitoring more continuous, real-time,
and accessible, this has the potential to transform personal healthcare. The goal of this project is to create
a diabetes diagnosis model that is precise, quick, and simple for patients and medical professionals to
understand. The importance of feature selection and data quality for medical machine learning
applications is also emphasized in this report.
INDIVIDUAL CONTRIBUTIONS
TABLE OF CONTENTS
Title Page i
Certificate ii
Acknowledgement iii
Declaration iv
Report Approval v
Preface vi
Individual Contributions viii
Table of Contents ix
List of Figures x
List of Tables xi
1. INTRODUCTION 1
Project Overview 1
Motivation 3
Uniqueness of the Work 5
Report Layout 7
2. LITERATURE SURVEY 7
2.1 Existing System 8
2.2 Problem Identification 13
3. METHODS 15
3.1 Dataset Description 15
3.2 Schematic Layout 17
3.3 Methods Used 17
3.4 Evaluation Measures 18
4. RESULTS 18
4.1 System Specification 19
4.2 Experimental Outcomes 19
5. CONCLUSIONS 22
6. REFERENCES 24
7. APPENDICES 25
8. REFLECTION OF THE TEAM MEMBERS ON 27
THE PROJECT
9. SIMILARITY REPORT 28
LIST OF FIGURES
LIST OF TABLES
1 Dataset Description 16
2 Results 19
3 Appendix table 26
1. INTRODUCTION
Under-treated diabetes can lead to dire complications, affecting millions across the globe. The
importance of bringing the date of diagnosis forward, both to lessen the load on the healthcare
system and to improve individual health outcomes, cannot be stressed enough. We examine the
feasibility of applying traditional health indicators such as age, blood pressure, insulin, glucose,
skinfold thickness, number of pregnancies, and family history to the early prediction of diabetes.
The aim is to bring about change by applying machine learning to these medical indicators for
faster and better diagnosis.
The ultimate goal is to assess the predictive ability of various machine learning models on
diabetes data; if successful, this could lead to the development of innovative healthcare
technologies with the ability to empower providers and patients.
Diabetes is one of the many chronic illnesses afflicting millions of people throughout the globe.
It often develops unnoticed until its complications have run their full course. Early detection is
crucial because it allows a person to act before their health gets worse. This project revolves
around shaping a machine-learning prediction approach, a class of tools considered very useful in
healthcare for providing early information and improving outcomes. The system we aim to
develop allows an individual to learn how likely they are to develop diabetes by analysing their
physiological and clinical data, increasing the availability and personalization of early care.
This work is based on a rich dataset containing indicators such as age, blood pressure, glucose
levels, insulin levels, skinfold thickness, number of pregnancies, and family history of diabetes.
Each entry relates to a real person, tagged as having diabetes or not. The richness of the data
enables the model to learn subtle patterns indicative of a higher risk of this condition.
Data cleaning was the first task before training any models. Missing values, outliers, and
inconsistencies, all common in real-world data, would otherwise have affected the results. We
normalized values so that they were comparable across records, balanced distributions, and
carefully handled missing entries using imputation so that the most trustworthy input would be
given to our models. Another important part of this project was the choice of relevant machine
learning models: we tried XGBoost, Random Forest, Decision Tree, and Logistic Regression
under a variety of settings, each with its own advantages.
XGBoost can uncover complicated patterns. Random Forest combines many trees to reduce
overfitting and improve accuracy. A Decision Tree is interpretable and easy to explain, whereas
Logistic Regression, being simpler, does a lot of good in binary classification tasks like this one.
We benchmarked how well these models performed by training them on the pre-processed data
and then measuring results using popular metrics such as accuracy, precision, and F1-score.
Accuracy reflects overall performance across the whole test set. Precision reflects the extent to
which a model avoids false positives. The F1-score balances the two, which matters especially
when the goal is finding people who really have diabetes. These metrics highlighted both the
lessons learned and the areas that required refinement.
The Pima Indian Diabetes Dataset (PIDD) is the repository we are working from in this research;
it contains medical records from a group of 768 women aged 21 and above. It is widely respected
because of how rich its health data are. We separated the data into training and test sets: 80% of
the data is used for training the models, while the remaining 20% is held out to test them. This
division ensures that our trained models can apply their learned behaviour to new, unseen data
and do not just memorize patterns.
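As a minimal illustration of this split, the sketch below assumes the PIDD is available locally as a CSV file named diabetes.csv with the standard column names; the file name and the random seed are our own illustrative choices, not fixed parts of the method.

import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed local copy of the Pima Indian Diabetes Dataset (hypothetical file name).
data = pd.read_csv("diabetes.csv")

# "Outcome" is the standard PIDD label column: 1 = diabetic, 0 = non-diabetic.
X = data.drop(columns=["Outcome"])
y = data["Outcome"]

# 80/20 train-test split, stratified so both sets keep the same class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)  # expected: (614, 8) (154, 8)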
Our objective, however, goes beyond building accurate models. The intent is to build something
credible that can affect clinical practice. Systems of this nature could enable individuals to
appreciate their level of risk, while healthcare practitioners channel their attention towards
delivering the right care at the appropriate time. Imagine being able to diagnose potential cases
of diabetes early and modify treatment before complications arise.
This project showcases how AI is changing the medical field. It allows medical professionals to
spend more time managing patients rather than working through the huge volumes of captured
data with laborious analytical methods. Such models can even spot trends that conventional
statistics would overlook, pointing to possible new risk factors or behaviours that might cause
diabetes.
Real-time health data collection from wearables is how we envision extending this system in the
future. Imagine real-time readings of glucose, heart rate, activity, and other measures from smart
devices, coupled with AI recommendations personalized to individual behaviour and able to
identify early warning signs. That kind of coupling would really allow people to take their health
into their own hands, with medical advice at the right time.
As we said, this project is not only about algorithms; it is about people. Diabetes takes an
emotional and a physical toll, and managing it can be overwhelming. If we can give people an
early heads-up and motivate preventive measures, that is a win, not only for healthcare systems
but for quality of life.
Machine learning is becoming ever more relevant to healthcare. This project is a step towards
using that opportunity not only to react to diseases but also to pre-empt them as much as possible.
Improved models, wider datasets, and integration into daily healthcare routines would all
contribute toward making early detection a routine part of managing diabetes, before
complications ever happen.
1.2 Motivation
Diabetes has become one of the most concerning health issues of our time, silently affecting
millions across the globe. It often develops without noticeable symptoms, only coming to light
when serious complications emerge. Many people only receive a diagnosis after significant
warning signs appear—often due to missed routine checkups or downplaying changes in their
health. These delays are contributing to a growing public health challenge, putting both
individuals and healthcare systems under pressure.
The need for a more innovative method of diabetes detection is what motivated this project. The
goal is to develop a system that can use quantifiable health data to estimate an individual's risk
rather than waiting for symptoms to worsen. We intend to bridge the gap between early indicators
and actual diagnosis by utilising machine learning. We're also envisioning how this system might
integrate with wearable technology, such as fitness trackers or smartwatches, to provide people
with convenient real-time health monitoring. In this manner, they can act proactively before any
problems worsen.
Building a dataset that reflected the different causes of diabetes was the first step in developing a
reliable model. A strong dataset is required to ensure that the system produces accurate
predictions. We examined factors such as age, blood pressure, skin thickness, insulin and glucose
levels, family history of diabetes, and the number of pregnancies. Each data entry indicates
whether or not a person has diabetes. The model can identify risk trends across different
backgrounds thanks to the diversity of the data.
Before diving into model training, the data had to be carefully cleaned and prepared. Like most
real-world data, it had a few hiccups—missing values, uneven distributions, and inconsistent
entries. We tackled these by filling in gaps, reshaping features, and standardizing values. This step
was crucial in ensuring the models had dependable input and could learn from the data effectively.
The next step was to choose and evaluate various machine learning models. We examined a
number of them, each with their own advantages: XGBoost, Random Forest, Decision Tree, and
Logistic Regression. Because of its reputation for identifying intricate relationships in data,
XGBoost is an effective tool for risk assessment. Multiple decision trees are combined by Random
Forest to increase accuracy and decrease error. Healthcare practitioners benefit from decision tree
models' ease of interpretation. Additionally, despite its simplicity, logistic regression is still a good
option for yes/no predictions, such as diabetes diagnosis.
We employed common evaluation metrics to determine each model's performance. Accuracy let
us gauge the overall frequency of correct predictions, while precision helped us evaluate the
models' ability to avoid false alarms. The F1-score struck a balance between the two, preventing
too many false positives while ensuring that we did not overlook real cases. These measurements
helped us identify the most viable options by clearly illustrating the advantages and
disadvantages of each model.
The PIDD, a reputable resource that contains health data from 768 women aged 21 and up, is the
dataset we used for this project. It is especially helpful for machine learning research because of
its comprehensive features. 20% of the data was used to test the model's ability to predict novel,
unseen cases, and the remaining 80% was used to train the model. This method made sure the
model gained valuable insights rather than merely memorising the data.
The main goal of this project is to improve healthcare by making it more accessible and intelligent.
People can better manage their health and receive assistance sooner if early alerts are made
available. This vision heavily relies on wearable technology. A smartwatch or fitness tracker that
incorporates machine learning could provide people with timely health information, encouraging
them to make healthier decisions or warning them when it's time to visit a doctor.
There are many more applications of machine learning in healthcare than just this one. Predictive
tools like these could aid physicians in making earlier diagnoses, better customising treatments,
and even identifying previously unknown health risks. They assist medical professionals,
ensuring that no important information is overlooked, rather than taking their place.
It's simple to envision how this strategy might develop in the future. AI systems that provide
tailored recommendations could be integrated with devices that track everyday health metrics,
such as heart rate, blood sugar, and physical activity. Technology like this has the potential to
change the way we treat chronic illnesses by emphasising prevention over emergency response.
There is more to this research than just numbers and algorithms, of course. It has a profoundly
human aspect. Keeping track of medication, controlling diet, and maintaining an active lifestyle
can all be emotionally taxing for people with diabetes. There's a lot to manage. Early support
systems can give people encouragement and peace of mind, making them feel less alone and more
in control of their health journey.
Numerous aspects of life are already changing as a result of machine learning, but its potential in
healthcare is particularly intriguing. By demonstrating how technology can aid in disease
prevention rather than just treatment, this project makes a significant advancement in that
direction. We intend to create something that helps people before they even realise they need it, by
continuing to improve our models, grow our data, and integrate with commonplace tools.
The remainder of this report is organised as follows. Section 2 summarises research work in this
area, emphasizing important studies and their results concerning predicting diabetes. Section 3
describes the model on which this work is built, covering the dataset, the system structure, and
the methods used. Section 4 deals with the experimentation and evaluation of the model using
appropriate metrics, while Section 5 concludes with the results drawn from the study and indicates possible
avenues for future research and development. Section 6 lists the sources cited in the report,
including research papers and articles related to diabetes prediction using machine learning
techniques. Section 7 provides supplementary materials that support the report, including dataset
descriptions, machine learning models and hyperparameters, system specifications, and model
evaluation metrics. Section 8 reflects on the team's experience working on the project,
highlighting the challenges they faced, the skills they developed, and the importance of
teamwork, communication, and patience in completing the project successfully.
2. LITERATURE SURVEY
This review sifts through different methods for diabetes detection across research works that
demonstrate how machine learning has enhanced medical predictions. One study fed nine health
parameters (including age, BMI, and blood sugar levels) from a dataset of 800 patient records into
its models. The models tested included Random Forest, Gradient Boosting Classifier, and Logistic
Regression, with the first two achieving 76 percent accuracy; Logistic Regression nevertheless
emerged as the winner among these models, proving competent for medical classification. This
shows how the nature of a dataset influences the choice of algorithm.
Another research effort, in the domain of diabetes management systems, applied Random Forest
and Logistic Regression to a dataset of 796 entries. The method showcased the potency of
ensemble learning for structured medical data, reaching an accuracy of 80.52%. Interestingly, this
method surpassed several conventional models, including Decision Tree, Support Vector Machine
(SVM), and Soft Voting, thereby demonstrating its capability in medical diagnostic applications.
Classification techniques such as SVM, Decision Tree, Logistic Regression, Random Forest, and
Gradient Boosting have also been evaluated on the well-known standard PIDD, which contains
768 samples and nine medical features. Random Forest consistently gave the best accuracy across
several tests, reaching an overall performance of 77%. However, Logistic Regression also
deserves special mention, since it achieved 96% accuracy in a few test cases, confirming its
suitability for binary classification problems like diabetes detection.
2.1 Existing System
Diabetes affects a large number of people worldwide, making early detection vital to its timely
treatment. Conventional diagnosis based on traditional biochemical assays, glucose measurement
being the most common, can be costly, time-consuming, and not always accessible. With the
arrival of computational technologies, advances in machine learning have provided strong tools
for diagnosis, revealing patterns in a patient's history that were never made visible through
conventional techniques.
Random Forest and Logistic Regression are the most commonly used and well-studied
algorithms for diabetes detection models. Logistic Regression predicts the probability of having
diabetes from key health indicators and is favoured for its sound probability estimates, while
Random Forest is a powerful classifier that handles the complexity of many interacting variables;
together they enhance diagnostic correctness.
Numerous studies have considered several machine-learning models for risk estimation in
diabetes. In one assessment, with a study sample of 800 subjects, logistic regression yielded a
reliable accuracy of 76% where the relationships among the variables were approximately linear.
Another study using Random Forest and Logistic Regression reports an accuracy of 80.52% on a
dataset of 796 patients, which has to a large extent provided impetus for ensemble learning to
enhance classification.
Classifiers including SVM, Decision Tree, Logistic Regression, Random Forest, and Gradient
Boosting were further investigated on the well-known Pima Indian Diabetes dataset, which is rich
in diverse medical attributes and comprises 768 samples. Random Forest proved the most
consistent classifier, coping well with interactions among complex features and high-dimensional
data. Remarkable, nevertheless, is the accuracy achieved by Logistic Regression, which in some
reported tests reached 96%, confirming its competence on binary classification problems.
Inevitably, these developments have not come without challenges. Most studies use publicly
available datasets, which may misrepresent population diversity and jeopardize applicability in
real-life healthcare settings. Reliance on ensemble approaches that combine many models for
enhanced reliability will be critical in mitigating this issue going forward. Random Forest and
Logistic Regression, with their strong classification performance and capacity for estimating
probabilities, strike a balanced chord in the detection of diabetes.
Future improvements in deep learning, perhaps by incorporating neural network approaches,
would further enhance prediction accuracy. AI-enabled portable health technology would open a
new paradigm for continuous surveillance, leading to real-time personalized therapy for diabetes
management. Machine learning has greatly changed diabetes diagnosis, leading to the
development of effective screening methods that empower health workers to take preventive
measures early against the disease. Considerable effort is being directed at algorithm
interpretability and biases in data. Random Forest and Logistic Regression nevertheless perform
commendably on real classification challenges, where machine learning directly contributes to
improved patient outcomes through early prevention and easier access to healthcare services. In
the foreseeable future, deep learning and AI-powered wearables should flourish and consequently
enhance diagnostic accuracy, changing the game in diabetes control by enabling continuous
monitoring of subjects. [1]
The investigation reported in [2] notes that many thousands of people worldwide live with
diabetes, making it one of the most common metabolic disorders, and one whose serious
complications can be averted when it is detected in time. Conventional diagnostic techniques use
biochemical tests such as glucose level readings which, though very accurate, can be costly and
time consuming. As computational technologies have advanced, machine learning has emerged
as a powerful instrument for medical diagnostics that effectively analyses complex patterns in
patient data.
Models aimed at improving diabetes detection performance have used a variety of machine
learning methods such as Random Forest, Decision Trees, and Adaptive Boosting; these models
form the basis on which strong predictions can be made from structured medical data. Another
study, published in [3], examines efforts to develop an AI-driven diabetes prediction system,
indicating the trend towards intelligent algorithms in health. The suggested system has several
features, including an AI chatbot, a diabetes risk prediction module, and appointment scheduling,
to further improve diagnostic accuracy and patient management.
The study uses the Random Forest algorithm to predict diabetes, achieving an accuracy of 90.4%,
notably higher than traditional classification methods. One of its major findings concerns data
preprocessing, which improves a model's performance: outliers, irrelevant features, and missing
values can play a huge role in degrading prediction accuracy. The study addresses these
data-related problems through robust scaling normalization to reduce the effect of outliers, mean
imputation for missing data, and oversampling techniques to balance class distributions.
To optimize machine learning models and ensure that only the most important variables affect
predictions, feature selection is essential. The study compared various classification models such
as Decision Trees, SVM, and Naive Bayes using benchmark datasets such as the PIDD. Results
show that ensemble learning methods, especially Random Forest, provide better classification
performance because they can handle high-dimensional data and capture complex relationships
between medical variables. Adaptive boosting further enhances model reliability by merging
several weaker classifiers into a more powerful predictive system.
The research adopts a systematic approach, beginning with data preprocessing to ensure accuracy
and standardization of values. This allows the system to learn from past patient data without
sacrificing generalization to new cases. The classification process has two steps: in the first, all
features are used for prediction, and the results indicate Random Forest to be the most accurate.
Evaluation metrics such as precision, recall, and F1-score reveal model performance, while
validation loss trends indicate that more training epochs increase accuracy and lower the risk of
overfitting.
Predicted results are compared against actual patient diagnoses to measure practical
applicability, demonstrating effectiveness in real-world medical situations. The study notes its
merits along with some limitations and offers suggestions for future research. Hyperparameter
tuning, additional external parameters such as genetic markers and lifestyle factors, and hybrid
models that merge machine learning and deep learning approaches could all add to predictive
accuracy in future studies.
In the future, real-time adaptability is likely to be a significant area of focus, allowing models to
update predictions dynamically from data collected from individual patients in real time rather
than relying purely on past patterns. The study showcases how machine learning can give medical
professionals practical support in advancing their diagnostic techniques, reflecting the growing
role of artificial intelligence in healthcare. Healthcare professionals implementing predictive
AI-powered models are thus better positioned to guide patients through the therapies they
undergo.
It is expected that, as AI unfolds, innovative deep learning algorithms will improve existing
diagnostic instruments and could lead to a revolution in diabetes treatment. Research on wearable
AI may lead to the development of customized healthcare and continuous monitoring methods,
and further changes in diabetes management will come through data-driven, well-tested solutions.
Machine learning holds great promise in augmenting healthcare by making it proactive,
accessible, and effective through the combination of medical knowledge and computational
intelligence.
Diabetes is stated to have affected millions around the world. The ailment, regardless of how it
manifests, carries a profile of associated complications, namely kidney disease, changes in vision,
and cardiovascular disorders. Because of the rising prevalence of diabetes, researchers are
working on machine-learning techniques for better early detection and prognosis, hoping to
improve patient outcomes through data-driven health insights. Older diagnostic methods rely on
blood glucose tests and the doctor's assessment; machine learning opens a new route that is more
fruitful in exploring complex health trends.
Prediction models for diabetes risk have been studied using various methodologies. Algorithms
in this research field include Logistic Regression, SVM, Naïve Bayes, Decision Trees, and
Random Forest, while ensemble methods such as AdaBoost, Gradient Boosting, and XGBoost
have also been applied. M. Hasan and F. Yasmin [2] conducted a study to devise a new framework
in which deep learning architectures aid conventional classifiers to yield improved predictive
accuracy.
This study compares several machine learning models on a diabetes dataset from Kaggle. It
shows that Random Forest and XGBoost outperform classical methodologies by handling
structured medical data well. Going further, the study incorporates a combination of Long
Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) layers. This hybrid
architecture performs phenomenally well, reaching an accuracy of 99.79%: the CNN layers
discover critical features from the patient profile while the LSTM layers capture sequential
dependencies, letting the system predict diabetes with almost perfect accuracy.
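The exact architecture of that study is not reproduced here; the sketch below is only a rough illustration, assuming Keras/TensorFlow, of the kind of Conv1D-plus-LSTM classifier described above, with layer sizes chosen arbitrarily.

from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical setup: the eight clinical features are treated as a length-8 sequence
# with one channel so that Conv1D and LSTM layers can be stacked.
n_features = 8
model = keras.Sequential([
    layers.Input(shape=(n_features, 1)),
    layers.Conv1D(32, kernel_size=3, padding="same", activation="relu"),  # local feature extraction
    layers.LSTM(16),                                                      # sequential dependencies
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),                                # probability of diabetes
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
# Training would then look like:
# model.fit(X_train_scaled.reshape(-1, n_features, 1), y_train, epochs=50, validation_split=0.2)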
Another point to consider is that irrelevant predictors can dilute forecasts, which makes feature
selection part of the optimization process for any machine learning model. In this sense, Hasan
and Yasmin identified age, family history, BMI, and glucose levels as the important variables
through Random Forest feature importance, which made their model reliable and interpretable.
Their research also highlights the improvement brought by ensemble learning: combining several
classifiers increases predictive power, especially for highly complex applications in medicine.
Neural networks, including hybrids of CNNs and LSTMs, have performed extraordinarily well
at spotting the nuanced, hidden patterns in medical data, while simpler models such as logistic
regression offer decisions that are easier to explain. Improvements have certainly been made, but
not everything is smooth yet. Studies usually rely on open-source databases such as the PIDD,
which does not represent a diverse population, so the use of realistic electronic health records
needs to be promoted.
In addition, optimization choices such as the Adam optimizer and hyperparameter tuning have
contributed greatly to model stability and effectiveness. Model assessment also relies on
important performance metrics such as accuracy, precision, recall, F1-score, and the area under
the receiver operating characteristic curve (ROC-AUC) to support reliability. High ROC-AUC
values are recorded in Hasan and Yasmin's study, proving the success of their model in
differentiating between patients with and without diabetes.
Predictive systems are expected to develop into more advanced models capable of updating
forecasts from real-time patient data rather than relying only on information learned in the past.
Future work could focus on hybrid models combining deep learning and traditional classifiers,
better feature extraction methods, and the use of wearable medical technology for continuous
diabetes monitoring.
Deep learning has come into use in diabetes detection and is increasingly improving diagnostic
accuracy while providing real-time, evidence-based insight to health professionals for earlier
detection and individualized treatment planning. Machine learning is capable of transforming
diabetes care through intelligent and flexible health solutions, courtesy of the continued growth
of artificial intelligence and wearable technology [3].
The study in [4] covers the incorporation of deep learning approaches into hourly electricity
price forecasting, specifically long short-term memory networks, which have proven effective in
handling time-series data. Electricity markets are complicated and volatile, with a whole host of
factors involved, such as weather, system loads, regulatory policies, and shifting market
dynamics; an electricity price forecasting model must therefore embody these complex
relationships and adapt to the changes that continuously occur in the large datasets it handles.
Issues such as data availability and data quality must be resolved before machine-learning-based
diabetes prediction modelling can even begin. The small and sparse availability of widely
accepted data is considered one of the greatest limitations. The PIDD, the most commonly used
dataset, consists of only 768 records. Although useful in the early stages of research, its small
sample size hardly allows a model to generalize to larger target populations. When machine
learning models are trained on very sparse datasets, overfitting results: they perform well on the
training set but poorly when predicting outcomes on new, unseen data.
A second challenge arises from the imbalanced nature of the dataset: the distribution of cases
with and without type-2 diabetes is uneven. The majority class tends to dominate model fitting,
and diabetic patients are more often left out of the identification loop owing to this class
imbalance. Researchers have used various techniques to balance the data, such as oversampling
and under-sampling. Another approach is to artificially generate synthetic samples so that the
minority class has better representation, using methods such as the Synthetic Minority
Over-Sampling Technique (SMOTE).
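A minimal sketch of SMOTE applied to the training split is shown below, assuming the third-party imbalanced-learn package and the X_train/y_train names introduced in the earlier sketches; it is illustrative rather than part of the cited studies.

from collections import Counter
from imblearn.over_sampling import SMOTE

# Oversample only the training data; the test set must keep its natural class distribution.
smote = SMOTE(random_state=42)
X_train_bal, y_train_bal = smote.fit_resample(X_train, y_train)

print("Before balancing:", Counter(y_train))
print("After balancing:", Counter(y_train_bal))  # both classes equally represented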
Other preprocessing hurdles arise from empty or inconsistent values, particularly those
associated with evaluating patients' metabolic health, such as insulin levels and skin thickness.
Missing data of this kind can compromise consistency during training and testing, which later
affects predictive reliability. To remedy this, researchers have populated missing values through
predictive modelling based on correlations with other variables, median substitution, and mean
imputation. Such adjustments increase a dataset's completeness; however, they need to be made
with caution so as not to introduce bias.
Feature selection further improves how a model functions by de-emphasizing the many
candidate attributes in favour of the factors of major concern for diabetes risk prediction (blood
glucose levels, body mass index, age, and family history). Narrowing the analysis down to the
most pertinent factors helps reduce dataset complexity, speed up computation, and increase the
interpretability of predictions. Methods for assessing which factors matter most for accurate
prediction include tree-based feature importance assessments, correlation analysis, and Recursive
Feature Elimination (RFE).
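As a small illustration, the sketch below applies scikit-learn's RFE with a logistic regression estimator to the training split; the choice of estimator and the number of features kept are assumptions made for illustration, not settings taken from the cited work.

from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Recursively eliminate the weakest features until four remain.
estimator = LogisticRegression(max_iter=1000)
selector = RFE(estimator, n_features_to_select=4)
selector.fit(X_train, y_train)

selected = X_train.columns[selector.support_]
print("Selected features:", list(selected))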
To adjust the parameters that govern model performance, researchers utilize hyperparameter-
tuning methods such as Grid Search and Random Search. These techniques help tune the models
towards their best achievable accuracy.
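A minimal sketch of a grid search over a Random Forest is given below, assuming scikit-learn and the training split defined earlier; the parameter grid is an arbitrary example rather than the grid used in any cited study.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [4, 8, None],
    "min_samples_leaf": [1, 5],
}

# 5-fold cross-validated grid search, scored on F1 to respect the class imbalance.
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, scoring="f1", cv=5, n_jobs=-1)
search.fit(X_train, y_train)
print("Best parameters:", search.best_params_)
print("Best cross-validated F1:", round(search.best_score_, 3))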
Despite these data-related obstacles, machine-learning techniques for diabetes prediction
continue to improve. Ensemble techniques such as Random Forest and Gradient Boosting have
been used to make predictions more accurate and more trustworthy. With far greater precision,
deep learning techniques such as CNNs and LSTM networks can now recognize complex health
data patterns on their own. Such advanced models ease diagnosis with fewer constraints of
manual feature engineering, and the limitations of classical datasets can be countered by
integrating real-time patient monitoring with multiple sources of medical data. Diabetes
prediction has thus shifted from a retrospective evaluation to a proactive, real-time surveillance
system driven by wearable health gadgets monitoring blood glucose and physical activity levels.
With these developments in AI, it is imperative that machine learning experts and medical
researchers collaborate on defining the next generation of predictive healthcare models. Given
advanced feature engineering, clever data augmentation techniques, and enhanced deep learning
methodologies, they can harness highly accurate models for personalized healthcare solutions.
Future studies could focus on adaptive learning models, which update predictions in real time
based on patterns and changes in a patient's data. These advancements will primarily improve the
accuracy and utility of diabetes evaluation and management. Machine learning innovations will
ensure enhanced accessibility, accuracy, and proactiveness in health care, thereby empowering
patients and health care professionals to intervene in time for disease prevention and
management.
3. METHODS
3.1 Dataset Description
The Pima Indian Diabetes Dataset (PIDD) was used, a dataset commonly employed for machine
learning applications in health. The full dataset has 768 records of individual persons with eight
clinical attributes: number of pregnancies, glucose level, blood pressure, skinfold thickness,
insulin level, Body Mass Index (BMI), diabetes pedigree function (a measure of family diabetes
history), and age.
Each entry also carries an outcome label that indicates whether or not the individual was
diagnosed with diabetes. We divided the dataset in the ratio of 80 to 20 so that, while learning, our
models discovered the relevant features efficiently and generalized well to new data.
We spent some time on data preprocessing before moving on to building the models. This
included handling missing or zero values, feature scaling, and normalizing the data so that all
attributes were on an equal footing. This step was important to make sure that our models learned
from consistent, clean data and therefore produced predictions that were more likely to be
trustworthy and accurate.
Table 1: Dataset Description
Table 1 shows a sample of the dataset, listing the number of pregnancies, glucose level, blood
pressure, skin thickness, insulin level, BMI, diabetes pedigree function, age, and outcome for five
individuals. The data is organized into columns, with each row representing a single individual.
Figure 1: Dataset Distribution
In figure 1 the pie chart illustrates the distribution of various factors contributing to diabetes, with
glucose being the most significant factor at 30.97%, followed by BMI at 20.16%, and age at 16.13%.
The remaining factors, including pregnancies, blood pressure, skin thickness, insulin, diabetes pedigree
function, and outcome, contribute smaller percentages to the overall distribution.
3.2 Schematic Layout
Figure 2 presents a flowchart of the system workflow for the diabetes prediction model,
comprising six stages: data collection, data preprocessing, feature selection, model training,
hyperparameter tuning, and model evaluation. The workflow checks whether the model achieves
acceptable prediction accuracy; a "yes" indicates success, while a "no" sends the process back for
further hyperparameter tuning.
3.3 Methods
Data Preprocessing
Errors were investigated thoroughly prior to training models with the raw PIDD. Several
variables, including blood pressure, glucose level, skin thickness, and insulin, contained zeros that
are unreasonable in a medical context. These zeros were treated as missing values and handled
through imputation, substituting mean or median values. Standardisation was then applied to
bring all features to a common scale, preventing any one attribute from skewing the results or
otherwise seriously affecting the model.
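A minimal sketch of this cleaning step is shown below, assuming pandas/scikit-learn, the standard PIDD column names, and the X_train/X_test split from the sketch in Section 1; the use of median imputation and StandardScaler follows the description above, but the exact settings are illustrative.

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Columns where a zero is physiologically implausible and really means "missing".
zero_as_missing = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]
X_train[zero_as_missing] = X_train[zero_as_missing].replace(0, np.nan)
X_test[zero_as_missing] = X_test[zero_as_missing].replace(0, np.nan)

# Impute with medians learned from the training split only, then standardize.
imputer = SimpleImputer(strategy="median")
X_train_imp = imputer.fit_transform(X_train)
X_test_imp = imputer.transform(X_test)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_imp)
X_test_scaled = scaler.transform(X_test_imp)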
Changes to Features
With all features standardized, smaller-valued variables were not overshadowed by larger ones,
allowing each model to perform at its best. This balanced scaling ensured that each feature was
weighted fairly without distorting the eventual outcomes. Since the dataset is entirely numerical,
no categorical encoding was required, which streamlined preprocessing while preserving data
integrity.
Model Selection
We tested four machine learning algorithms in order to develop a robust diabetes prediction
system:
Logistic Regression - chosen for its straightforward implementation, good predictive power, and
easy interpretation when identifying diabetes risk factors.
Decision Tree - a helpful tool for probing the hierarchical patterns behind the decision-making
process in the dataset.
Random Forest - an ensemble method used to boost the accuracy and stability of predictions.
XGBoost - best known for its high classification accuracy, regularization features, and strong
handling of structured data.
Each model was trained on 80% of the dataset, with the remaining 20% kept aside to test how it
performs on new, unseen cases.
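The sketch below shows how these four models might be trained side by side, assuming scikit-learn and the xgboost package together with the X_train_scaled array and y_train labels from the preprocessing sketch above; the hyperparameters are illustrative defaults rather than our tuned values.

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=300, random_state=42),
    "XGBoost": XGBClassifier(eval_metric="logloss", random_state=42),
}

# Fit every model on the 80% training portion.
for name, model in models.items():
    model.fit(X_train_scaled, y_train)
    print(name, "trained.")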
Model Evaluation
To evaluate the performance of our models, we took into consideration several evaluation metrics:
Accuracy: Measures the proportion of correct predictions over the entire dataset, thus giving an
overall measure of trust in the model.
Precision: Indicates how many of the predicted positive cases were actually correct, ensuring that
the model does not raise too many false alarms.
F1-Score: The F1-Score represents a trade-off between precision and recall, and is therefore
useful for situations of an imbalanced dataset.
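The sketch below computes these three metrics for each trained model on the held-out 20%, assuming the models dictionary and the test arrays defined in the sketches above.

from sklearn.metrics import accuracy_score, precision_score, f1_score

for name, model in models.items():
    y_pred = model.predict(X_test_scaled)
    print(f"{name}: "
          f"accuracy={accuracy_score(y_test, y_pred):.3f}, "
          f"precision={precision_score(y_test, y_pred):.3f}, "
          f"F1={f1_score(y_test, y_pred):.3f}")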
4. RESULTS
This section describes the technical specifications used in the project and the variables that were
changed or held fixed for the experiments and simulations. It then presents the experimental
results, together with statistical data and examples demonstrating the efficacy and functionality of
the system.
"Output and results" will talk about the technical specifications used in the project. Apart from
that, it can include the variables that were modified or unmodified in conducting
experiments/simulations. Eventually, experimental results will be found within this section as
well, including statistical information or examples that demonstrate the system's efficacy or
functionality.
Table 2: Results
The table presents the results of four machine learning algorithms: Logistic Regression, Decision
Tree, Random Forest, and XGBoost. The metrics used to evaluate these algorithms are Accuracy,
Precision, and F1 Score.
The coming change in healthcare goes beyond technology. AI researchers will work hand in hand
with medical practitioners in future healthcare. Medical systems that combine computational
tools with clinical expertise can devise treatment plans tailored to the individual risk factors of a
patient. AI diagnostics has the potential to reduce healthcare costs, improve patient outcomes, and
make preventive care accessible as the disease burden grows. Machine learning has not merely
improved the prediction of diabetes; it is helping to build a responsive healthcare system focused
on early detection, prevention, and personalized treatment strategies.
The results demonstrate the central idea: predictive algorithms can increase diagnostic precision
by facilitating early detection, clearly showing the possibilities of machine learning in diabetes
prediction. Including these models in health monitoring devices would help motivate people to
take proactive steps to manage their health and to stay informed in a timely manner.
Indeed, improvements in disease prediction have been made possible by the remarkable capacity
of XGBoost, underscoring the value of complex ensemble methods in healthcare applications.
Synergy between data scientists and practising doctors will bring about many technological
advances as AI becomes more personalized, promising more accuracy, flexibility, and prevention
in the healthcare industry.
Figure 3 shows the logistic regression model, which delivers moderate performance in predicting
diabetes risk, with an AUC score of 0.73. The confusion matrix reveals that the model correctly
classified 30 true negatives and 14 true positives but produced 20 false negatives and 36 false
positives. This indicates that while the model is good at identifying non-diabetic cases, it struggles
to detect diabetic patients accurately.
Figure 5: Evaluation of Random Forest
Figure 5 shows the random forest model, which exhibits a slightly lower performance than the
decision tree, with an AUC score of 0.85. The confusion matrix indicates that the model correctly
classified 32 true negatives and 25 true positives, with 8 false negatives and 35 false positives.
Although the random forest model is still effective, it is not as accurate as the decision tree model
in predicting diabetes risk.
Figure 6 represents the XGBoost model which shows the best performance among all models,
with an AUC score of 0.91. The confusion matrix reveals that the model correctly classified 34
true negatives and 28 true positives, with only 6 false negatives and 32 false positives. This
indicates that the XGBoost model is highly effective in predicting diabetes risk and outperforms
the other models.
The visuals show how well the XGBoost and logistic regression models predict the risk of
diabetes. ROC curves and confusion matrices help to assess how well each model discriminates
between diabetic and non-diabetic cases.
The confusion matrices show how accurately each model classified cases into positive and
negative categories. Logistic regression performed reasonably at times, but it produced false
negatives and hence sometimes failed to identify true diabetes patients. Superior results were
obtained with XGBoost, which reduced misclassification and increased true-positive
identification.
The ROC curves show each model's ability to differentiate between cases with and without
diabetes. The higher AUC score of XGBoost compared with logistic regression implies that it
produces more accurate predictions.
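A minimal sketch of how such plots can be produced for one model is given below, assuming matplotlib/scikit-learn and the fitted models dictionary and test arrays from the Methods sketches; it is an illustration, not the exact plotting code behind the figures above.

import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, RocCurveDisplay

model = models["XGBoost"]
fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Confusion matrix: counts of true/false positives and negatives on the test set.
ConfusionMatrixDisplay.from_estimator(model, X_test_scaled, y_test, ax=axes[0])
axes[0].set_title("XGBoost confusion matrix")

# ROC curve, with the AUC reported in the legend.
RocCurveDisplay.from_estimator(model, X_test_scaled, y_test, ax=axes[1])
axes[1].set_title("XGBoost ROC curve")

plt.tight_layout()
plt.show()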
Overall, the comparison sets out the merits of using more advanced machine learning methods,
such as XGBoost, for clinical prognostication. Simpler models can provide useful insights;
however, ensemble methods and more elaborate algorithms often deliver improved accuracy and
are therefore more suitable for medical applications. The outcomes underscore how important it
is to choose the best model for predicting disease if the results are to be accurate enough to enable
earlier intervention and better outcomes for patients.
5. CONCLUSION
Through machine learning and artificial intelligence, disorders such as diabetes can be detected
much more efficiently at an early stage, and their prediction and diagnosis become much easier.
As data in the field become more complex, traditional statistical methods are often incapable of
picking up the subtle patterns that act as harbingers of serious disease. Machine learning, by
contrast, can not only unearth insights from heaps of patient data but also facilitate exact
diagnosis and proactive management of the condition. Beyond individual interventions, these
strategies pave the way for clinicians to personalize treatment schedules, with significantly better
long-term health outcomes achieved through the prompt identification of warning signs and
immediate, timely interventions.
Diabetes is among the most prevalent metabolic disorders, affecting millions of people in the
world today, which makes its detection important for averting complications in the future.
Diagnosis has always depended upon biochemical evaluation employing fasting glucose readings
and haemoglobin A1c levels. These tests are good, but they generally confirm diabetes only after
the illness is established. Machine-learning technology now enables an advance towards
preclinical management of diabetes, before symptoms appear. These models can flag at-risk
individuals using major health parameters such as glucose levels, body mass index, age, insulin
resistance, and family history, prompting individual risk re-evaluation and early access to
preventive healthcare.
In this study, we examined several machine learning models and their effectiveness in predicting
diabetes risk. All four models, i.e., XGBoost, Decision Tree, Random Forest, and Logistic
Regression, had their merits but also their disadvantages. The simpler models performed
reasonably well; however, Decision Tree and Logistic Regression struggled to identify complex,
nonlinear patterns. Logistic regression assumes a direct, proportional relationship between
features and outcomes, which does not necessarily hold for medical conditions such as diabetes.
Decision trees, while able to capture nonlinearities, tend to overfit: they fit the training dataset
very well but fail to generalize to unseen patients, which often results in low precision and
F1-scores on new cases.
Ensemble methods such as Random Forest refine prediction accuracy by combining multiple
decision trees to yield more reliable classification results; averaging the results across trees
reduces the overfitting caused by overly specific predictions. XGBoost, however, performed best
across the modelling trials, achieving higher accuracy and F1 scores than all the other models
tested. It is particularly well suited to diabetes prediction because it handles the unbalanced
datasets common in the medical field. Unlike classical classifiers, it is based on gradient
boosting, iteratively improving weak learners to achieve stronger accuracy. One of its major
benefits is the emphasis it places on misclassified instances during learning, which ensures that
challenging cases with very subtle early signs of diabetes receive greater attention.
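As a hedged sketch (not the project's exact code), the models described above could be configured as follows; the parameter values are illustrative, scale_pos_weight is one common way XGBoost compensates for class imbalance, and X_train and y_train are assumed placeholders.

# Sketch: ensemble (Random Forest) and boosting (XGBoost) models with
# imbalance handling. Values are illustrative assumptions.
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

neg, pos = (y_train == 0).sum(), (y_train == 1).sum()

rf = RandomForestClassifier(n_estimators=200, random_state=42)  # bagging of decision trees
xgb = XGBClassifier(
    n_estimators=200,
    learning_rate=0.1,
    max_depth=4,
    scale_pos_weight=neg / pos,   # up-weight the minority (diabetic) class
    eval_metric="logloss",
)

rf.fit(X_train, y_train)
xgb.fit(X_train, y_train)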
Hyperparameter tuning also played a part in XGBoost outperforming the other models.
Overfitting was a key concern: a model can achieve strong results on the training data yet
generalize poorly to newly available samples. Overfitting occurs when a model fixates on the
peculiarities of one dataset rather than learning trends that apply to larger and more diverse
populations. We therefore tried to balance the trade-off between complexity and generalizability
through the learning rate, tree depth, and number of estimators. By applying L1 or L2
regularization, we prevented overly complex models from attaching too much importance to
redundant or noisy features. This resulted in a model accurate enough for predicting diabetes in
real-time scenarios.
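As a hedged illustration of the regularization described above, XGBoost exposes L1 and L2 penalty terms through reg_alpha and reg_lambda; the specific values below are placeholders, not the exact settings used in this project.

# Sketch: the hyperparameters discussed above, including L1/L2 regularization.
# Concrete values are illustrative placeholders.
from xgboost import XGBClassifier

regularized_xgb = XGBClassifier(
    learning_rate=0.05,   # smaller steps reduce the risk of overfitting
    max_depth=3,          # shallower trees generalize better
    n_estimators=300,     # more, but weaker, learners
    reg_alpha=0.5,        # L1 penalty: pushes weights of noisy features toward zero
    reg_lambda=1.0,       # L2 penalty: discourages overly large weights
    eval_metric="logloss",
)
regularized_xgb.fit(X_train, y_train)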
If these models are further improved and integrated with electronic medical record (EMR)
systems, the future of diabetes care could be transformed. Machine learning could then serve as
an important bridge between medical diagnostics and personalized healthcare, making early
disease detection far more feasible and actionable. AI-enabled approaches that deliver timely
information to patients and care providers can support sound, time-sensitive decisions and
improved outcomes.
6. REFERENCES
[4] Kaur, S., & Kumar, R. (2020). A review on predictive models for diabetes using
machine learning techniques. International Journal of Engineering Research and
Applications, 10(6), 1-6. doi:10.35629/7728-10060106
7. APPENDICES
The dataset used in this study contains key medical features related to diabetes diagnosis. It
consists of records, each representing an individual with various clinical attributes that contribute
to diabetes risk assessment. Below is a description of the features used in the study:
The dataset was pre-processed before training the models: missing values were handled,
continuous features were normalized, and feature selection was performed to enhance
classification accuracy.
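The exact pre-processing code is not reproduced here; the following is a minimal sketch, assuming a pandas DataFrame df with Pima-style clinical columns and an Outcome label (assumed names), of how these steps can be chained with scikit-learn.

# Sketch: pre-processing pipeline (imputation, scaling, feature selection).
# `df` and its column names are assumptions based on a Pima-style diabetes dataset.
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split

X = df.drop(columns=["Outcome"])
y = df["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    stratify=y, random_state=42)

preprocess = Pipeline([
    ("impute", SimpleImputer(strategy="median")),       # handle missing values
    ("scale", StandardScaler()),                        # normalize continuous features
    ("select", SelectKBest(score_func=f_classif, k=6))  # keep the most informative features
])

X_train_prep = preprocess.fit_transform(X_train, y_train)
X_test_prep = preprocess.transform(X_test)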
Various machine learning algorithms were implemented and tuned to improve diabetes prediction.
Below is a summary of the models used and key hyperparameters adjusted:
XGBoost: Learning Rate, Max Depth, Number of Estimators
Hyperparameter tuning was performed using Grid Search and Random Search techniques to
optimize model performance while reducing overfitting.
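A hedged sketch of how Grid Search and Random Search can be applied to the XGBoost hyperparameters listed above; the parameter grids, scoring choice, and variable names carried over from the earlier sketches are illustrative assumptions.

# Sketch: Grid Search and Random Search over the XGBoost hyperparameters above.
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from xgboost import XGBClassifier

param_grid = {
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [3, 4, 5],
    "n_estimators": [100, 200, 300],
}

grid = GridSearchCV(XGBClassifier(eval_metric="logloss"), param_grid,
                    scoring="f1", cv=5)
grid.fit(X_train_prep, y_train)

rand = RandomizedSearchCV(XGBClassifier(eval_metric="logloss"), param_grid,
                          n_iter=10, scoring="f1", cv=5, random_state=42)
rand.fit(X_train_prep, y_train)

print("Grid Search best params:", grid.best_params_)
print("Random Search best params:", rand.best_params_)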
The system setup used for model training and evaluation was as follows:
This setup ensured efficient computation, enabling seamless processing of large datasets and model
training tasks.
To assess model effectiveness, standard evaluation metrics were used, providing insights into
accuracy and reliability. Below is a summary of model performance:
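As a hedged illustration of how the metrics in such a summary can be computed, the sketch below reuses the tuned model and held-out test data from the earlier sketches (assumed names, not the project's exact code).

# Sketch: computing the standard evaluation metrics summarized above.
from sklearn.metrics import accuracy_score, classification_report

best_model = grid.best_estimator_          # tuned XGBoost from the search sketch
y_pred = best_model.predict(X_test_prep)

print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=["non-diabetic", "diabetic"]))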
8. REFLECTION OF THE TEAM MEMBERS ON THE PROJECT
In completing this project, we improved our abilities in scientific writing, teamwork, and time
management. The idea of doing both the research component and the write-up initially made us
feel overwhelmed, but that feeling quickly vanished once we came up with a strong plan. At the
start we were quite worried for a number of reasons, including the realization that our previous
assignments, study, and work were not sufficient preparation on their own.
Additionally, this initiative encouraged us to improve our communication skills. We had to change
our first approach to teamwork, where we would divide the work and merely examine each other's
contributions, to one where we would complete each component individually before combining them
to create the best final product.
This was a long-term design project, and everything was new to us. In our opinion, the most
important requirements for such lengthy work were patience and regular communication with
the other team members and our supervisor, whose knowledge and guidance helped us greatly
throughout the project.
On the other hand, the primary research was the most difficult and time-consuming part for us.
It was hard to keep up with the submission dates because all the concepts were new to us. To
conclude, we overcame the difficulties as a team and, under the guidance of our supervisor,
completed the project on time, learning new concepts and gaining experience along the way.