FinalReport Ajay
A PROJECT REPORT
Submitted by
Sahil Ansari
Shayan Azeem
Sahil Khan
of
MAY 2025
CERTIFICATE
Certified that this project report “Lifestyle Diseases Using Health Data” is the
Bonafide work of “Sahil Ansari, Shayan Azeem, Sahil Khan” who carried out
Certified that this project report “Predicting Lifestyle Diseases Using Health Data”
is the Bonafide work of “Sahil Ansari, Shayan Azeem, Sahil Khan” who have
“We hereby declare that this submission is our own work and that, to the best of our
knowledge and belief, it contains no material previously published or written by
another person nor material which has been accepted for the award of any other
degree or diploma of the university or other institute of higher learning, except
where due acknowledgment has been made in the text.”
We, Anjali Sinha, Ajay Kumar, Amresh Kumar Chaurasiya, and Abhishek Tiwari,
pursuing B.C.A, would like to express our sincere gratitude to all those who
supported and guided us throughout the completion of this Project Report. First
and foremost, we would like to extend our heartfelt thanks to Dr. Mohd Faizan
for his valuable guidance and continuous encouragement throughout the course of
this project. His insights and suggestions were instrumental in shaping the
We are also deeply thankful to Mr. Obaidullah, our Project Lab Instructor, for
understanding and preparing this Project Lab Report. His support played a crucial
Last but not least, we would like to thank all our colleagues for their cooperation,
Sahil Ansari
Shayan Azeem
Sahil Khan
TABLE OF CONTENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
1. INTRODUCTION
1.1 Overview of Lifestyle Diseases
1.2 Industry Context
1.3 Background and Motivation
1.4 Objectives of the Study
1.5 Scope of the Project
1.6 Significance of Predictive Analytics in Healthcare
1.7 Challenges in Traditional Diagnosis
1.8 Relevance of Machine Learning in Medical Prediction
2. Review of Previous Work
3. Problem Identification & Feasibility Study
4. Requirement Analysis
4.1 User Requirements
4.2 Functional Requirements
4.3 Non-Functional Requirements
4.4 System Requirements (Hardware & Software)
5. Project Description
5.1 Description
5.1.1 What We Are Proposing
5.1.2 Key Features of the Proposed System
5.1.3 Expected Outcome
5.2 Methodology
5.2.1 Define the Problem
5.2.2 Data Collection
5.2.3 Data Preprocessing
5.2.4 Data Exploration
5.2.5 Model Building
5.2.5.1 Model 1
5.2.5.2 Model 2
5.2.5.3 Model 3
5.2.6 Model Evaluation
5.2.7 Model Deployment
5.3 Project Timeline
5.4 Major Results
5.4.1 General Disease Prediction Model Results
5.4.2 Heart Disease Prediction Model Results
5.4.3 Diabetes Prediction Model Results
5.5 Application
5.6 Conclusion
6. Design
6.1 Context Diagram
6.2 Data Flow Diagram
6.3 Flow Chart
6.4 Snapshots of the Project
6.5 Dataset and Tables
6.5.1 Model 1
6.5.2 Model 2
6.5.3 Model 3
8. References
Abstract
Predictive Healthcare Analytics using Machine Learning is an innovative approach that leverages
data-driven techniques to analyze medical information and predict potential health risks. As
healthcare systems generate vast amounts of data, it becomes crucial to utilize advanced
computational methods to extract meaningful insights for early diagnosis and disease prevention.
Machine learning, a subset of artificial intelligence, provides powerful tools to enhance medical
decision-making and improve patient outcomes. This project focuses on developing a predictive
analytics system capable of forecasting diseases such as heart disease and diabetes based on
patient symptoms, medical history, and laboratory test results. By utilizing advanced algorithms
like Logistic Regression, Random Forest, and Support Vector Machines (SVM), the system
identifies patterns, correlations, and trends in historical healthcare data, enabling early detection
and personalized recommendations. The system aims to bridge the gap between data-driven
insights and practical medical applications, empowering both individuals and healthcare
professionals.
The project follows a structured methodology, including data collection from electronic
health records (EHRs), public datasets, and real-world medical repositories. The data undergoes
preprocessing to ensure high-quality inputs for model training. Machine learning techniques are then applied to train and
evaluate predictive models, assessing their accuracy and reliability using key performance
metrics such as precision, recall, and F1-score. The integration of a user-friendly frontend,
developed using Streamlit, allows individuals to input their medical data and receive AI-driven
predictions.
The significance of predictive healthcare analytics lies in its ability to shift the healthcare
paradigm from reactive treatment to proactive prevention. The developed system not only aids in
disease prediction but also helps optimize hospital resources, reduce medical costs, and facilitate
early medical intervention. Moreover, the system has the potential to be expanded with real-time
monitoring capabilities, wearable device integration, and broader disease coverage, making it a
early warning systems, and ultimately improve patient care and health outcomes. By harnessing
the power of machine learning and AI, this project contributes to the ongoing efforts to
modernize healthcare and make it more accessible, efficient, and patient-centric.
LIST OF TABLES
Table No. Name
1 List of Research Papers in the Field of Predictive Healthcare Analytics using AI/ML
2 Risk Analysis
LIST OF FIGURES
Figure No. Name
1 Life Cycle of ML Model Creation and Deployment
2 Gantt Chart
9 Context Diagram
10 Data Flow
11 Flow Chart
12 Home Page
16 Diabetes Prediction Page
1. INTRODUCTION
1.1 Overview
The healthcare sector has entered a transformative phase, driven by the integration of artificial
diagnosis and treatment followed a reactive model—patients would seek medical assistance only
after symptoms became pronounced. However, this approach often results in delayed
interventions, higher treatment costs, and worsened patient outcomes. Predictive healthcare
analytics is changing that narrative. By leveraging large datasets and machine learning
algorithms, healthcare providers can forecast disease likelihood, identify high-risk patients, and
act preventively. This shift is not only improving patient care but also optimizing the use of
Predictive analytics has emerged as one of the most transformative technologies in the data-driven
era, influencing decision-making across a broad spectrum of industries. At its core,
predictive analytics involves using historical data, statistical algorithms, and machine learning
techniques to identify the likelihood of future outcomes. While its impact in healthcare is
I. Healthcare Industry
optimization, and personalized treatment plans. By analyzing patient histories, lifestyle factors,
genetic data, and real-time inputs from wearables, medical professionals can predict the onset of
chronic illnesses such as diabetes, heart disease, and cancer. Hospitals are increasingly
leveraging predictive models to anticipate patient admissions, manage ICU capacities, and
forecast medication demands, thereby enhancing both clinical and operational efficiency.
The financial industry was among the early adopters of predictive analytics. Banks and financial
disbursement, and predict customer churn. Real-time transaction monitoring combined with
predictive models helps detect anomalies and reduce financial risks. Moreover, portfolio
management and stock market predictions are increasingly powered by complex predictive
algorithms that analyze market trends, economic indicators, and geopolitical factors.
In retail, predictive analytics helps businesses anticipate customer needs, personalize shopping
experiences, and optimize inventory. By analyzing purchasing behaviour, browsing patterns, and
preferences. Additionally, predictive models assist in demand forecasting, dynamic pricing, and
supply chain management, ensuring that businesses stay agile in a highly competitive market.
IV. Manufacturing
maintenance schedules, companies now use sensor data and machine learning to predict
equipment failures before they happen. This minimizes downtime, reduces operational costs, and
Telecom providers use predictive analytics to enhance customer experience, prevent churn, and
optimize network performance. By analyzing usage patterns and service feedback, companies
can identify customers at risk of switching to competitors and offer targeted retention campaigns.
Predictive modelling also supports infrastructure planning by forecasting network traffic and
VI. Education
personalized learning paths, and allocate resources more effectively. By monitoring academic
performance, attendance, and engagement data, educational institutions can intervene early and
Energy companies apply predictive analytics to optimize grid performance, manage energy
demand, and reduce outages. Smart meters and IoT sensors provide real-time data that, when
analyzed, helps in predicting consumption patterns, identifying system faults, and promoting
energy efficiency. Additionally, renewable energy sources benefit from predictive weather
Predictive analytics enhances fleet management, route optimization, and delivery forecasting in
logistics. It plays a crucial role in improving punctuality, reducing fuel consumption, and
anticipating maintenance needs. Airlines use predictive models to optimize ticket pricing,
IX. Insurance
The insurance industry uses predictive analytics for risk assessment, claim management, and
fraud detection. By evaluating policyholder behaviour, claim history, and third-party data,
insurers can design customized policies, adjust premiums dynamically, and prevent fraudulent
activities.
Governments are leveraging predictive analytics for resource planning, crime prevention, tax
fraud detection, and public health monitoring. Law enforcement agencies use crime data to
identify hotspots and predict criminal activity, while urban planners use traffic and demographic
The global rise in chronic diseases such as diabetes, heart disease, and cancer has placed a
significant burden on health infrastructure and economics. Many of these conditions develop
gradually and remain asymptomatic during early stages, making early detection a critical factor
in successful treatment. Unfortunately, traditional diagnostics often fall short in detecting such
conditions promptly due to reliance on manual interpretation of data, limited access to advanced
Moreover, modern healthcare generates an enormous volume of structured and unstructured data
—from electronic health records (EHRs) to wearable sensor outputs. Yet, only a fraction of this
data is utilized effectively in clinical decision-making. Machine learning offers the tools needed
to process this data, extract patterns, and support accurate, timely, and personalized medical
predictions.
The motivation behind this project stems from the need to bridge this gap between data and
diagnostics. With the right algorithms and enough historical data, we can build systems that not
only predict diseases before they occur but also tailor healthcare delivery based on individual
risk profiles.
The objective of this project is to develop a machine learning-based predictive analytics system
for healthcare that can analyze patient symptoms and lab test reports to predict the likelihood of
diseases such as heart disease, diabetes, and other medical conditions. With the increasing
prevalence of chronic diseases, early detection plays a crucial role in preventing severe
algorithms, the system is designed to assist both individuals and healthcare professionals in
identifying health risks early, thereby enabling timely intervention, reducing the chances of
disease progression, minimizing treatment costs, and ultimately enhancing patient care.
One of the key goals of this project is to bridge the gap between data-driven healthcare analytics
and user accessibility. The system will allow users to input their medical details, including
symptoms, lifestyle factors, and laboratory test results, and receive instant, AI-driven predictions
regarding potential health risks. The model will be trained on high-quality, diverse medical
datasets, with the aim of making its predictions accurate and reliable. This project is particularly beneficial
for:
• Patients: Those who are experiencing symptoms and want to assess their potential
• Medical Practitioners: Doctors and healthcare professionals can use this system as a
• Healthcare Institutions: Hospitals and clinics can integrate this system into their
diagnostic workflow, allowing for faster screenings and better allocation of medical
resources.
By making predictive healthcare accessible and user-friendly, this project aims to empower
individuals with data-driven insights into their health conditions. Future enhancements can
include integration with wearable devices, real-time monitoring, and personalized health
The scope of this study is broad, encompassing multiple machine learning models applied to
• Processing structured health data (age, gender, glucose levels, blood pressure, cholesterol,
etc.).
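As a minimal sketch of what processing structured health data can look like in the Python/scikit-learn stack used in this project, the snippet below imputes missing values, standardizes numeric features, and one-hot encodes gender. The column names (age, gender, glucose, blood_pressure, cholesterol) and the toy records are hypothetical; the real datasets may name and encode these fields differently, and median imputation is only one reasonable choice.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names; real datasets may use different ones.
NUMERIC_COLS = ["age", "glucose", "blood_pressure", "cholesterol"]
CATEGORICAL_COLS = ["gender"]


def build_preprocessor() -> ColumnTransformer:
    """Impute and scale numeric columns; impute and one-hot encode categorical ones."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # fill missing readings with the column median
        ("scale", StandardScaler()),                   # zero mean, unit variance per column
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),  # fill missing categories with the mode
        ("onehot", OneHotEncoder(handle_unknown="ignore")),   # one column per category seen in training
    ])
    return ColumnTransformer([
        ("num", numeric, NUMERIC_COLS),
        ("cat", categorical, CATEGORICAL_COLS),
    ])


# Toy records with one missing glucose value and one missing gender.
records = pd.DataFrame({
    "age": [45, 60, 33, 52],
    "gender": ["M", "F", np.nan, "F"],
    "glucose": [110.0, np.nan, 95.0, 160.0],
    "blood_pressure": [80, 90, 70, 85],
    "cholesterol": [200, 240, 180, 220],
})

features = build_preprocessor().fit_transform(records)
print(features.shape)  # (4, 6): four scaled numeric columns plus two one-hot gender columns
```

Keeping imputation and scaling inside one transformer means the same steps are applied identically at training and prediction time.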
1.6 Significance of Predictive Analytics in Healthcare
Predictive analytics brings immense potential to modern healthcare. Its benefits include:
In developing countries or rural areas where access to doctors or testing is limited, an intelligent
The conventional healthcare system has long been centered around reactive care, where medical
intervention is typically initiated only after the onset of symptoms or the diagnosis of a disease.
While this model has been functional for decades, it poses several critical challenges, especially
in the face of rapidly evolving medical needs and population health trends.
One of the primary drawbacks of traditional diagnostic methods is delayed detection. Many
life-threatening conditions, such as cardiovascular disease, diabetes, and certain forms of cancer,
often remain asymptomatic in their early stages. By the time symptoms become apparent and a
diagnosis is made through routine checkups or patient complaints, the disease may have already
progressed to an advanced stage. This delay in diagnosis significantly reduces the efficacy of
Furthermore, traditional diagnostics often rely heavily on the expertise and intuition of medical
professionals, which, although invaluable, is also subject to human error, cognitive biases, and
variability in clinical judgment. For instance, two doctors might interpret the same set of
symptoms or lab results differently, leading to inconsistent diagnoses and treatment paths.
Another challenge lies in the fragmentation and underutilization of available patient data. Most
healthcare systems collect a vast amount of patient data, including laboratory tests, imaging,
family history, lifestyle habits, and electronic health records. However, traditional diagnostic
approaches are not equipped to process and analyze such large volumes of multidimensional data
effectively. As a result, subtle indicators or complex patterns that could signal the onset of
Moreover, diagnostic tools and procedures such as MRIs, CT scans, or invasive biopsies are not
only expensive but also time-consuming and, in some cases, carry potential health risks. These
tools may not be accessible in remote or under-resourced areas, leading to significant disparities
in healthcare access and quality. In such settings, patients may forego early diagnostic screenings
There’s also the challenge of resource allocation and system strain. In emergency rooms or
during seasonal spikes in illnesses, healthcare providers often face enormous workloads, leaving
less time and resources for thorough diagnostics. The absence of decision support tools in such
The growing burden of chronic diseases, coupled with aging populations and increasing
healthcare costs, has highlighted the urgent need for a more proactive, data-driven approach to
diagnostics and patient management. This is where predictive analytics, powered by machine
By leveraging historical data and identifying patterns that precede disease onset, predictive
models can alert healthcare providers to potential risks before symptoms emerge, enabling timely
intervention, preventive care, and better management of chronic conditions. This approach also
helps in personalizing treatment plans, minimizing unnecessary tests, and improving overall
system efficiency.
In summary, while traditional diagnostic methods have formed the bedrock of modern medicine,
their inherent limitations in early detection, data handling, and scalability necessitate a paradigm
shift.
machine learning and predictive analytics. Traditional healthcare models have primarily relied on
reactive treatment approaches, where diseases are diagnosed and managed only after symptoms
appear. However, with the advent of advanced data-driven technologies, there is a paradigm shift
towards proactive and preventive healthcare. Predictive healthcare analytics enables the early
detection of diseases, risk assessment, and timely interventions, ultimately improving patient
volumes of structured and unstructured medical data, including patient history, laboratory test
results, and symptoms, to identify patterns and correlations that may not be evident through
traditional analysis. By leveraging predictive analytics, healthcare providers can make informed
decisions about patient care, optimize resource allocation, and reduce healthcare costs.
forecasting diseases such as heart disease and diabetes. The system utilizes machine learning
algorithms like Logistic Regression and Random Forest to analyze historical patient data and
generate accurate disease predictions. The integration of electronic health records (EHRs) and
publicly available medical datasets ensures a robust and diverse data foundation, improving the
system’s reliability. The significance of predictive healthcare analytics extends beyond individual
patient care. It plays a crucial role in public health planning, early epidemic detection, and
managing chronic diseases. By identifying at-risk populations and potential disease outbreaks,
healthcare organizations can take proactive measures to prevent widespread health crises.
Moreover, advancements in wearable technology and real-time data collection open new
2. Review of Previous Work
Predictive healthcare analytics using machine learning (ML) is revolutionizing the healthcare
industry by enhancing patient care, operational efficiency, and cost management. By leveraging
ML algorithms, healthcare providers can analyse extensive datasets to identify trends and predict
future health outcomes, enabling personalized care and effective disease management. This
approach is pivotal in early diagnosis, disease prevention, and the development of personalized
treatment plans, ultimately leading to improved patient outcomes and optimized healthcare
workflows. The integration of classic optimization methods with ML further enhances the
accuracy and clinical relevance of predictive analytics, offering a comprehensive framework for
healthcare decision-making.
To identify relevant studies, we used reputable sources: Google Scholar, IEEE Xplore,
and Web of Science. To search for relevant studies, we used the search string “Predictive
Healthcare Analytics Using Machine Learning”. From the search results, we then identified
the papers containing at least one of the following keywords: “Disease Prediction using Machine
Data”. Table 1 lists all the papers that we used as references.
Table 1: List of Research Papers in the Field of Predictive Healthcare Analytics using AI/ML
No. Author Name Methodology Limitation Ref.
1 Obeagu, Big data The paper does not • Lack of Specific Methodological [6]
Ogenyi, and and methodologies used Since the paper does not discuss
different conditions.
• Generalization of Machine Learning Models:
The applications discussed in the
findings.
2 Jiang Ping Heart Disease Machine learning • Computational Complexity: [7]
Haq, Salah Method Using LR, K-NN, ANN, (especially ANN and SVM) can be
proposed a novel
selection. The system with different demographic
is tested on characteristics, making it less
Cleveland heart applicable outside the dataset it
disease dataset. was trained on.
researchers or practitioners to
applications.
• Overemphasis on Future
Developments:
healthcare, while interesting, can be
speculative. Predictions about the
constraints or real-world
future.
World Evidence:
4 Shahid An ensemble In this research work, • Limited Consideration of Other [9]
Ganie, and Learning Type-II Diabetes The study focuses on lifestyle and
mellitus based different machine variables in the model could limit its
cross-validation for
the
prediction of disease.
A detailed analysis of
patients’ lifestyle
for the
development of
phase plays an
important role in
better prediction by
assessment of the
of missing values,
detection and
replacing of outliers,
employed for
choosing the
optimum set of
lifestyle features.
5 Daniele Deep Learning The paper discusses • Black-box Nature [10]
deep learning
6 Riccardo Deep learning This paper mainly • Black-box Models [11]
for healthcare:
Miotto, Fei review, focuses on deep While deep learning models like RNNs
opportunities
with LSTMs and SDAs can offer
Wang, and challenges learning methods that
Shuang are used for powerful predictive capabilities, they
are often considered "black-box"
Wang, representation models. This lack of interpretability
makes it difficult to understand why
Xiaoqian learning. Models certain decisions or predictions are
made, which is especially concerning
Jiang and classify diagnoses in healthcare settings where clinicians
Joel T. from clinical
Dudley measurements in
with LSTM identify need to trust the model's output and
detect physiological
automated decisions.
prediction using
Sharmin models for machine learning. data). While these categories are useful
Akter, mental illness Initially they started for organizing the research, they may
using machine
Ferdaus learning with around 780 overlook other potentially valuable
algorithms
Anam records collected data sources, such as genomic data,
• Audio data
• Sensor and
device data
• Multi-modal
data
8 Min Chen, Disease This research • Evaluation Metrics [13]
Yixue Hao, Prediction proposes a new The paper mentions the prediction
and Lin Big Data From risk prediction (CNN- evaluation metrics, such as precision,
prediction accuracy
prediction may affect the model's performance in
(CNN-UDRP) predicting rare diseases or conditions.
algorithm.
9 Stephen S. Using Machine This research paper • High-Dimensional Feature Set [14]
Chia-Wen Applied surgery. PLP models may cause the model to overfit to the
First, the OHDSI PLP The automatic generation of code to
(CDM).
Vocabulary20—a set
of standard clinical
taxonomies for
diagnosis codes,
medications,
observations, and so
on (eg, SNOMED,
LOINC, RxNorm)—
to automatically
generate a very
high-dimensional
feature set of
candidate
predictors (often
of thousands of
the collection of
observed diagnoses,
medications,
observations, and so
is being trained.
framework and
software generate a
which is portable
from researcher to
researcher, to
facilitate efficient
replication of the PLP
model and
minimization of
reproducibility errors.
10 Mohammed Healthcare This paper aims to • Lack of Original Data or [15]
predictive
Badawy, analytics using present a Experimentation
machine
Nagy learning and comprehensive The paper is a survey, meaning it
deep learning
Ramadan review of the most synthesizes and reviews existing
techniques: a
and survey significant ML and studies but does not present original
applying ML and DL
and thoroughly problems. It does mention general
trends or findings, but without side-
reviewed. The byside comparisons of these
techniques' effectiveness in different
reviewed studies have scenarios, it's difficult for the reader to
assess which methods are best for
shown that AI
specific use cases in healthcare.
techniques (ML and
role in
accurately diagnosing
to anticipate and
analyze healthcare
data by linking
hundreds of clinical
records and
rebuilding a patient’s
advances research in
predictive analytics
using ML and DL
approaches and
contributes to the
literature and future
studies by serving as a
academics and
researchers.
While many of the reviewed works demonstrate the effectiveness of machine learning in specific
areas of healthcare, several limitations persist. Some models are disease-specific, others rely
heavily on structured clinical environments, and very few address the need for a lightweight,
This project proposes a real-time, non-invasive predictive healthcare system that allows patients
to enter basic health data and symptoms, and receive immediate feedback about potential health
risks using trained machine learning models. The system is designed to be extendable, secure,
3. Problem Identification & Feasibility Study
3.1 Introduction
In the face of rapidly growing global healthcare demands, the need for intelligent and scalable
diagnostic tools has become more urgent than ever. Modern medicine, though equipped with
sophisticated tools, often struggles to provide timely interventions due to issues such as
misdiagnosis, underdiagnosis, and resource shortages. This chapter aims to identify the specific
problems addressed by this project and establish the technical, operational, and economic
physical examination, and follow-up tests. While effective in many cases, they suffer from
several limitations:
• Delay in Diagnosis: Chronic diseases like diabetes and cardiovascular conditions often
• Overburdened Healthcare Systems: With rising patient inflow and shortage of medical
• Manual Errors and Subjectivity: Human diagnosis is prone to error, especially when
• Resource Constraints: Advanced diagnostic equipment and lab tests are expensive or
These challenges point to a need for an automated, consistent, and accessible system that assists
healthcare providers and even patients in identifying potential health risks early.
3.3 Research Gaps Identified
scalability.
• Data imbalance and lack of high-quality labeled datasets compromise model accuracy.
• Many systems do not provide a user interface for easy interaction by non-technical users.
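The data-imbalance gap noted above is commonly handled by stratified splitting and class reweighting. The sketch below illustrates both with scikit-learn on synthetic data with roughly 10% positives; it is not this project's pipeline, and class_weight="balanced" is one option among several (resampling is another).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic, deliberately imbalanced data: roughly 90% negatives and 10% positives.
X, y = make_classification(n_samples=1000, n_features=8, weights=[0.9, 0.1], random_state=0)

# stratify=y keeps the class ratio the same in the training and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

plain = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# class_weight="balanced" weights each sample inversely to its class frequency.
balanced = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_train, y_train)

print("training class counts:", np.bincount(y_train))
print("recall on positives (plain):   ", recall_score(y_test, plain.predict(X_test)))
print("recall on positives (balanced):", recall_score(y_test, balanced.predict(X_test)))
```

Reweighting usually raises recall on the minority class at some cost in precision; which trade-off is acceptable depends on how the predictions are used.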
This project addresses these gaps by building models for general diseases, heart disease, and
predictive interface.
In the context of the identified problems, this project has clear and actionable goals:
• Support both binary and multiclass classification (e.g., diabetic, non-diabetic and
prediabetic).
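For the three-class diabetes case, a Random Forest can be trained directly on string labels, since scikit-learn's tree ensembles support multiclass targets natively. This is a sketch on synthetic data: the label names mirror the example above, but the features are random and do not correspond to any clinical measurement.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Label names for a three-class diabetes task; the data itself is synthetic.
LABELS = np.array(["non-diabetic", "prediabetic", "diabetic"])

X, y_idx = make_classification(
    n_samples=600, n_features=6, n_informative=4, n_classes=3, random_state=42
)
y = LABELS[y_idx]  # map integer classes 0/1/2 to readable names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Random Forest handles multiclass targets directly; no one-vs-rest wrapper is needed.
model = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)

print(model.classes_)                   # class names, stored in sorted order
print(model.predict_proba(X_test[:1]))  # one probability per class, in the order of classes_
```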
3.5 Assumptions
To ensure the project is technically and logically feasible, the following assumptions are made:
1. Availability of Reliable Datasets: It is assumed that public datasets used for training and
3. Tool Accessibility: Tools like Python, Pandas, scikit-learn, and Matplotlib are available
4. Deployment Limitations: The project focuses on model building, not live deployment in
hospitals or mobile apps, though it lays the groundwork for such integration.
• Technology Stack: Python is chosen for its extensive libraries such as scikit-learn (ML),
Random Forests offer a strong foundation for both binary and multiclass classification.
• Data Availability: Datasets from sources like Kaggle and UCI Machine Learning
• Environment Setup: The project runs on local machines with standard hardware
(minimum
• Scalability: Models can be retrained on updated datasets, and additional diseases can be
complexity.
• Stakeholders: Can be used by medical researchers, doctors, healthcare startups, and even
• Development Cost: Since the tools and libraries used are open source, there is no cost for
software licenses.
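The scalability point above (retraining on updated datasets) can be sketched with joblib, which the scikit-learn model-persistence documentation describes as one way to save fitted models. The file name and the synthetic "original" and "updated" datasets are hypothetical; the idea is simply that retraining overwrites the saved model and the application loads whichever version is on disk.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def train_and_save(X, y, path):
    """Fit a fresh model on the given data and overwrite the saved copy at `path`."""
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    joblib.dump(model, path)
    return model


# Synthetic stand-ins for the original dataset and a later batch of records.
X_old, y_old = make_classification(n_samples=300, n_features=5, random_state=1)
X_new, y_new = make_classification(n_samples=200, n_features=5, random_state=2)

path = os.path.join(tempfile.mkdtemp(), "heart_model.joblib")  # hypothetical file name
train_and_save(X_old, y_old, path)  # initial model

# Retrain on the combined data; the saved file now holds the updated model.
retrained = train_and_save(np.vstack([X_old, X_new]), np.concatenate([y_old, y_new]), path)

loaded = joblib.load(path)  # what the application would load at start-up
print(loaded.n_features_in_)
```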
3.7 Risk Analysis
Data Quality Issues Medium High Apply thorough data cleaning and preprocessing
techniques.
hyperparameters.
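One mitigation in the risk table concerns hyperparameters. A common way to choose them is cross-validated grid search; the sketch below uses scikit-learn's GridSearchCV on synthetic data with a small, illustrative grid. The grid values and the F1 scoring choice are assumptions, not the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# A small, illustrative grid (2 x 3 x 2 = 12 combinations).
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, 6, None],    # shallower trees are one guard against overfitting
    "min_samples_leaf": [1, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,           # 5-fold cross-validation for every combination
    scoring="f1",   # F1 balances precision and recall for the positive class
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

After fitting, search.best_estimator_ is refit on all the data with the best combination and can be used for prediction.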
Even though the model is not deployed in a live healthcare environment, the project considers:
No sensitive personal data is collected or used in the project, and ethical machine learning
4. Requirement Analysis
Requirement analysis serves as a crucial phase in the software development lifecycle. It bridges
the gap between the problem statement and solution architecture by identifying and documenting
what the system must accomplish. This chapter outlines the functional, non-functional, user, and
system requirements for the predictive healthcare analytics system being developed.
Understanding these requirements ensures the system will be efficient, user-friendly, and capable
The purpose of this project is to develop a machine learning-based application that can
predict the likelihood of specific diseases such as diabetes and heart disease based on user inputs
Our primary goal was to design a system that could be used by:
• It should also give clear explanations of its output, using metrics such as Accuracy,
Precision, and Recall.
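To make these explanations concrete, the sketch below derives accuracy, precision, recall, and F1 from the four counts of a binary confusion matrix. The counts are made up for illustration and are not results from this project; in practice, confusion_matrix and classification_report in sklearn.metrics compute the same quantities from true and predicted labels.

```python
def metrics_from_confusion(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # share of predicted positives that are correct
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # share of actual positives that were found
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts for 100 screened patients.
print(metrics_from_confusion(tp=40, fp=10, fn=5, tn=45))
```

With these counts, accuracy is 85/100 = 0.85, precision is 40/50 = 0.80, recall is 40/45 ≈ 0.889, and F1 equals 2TP/(2TP + FP + FN) = 80/95 ≈ 0.842.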
4.2 Functional Requirements
Here are the main functional requirements our system needed to fulfill:
• Input Handling:
The system should take user-provided symptoms as input. In the case of heart
data.
• Prediction Logic:
Based on the input, the system should use trained ML models like Random Forest
prediabetes likelihood.
• Performance Evaluation:
It must evaluate the prediction using confusion matrices and output key
Show results in a clear and visual format using graphs and labeled confusion
These are the qualities that make the system more usable and scalable:
• Accuracy:
The model should produce highly accurate results, ideally above 80%, to be
• Efficiency:
• Scalability:
The system should be scalable — meaning we should be able to easily add new
• User Experience:
• Software:
Streamlit
5. Project Description
5.1 Description
Several studies reviewed in the literature demonstrate the successful application of machine
learning in healthcare. For example, Ganie & Malik used ensemble methods for early Type-II
Diabetes detection based on lifestyle indicators, while Jiang et al. explored a broad range of
classifiers (LR, SVM, KNN) for heart disease prediction using the Cleveland dataset. Other research
focused on deep learning techniques (e.g., CNNs, RNNs) for high-dimensional or real-time
medical data.
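A comparison of the kind described for heart disease prediction (LR, SVM, KNN) can be sketched with 5-fold cross-validation. The data here is synthetic, shaped roughly like the Cleveland dataset (303 rows, 13 features); the scores it prints say nothing about the cited study or this project's results. Scaling sits inside each pipeline so that it is refit on every training fold and no test-fold statistics leak into training.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in with roughly the shape of the Cleveland data.
X, y = make_classification(n_samples=303, n_features=13, n_informative=6, random_state=7)

candidates = {
    # All three models are sensitive to feature scale, so each gets its own scaler.
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC()),
    "KNN (k=5)": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

scores = {}
for name, model in candidates.items():
    # Mean accuracy over 5 folds.
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name:20s} {scores[name]:.3f}")
```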
However, most of these studies are either disease-specific, complex to deploy, or lack a unified,
lightweight interface that could assist both patients and doctors in making instant, accessible
health predictions. Moreover, many reviewed systems rely on stored or clinical data, whereas
this project emphasizes a privacy-focused, non-storage approach, where users input symptoms in
Hence, our project proposes a predictive healthcare analytics system that combines ease of use,
The primary objective of this project is to design and develop a comprehensive health prediction
system using machine learning algorithms, capable of diagnosing general diseases, heart disease,
and diabetes based on various input features. In an era where healthcare accessibility and early
detection are paramount, such intelligent systems can play a vital role in saving lives by offering
• Uses pre-trained machine learning models to assess disease likelihood based on input.
• Supports predictions for multiple diseases like diabetes and heart disease, and can be
extended to more.
• Is built using open-source tools, making it cost-effective and scalable for academic,
• Real-time prediction using models like Logistic Regression and Random Forest.
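Real-time prediction here amounts to wrapping one user's input in a one-row table and asking each fitted model for a probability. The sketch below assumes hypothetical feature names and a handful of made-up training rows; a real deployment would load models trained on the full datasets rather than fit them inline.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical feature names; made-up rows (target 1 = disease present).
FEATURES = ["age", "resting_bp", "cholesterol", "max_heart_rate"]
train = pd.DataFrame(
    [
        [65, 150, 250, 120, 1], [35, 120, 190, 180, 0],
        [45, 125, 200, 170, 0], [50, 118, 210, 175, 0],
        [60, 145, 260, 125, 1], [70, 160, 280, 110, 1],
        [40, 122, 195, 178, 0], [58, 150, 270, 118, 1],
    ],
    columns=FEATURES + ["target"],
)

models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
for model in models.values():
    model.fit(train[FEATURES], train["target"])


def predict_risk(patient: dict) -> dict:
    """Return each model's estimated probability of the positive class for one patient."""
    row = pd.DataFrame([patient], columns=FEATURES)  # one-row frame in training column order
    # Column 1 of predict_proba is class 1 because classes_ is sorted as [0, 1].
    return {name: float(m.predict_proba(row)[0, 1]) for name, m in models.items()}


print(predict_risk({"age": 62, "resting_bp": 148, "cholesterol": 265, "max_heart_rate": 122}))
```

Returning probabilities rather than hard labels lets the interface show a risk level and lets the two models' estimates be compared side by side.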
This integrated machine learning-based diagnosis tool represents a step forward in digital healthcare innovation. It not only supports early diagnosis and preventive care but also reduces dependency on manual diagnosis, especially where medical infrastructure is limited. With further development, such as integration with wearable health data, this project can evolve into a powerful decision-support tool for both patients and doctors.
5.2 Methodology
The project follows a systematic approach comprising the following key stages:
The healthcare industry is facing numerous challenges that hinder the timely diagnosis and
treatment of diseases. These challenges include increasing patient volumes, limited resources,
rising healthcare costs, and a growing burden of chronic illnesses such as heart disease and
diabetes. Traditional medical diagnostic methods, which rely heavily on physical examinations,
patient interviews, and manual interpretation of lab results, are often reactive in nature. This
means diseases are typically diagnosed only after symptoms appear—sometimes too late for
effective intervention.
In this context, the problem can be defined as the lack of early detection and predictive capabilities in traditional diagnosis, which leads to:
• Delayed Diagnosis: Symptoms often manifest at later stages, which delays diagnosis and treatment.
• Increased Patient Burden: Patients suffer more, both physically and financially, because conditions are treated at a later stage.
Furthermore, these problems are exacerbated in under-resourced settings or regions with limited access to healthcare.
Fig: Life Cycle of Machine Learning Model
The medical datasets used in this project were sourced from publicly available, reputable
healthcare repositories such as Kaggle, the UCI Machine Learning Repository, and standardized
Electronic Health Records (EHRs) from open-source medical research databases. These
platforms provide access to well-structured, anonymized datasets collected from real clinical
environments and health surveys, making them highly valuable for training predictive models in
healthcare.
These datasets comprise a wide variety of patient attributes that are critical to clinical diagnosis.
They typically include demographic information (such as age, sex), clinical symptoms (like chest
pain, fatigue, excessive thirst), and laboratory test results (including blood glucose levels,
cholesterol, blood pressure, and insulin readings). Some datasets also include target labels
indicating the presence or absence of a specific condition, such as diabetes or heart disease.
In order to build robust and generalizable prediction models, careful attention was given to the
selection of high-quality and diverse datasets. Datasets were assessed based on criteria such as data quality and representation of different population groups. The inclusion of heterogeneous data points across gender, age groups, and clinical indicators helps the trained models generalize effectively across real-world patient populations.
Moreover, the diversity and reliability of these datasets improve the statistical power and
predictive accuracy of the machine learning algorithms. Diverse datasets enable the model to
learn patterns associated with different risk factors and comorbidities, ultimately leading to more accurate and reliable predictions.
By utilizing open-access datasets from trusted sources, the project also adheres to ethical data
usage standards, ensuring transparency, reproducibility, and the ability for other researchers to validate and build upon this work.
5.2.2 Data Preprocessing:
The collected healthcare data undergoes a rigorous and methodical preprocessing phase to ensure
data quality, consistency, and suitability for machine learning models. This phase is crucial
because raw medical data often contains inconsistencies such as missing values, duplicate
records, unscaled numerical ranges, and categorical variables that machine learning algorithms cannot process directly.
Firstly, missing values are identified and treated using appropriate imputation strategies. For
numerical fields such as blood pressure, glucose level, or BMI, statistical imputation methods
such as mean or median imputation are applied. For categorical features (e.g., gender or chest pain
type), the mode or most frequent category is used for replacement. If certain records have
excessive missing data that compromises integrity, they are removed entirely to maintain dataset
quality.
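The imputation rules above can be sketched in plain Python as follows. This is an illustrative sketch, not the project's code: the field names (`age`, `glucose`, `sex`) are hypothetical, the median is used for numeric fields (it is less sensitive to extreme lab values than the mean), and the rule of dropping a record when more than half of its fields are missing is an assumed cut-off.

```python
from statistics import median, mode

NUMERIC = ["age", "glucose"]      # hypothetical numeric fields
CATEGORICAL = ["sex"]             # hypothetical categorical field
MAX_MISSING_FRACTION = 0.5        # assumed cut-off for dropping a record


def impute(rows):
    """Drop very incomplete rows, then fill the remaining gaps.

    Numeric gaps get the column median and categorical gaps get the column
    mode, both computed from the rows that are kept.
    """
    columns = NUMERIC + CATEGORICAL
    kept = [
        r for r in rows
        if sum(r[c] is None for c in columns) / len(columns) <= MAX_MISSING_FRACTION
    ]
    fill = {c: median(r[c] for r in kept if r[c] is not None) for c in NUMERIC}
    fill.update({c: mode(r[c] for r in kept if r[c] is not None) for c in CATEGORICAL})
    return [{c: fill[c] if r[c] is None else r[c] for c in columns} for r in kept]


# Toy records; None marks a missing value.
records = [
    {"age": 54, "glucose": 148.0, "sex": "M"},
    {"age": 61, "glucose": None, "sex": "F"},
    {"age": None, "glucose": 110.0, "sex": "F"},
    {"age": 47, "glucose": 132.0, "sex": None},
    {"age": None, "glucose": None, "sex": None},  # too incomplete: dropped
]

cleaned = impute(records)
for row in cleaned:
    print(row)
```

In this toy data the last record is dropped, the missing glucose value becomes the median (132.0), the missing age becomes 54, and the missing sex becomes the most frequent value, "F". With pandas, the same idea is usually expressed with `fillna`.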
Secondly, duplicate records, which can skew the model’s learning and lead to data leakage, are
detected using patient ID or identical row checks and removed from the dataset. Ensuring
uniqueness in patient entries helps improve the reliability and generalizability of the model.
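Both duplicate checks can be sketched in a few lines. The sketch assumes each record is a dictionary and that a hypothetical `patient_id` field exists; without an ID column, it falls back to comparing entire rows.

```python
def drop_duplicates(rows, key=None):
    """Return rows with duplicates removed, keeping the first occurrence.

    key: name of an ID column (e.g. the hypothetical "patient_id"); if None,
    two rows count as duplicates only when every field is identical.
    """
    seen = set()
    unique = []
    for row in rows:
        # Either the ID value or a hashable snapshot of the whole row.
        marker = row[key] if key is not None else tuple(sorted(row.items()))
        if marker not in seen:
            seen.add(marker)
            unique.append(row)
    return unique


rows = [
    {"patient_id": 1, "age": 54, "chol": 233},
    {"patient_id": 2, "age": 61, "chol": 250},
    {"patient_id": 1, "age": 54, "chol": 233},  # exact repeat of the first row
    {"patient_id": 2, "age": 62, "chol": 250},  # same ID, different values
]

print(len(drop_duplicates(rows)))                    # 3 (identical-row check)
print(len(drop_duplicates(rows, key="patient_id")))  # 2 (ID-based check)
```

The two checks can disagree, as the last row shows: it is not an exact copy, but it shares an ID with an earlier record. Which rule is appropriate depends on whether a repeated ID represents a true duplicate or a legitimate follow-up visit.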
Next, normalization and standardization are performed on numerical features. Features such as
age, cholesterol level, and glucose concentration are scaled using methods like Min-Max Scaling
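or Z-score Standardization. Min-Max Scaling maps each feature to a fixed range (typically 0 to 1), while Z-score Standardization rescales it to zero mean and unit variance. Either way, features end up on comparable scales, which prevents models like Logistic Regression from being biased toward higher-magnitude features.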
For categorical variables, such as sex, chest pain type, or fasting blood sugar status, encoding techniques are applied. Depending on the algorithm's requirements, One-Hot Encoding or Label Encoding is used to convert text labels into numerical format. This allows the machine learning algorithms to use these features.
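The difference between the two encodings can be sketched in plain Python using hypothetical chest pain type strings. Label Encoding assigns each category an integer, which imposes an artificial order. That is often tolerable for tree-based models such as Random Forest, but it can mislead a linear model such as Logistic Regression, which is why One-Hot Encoding is generally preferred there.

```python
def label_encode(values):
    """Map each category to an integer (categories sorted alphabetically)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return [index[v] for v in values], categories


def one_hot_encode(values):
    """Map each category to a 0/1 indicator vector with a single 1."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values], categories


chest_pain = ["typical angina", "non-anginal pain", "typical angina", "asymptomatic"]

codes, categories = label_encode(chest_pain)
print(categories)  # ['asymptomatic', 'non-anginal pain', 'typical angina']
print(codes)       # [2, 1, 2, 0]

vectors, _ = one_hot_encode(chest_pain)
print(vectors)     # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

In the heart disease dataset described in Section 6.5, `cp` is already stored as an integer code, so only the one-hot step would apply to it.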
To further improve the efficiency and accuracy of the models, feature engineering techniques are
employed. One such method is Principal Component Analysis (PCA), a dimensionality reduction
technique that transforms correlated features into a smaller set of uncorrelated components,
preserving most of the dataset's variability. PCA is particularly helpful in reducing overfitting and computational cost when the data contains many redundant features. A correlation matrix helps visualize the relationships between variables, and features that exhibit multicollinearity (e.g., cholesterol and triglyceride levels) can be removed or combined.
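The correlation-based part of this step can be sketched without any libraries. The sketch computes the Pearson correlation between feature columns and greedily drops a feature whose absolute correlation with an already kept feature exceeds a threshold. The 0.9 threshold and the toy values are assumptions for illustration. PCA itself is available in scikit-learn as `sklearn.decomposition.PCA` and is not reimplemented here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_correlated(features, threshold=0.9):
    """Keep features in order, skipping any whose |r| with an already kept
    feature exceeds the threshold."""
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical columns; triglycerides track cholesterol almost linearly here.
features = {
    "chol": [200, 220, 240, 260, 280],
    "triglycerides": [150, 165, 185, 200, 220],
    "age": [50, 41, 63, 45, 58],
}

print(round(pearson(features["chol"], features["triglycerides"]), 3))
print(drop_correlated(features))  # ['chol', 'age']
```

Which member of a correlated pair is kept depends on column order in this greedy version; a domain expert may prefer to keep the more clinically meaningful feature instead.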
Together, these preprocessing and feature engineering techniques enhance model performance,
ensure clean and structured input, and improve the interpretability and scalability of the predictive models.
Data exploration, also known as Exploratory Data Analysis (EDA), serves as a foundational step
in any data science or analytics workflow. It involves using statistical and visualization
techniques to examine the dataset thoroughly before diving into more complex modelling tasks.
In the context of healthcare analytics, EDA becomes especially crucial due to the sensitive,
diverse, and often complex nature of medical data. Healthcare datasets may contain missing
values, outliers, duplicates, or inconsistencies that can significantly affect the outcome of any downstream analysis.
Through EDA, we gain insights into the distribution of variables such as patient age, blood
pressure, cholesterol levels, or glucose readings, helping us understand the range, central
tendencies, and spread of the data. It also aids in the detection of anomalies, such as unusually
high lab values or implausible timestamps, which could indicate data entry errors or rare clinical
events. Moreover, EDA reveals correlations between features, such as the relationship between
BMI and the risk of diabetes, which may inform feature selection or hypothesis generation.
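A small part of this EDA can be sketched with Python's standard `statistics` module (Python 3.8 or later is needed for `quantiles`): summary statistics for one feature, plus an interquartile-range (IQR) rule for flagging unusually high or low values. The glucose readings are hypothetical, and the 1.5 × IQR fence is a common convention rather than a setting taken from the project.

```python
from statistics import mean, median, pstdev, quantiles


def summarize(values):
    """Basic distribution summary: range, central tendency, and spread."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "std": pstdev(values),
    }


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


glucose = [85, 92, 99, 104, 110, 118, 126, 140, 155, 480]  # hypothetical mg/dl
print(summarize(glucose)["median"])  # 114.0
print(iqr_outliers(glucose))         # [480]
```

The flagged reading is not deleted automatically: a value of 480 could be a data-entry error or a genuine but rare clinical event, which is exactly the distinction EDA is meant to surface for review.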
Additionally, EDA uncovers the underlying structure of the data, such as grouping tendencies or
trends over time, which are particularly relevant in longitudinal healthcare studies or time-series
analysis. By performing these analyses, we become better equipped to make informed preprocessing decisions, such as how to handle outliers and missing values, ensuring that the dataset is suitable for building reliable and accurate machine
learning models.
In summary, EDA is not just a preliminary step but a critical process that ensures data quality,
uncovers meaningful patterns, and guides the entire analytics pipeline, ultimately leading to more reliable models and insights.
Various machine learning algorithms, including Logistic Regression and Random Forest, were explored and evaluated to determine their effectiveness for healthcare prediction. The main selection criteria were:
• Suitability for healthcare data: the ability to handle real-world clinical datasets, which often contain missing values, imbalanced classes, and complex relationships between features.
• Interpretability: healthcare professionals need to understand and trust the model's decision-making process, so models that provide clear explanations for their predictions are highly preferred.
5.2.4.1 Model 1
The first part of the project focuses on predicting general diseases using a Random Forest
Classifier.
• Works well with high-dimensional, sparse, binary data (like encoded symptoms).
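Encoding symptoms as a binary vector can be sketched as follows. A vocabulary is built from the training symptom lists, and each user input becomes a vector with a 1 for every reported symptom. The symptom lists are illustrative (the first one echoes the fungal infection example used later in this report). Unknown symptoms are simply ignored, which is an assumption about how unseen input would be handled.

```python
def build_vocabulary(symptom_lists):
    """Sorted list of every distinct (normalized) symptom seen in training."""
    return sorted({s.strip().lower() for lst in symptom_lists for s in lst if s.strip()})


def encode_symptoms(symptoms, vocabulary):
    """Binary vector: 1 where the symptom was reported, 0 elsewhere.

    Symptoms that are not in the vocabulary are ignored.
    """
    reported = {s.strip().lower() for s in symptoms}
    return [1 if term in reported else 0 for term in vocabulary]


# Illustrative training symptom lists (one per record).
training_symptoms = [
    ["itching", "skin rash", "nodal skin eruptions"],
    ["continuous sneezing", "shivering", "chills"],
    ["stomach pain", "acidity", "vomiting"],
]

vocab = build_vocabulary(training_symptoms)
print(vocab)
print(encode_symptoms(["Itching", "skin rash"], vocab))  # [0, 0, 0, 1, 0, 0, 1, 0, 0]
```

Because most entries are 0, the vectors are sparse, which is the kind of input the text describes Random Forest as handling well.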
Why Not Use Other Algorithms?
1. Support Vector Machine (SVM)
o SVMs become very slow as the number of records and features grows, increasing training complexity.
o SVM is inherently binary; for multiple diseases, strategies such as One-vs-Rest or One-vs-One are needed.
2. K-Nearest Neighbors (KNN)
o With many binary symptoms, distances become less meaningful and noisy.
3. Neural Networks
o Need careful tuning of layers, learning rates, etc.
o Not ideal for explainable medical predictions.
5.2.4.2 Model 2
In the second module, Logistic Regression was implemented to predict the presence or absence
of heart disease.
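To show what Logistic Regression computes, here is a minimal from-scratch sketch trained with batch gradient descent on the log-loss. It is not the project's code: the two standardized features, the learning rate, and the epoch count are assumptions, and in practice a library implementation such as scikit-learn's `LogisticRegression` would normally be used.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    n_features, m = len(X[0]), len(X)
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one record: p(disease) - true label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict_proba(x, w, b):
    """Estimated probability of heart disease for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical standardized features, e.g. [age, cholesterol]; 1 = heart disease.
X = [[-1.0, -0.8], [-0.7, -1.1], [-0.9, -0.2], [0.8, 1.0], [1.1, 0.6], [0.5, 1.2]]
y = [0, 0, 0, 1, 1, 1]

w, b = train_logistic_regression(X, y)
print(round(predict_proba([1.0, 1.0], w, b), 3))
```

Because the output is a probability, it can be shown directly as the confidence score described for the Heart Disease Prediction Page.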
5.2.4.3 Model 3
The third component involves the prediction of diabetes using a Random Forest Classifier.
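A Random Forest combines the outputs of many decision trees. The sketch below shows only this aggregation step for the three diabetes classes (N, D, P), using hypothetical per-tree outputs: a hard majority vote, and a soft vote that averages per-tree class probabilities. scikit-learn's `RandomForestClassifier` uses the averaged-probability form. No trees are trained here; the tree outputs are invented for illustration.

```python
from collections import Counter


def majority_vote(tree_labels):
    """Hard voting: the most common label wins; also return vote shares."""
    counts = Counter(tree_labels)
    winner, _ = counts.most_common(1)[0]
    shares = {label: n / len(tree_labels) for label, n in counts.items()}
    return winner, shares


def soft_vote(tree_probabilities):
    """Soft voting: average each class probability across trees, pick the max."""
    classes = tree_probabilities[0].keys()
    n_trees = len(tree_probabilities)
    avg = {c: sum(p[c] for p in tree_probabilities) / n_trees for c in classes}
    return max(avg, key=avg.get), avg


# Hypothetical outputs of five trees for one patient (N / D / P).
print(majority_vote(["P", "D", "P", "N", "P"])[0])  # P

# Hypothetical class probabilities from three trees for the same patient.
trees = [
    {"N": 0.2, "D": 0.3, "P": 0.5},
    {"N": 0.1, "D": 0.6, "P": 0.3},
    {"N": 0.2, "D": 0.2, "P": 0.6},
]
label, averages = soft_vote(trees)
print(label)  # P
```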
Why Not Use Other Algorithms?
1. Support Vector Machine
o Poor performance on class 'P' (prediabetic).
To build robust and reliable predictive models, the cleaned and preprocessed dataset was used for training, with stratified cross-validation as the validation technique. This method divides the dataset into multiple folds such that each fold maintains the same class distribution as the original dataset. This is especially useful for imbalanced datasets, as it ensures that each subset is representative of the entire data. By validating the model on different folds, stratified cross-validation shows how well the model generalizes to unseen data and helps detect overfitting.
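The fold assignment behind stratified cross-validation can be sketched as follows: record indices are grouped by class, shuffled with a fixed seed, and dealt round-robin into k folds, so each fold keeps roughly the original class proportions. This is an illustrative re-implementation; scikit-learn's `StratifiedKFold` provides this functionality in a tested form.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=42):
    """Yield (train_indices, test_indices) pairs whose class mix matches `labels`."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        # Deal this class's records round-robin, continuing where the
        # previous class stopped so that fold sizes stay balanced.
        for pos, idx in enumerate(idxs):
            folds[(offset + pos) % k].append(idx)
        offset += len(idxs)

    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test


labels = [0] * 8 + [1] * 4  # imbalanced: 8 negatives, 4 positives
for train, test in stratified_k_fold(labels, k=4):
    print(len(train), [labels[i] for i in test].count(1))  # 9 1 for every fold
```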
After training, the models were evaluated using accuracy, precision, recall, and F1-score. Accuracy measures the overall correctness of the model's predictions. Precision indicates how many of the predicted positive instances were actually positive, while recall measures how many of the actual positive instances were correctly identified. The F1-score, the harmonic mean of precision and recall, provides a balanced measure of both and is especially useful when dealing with class imbalance. Together, these metrics give a more complete picture of model performance than accuracy alone.
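For a binary task such as heart disease prediction, all four metrics follow from the counts of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 · precision · recall / (precision + recall). The sketch below computes them in plain Python on hypothetical labels; scikit-learn's `precision_recall_fscore_support` and `classification_report` offer equivalent, multi-class-aware versions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "confusion": {"TP": tp, "FP": fp, "FN": fn, "TN": tn},
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # hypothetical ground truth
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # hypothetical model output
result = binary_metrics(y_true, y_pred)
print(result["confusion"])  # {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 5}
print(result["accuracy"], result["precision"], result["recall"], result["f1"])  # 0.8 0.75 0.75 0.75
```

For the three-class diabetes model (N, D, P), the same function can be applied once per class, treating that class as positive, and the per-class scores averaged (macro averaging).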
5.2.6 Model Deployment:
To make the predictive healthcare system accessible to non-technical users, an intuitive and responsive web interface was built using Streamlit, an open-source Python library for creating data applications. Streamlit was chosen due to its simplicity, fast development capabilities, and seamless integration with machine learning models. The interface is designed to be clean and interactive, allowing users to engage with the system effortlessly, even without prior technical knowledge.
Users can enter their symptoms, test results, and lifestyle factors through input widgets to receive real-time health risk assessments. This information may include symptoms (e.g., chest pain, shortness of breath) and diagnostic test results (e.g., blood pressure, glucose levels). Once the user submits the data, it is passed to the trained machine learning model, which processes the input and generates a prediction in real time.
The development of this project was structured into various phases, as shown in Table 3.
Table:3 Duration of Development Process
Phase Duration
The successful outcome of this project lies in the development of a functional and efficient predictive healthcare analytics system that leverages machine learning algorithms to identify the likelihood of diseases such as diabetes and heart disease. By allowing users to input basic health parameters and symptoms through a simple interface, the system provides immediate predictions.
The machine learning models used in the project demonstrated consistent and reliable
performance, with each model accurately recognizing patterns in patient data and delivering
meaningful predictions. The system's design ensures that the results are easy to interpret for both patients and healthcare providers.
Furthermore, the approach emphasizes real-time processing and privacy, as no personal data is stored.
Overall, the project demonstrates the feasibility of using lightweight, accessible machine learning tools for preliminary disease prediction.
Fig:3 Confusion Matrix of Top 10 Diseases
Fig:4 Performance Matrix of General Disease Prediction Model
Table:4 Heart Disease Confusion Matrix
5.4.3 Diabetes Prediction Model Results
Fig:8
5.5 Application
• Symptom-based Disease Prediction: Users can input symptoms or lab test results, and the system predicts the most likely diseases.
• Preliminary Health Assessment: Individuals can use the system to assess their health before seeking medical consultations.
• Reducing Unnecessary Hospital Visits: By providing preliminary guidance, the system
helps minimize unnecessary visits to hospitals and clinics, reducing medical costs and optimizing
healthcare resources.
• Chronic Disease Risk Assessment: The model can evaluate long-term health risks and
predict the likelihood of chronic diseases such as diabetes and heart disease based on patient data.
• Decision Support for Healthcare Providers: Doctors and medical practitioners can use the
system as an assistive tool for preliminary diagnosis, helping streamline patient triage and risk
assessment.
• Remote Health Monitoring: By integrating with wearable health devices, the system can support continuous monitoring and more personalized healthcare.
• Health Awareness and Preventive Measures: The system can educate users about
potential health risks, provide recommendations for lifestyle changes, and suggest preventive measures.
• Medical Research and Data Analysis: Researchers can leverage the predictive system to
analyze large-scale medical data, identify trends, and improve disease prediction models over
time.
5.6 Conclusion
This project demonstrates the power of data in transforming modern medicine. While algorithms
may never fully replace human expertise, they can significantly enhance it. Predictive analytics, when used responsibly, can save lives, reduce hospital costs, and support proactive care
strategies. The project lays a strong foundation for future innovations in AI-driven healthcare.
The tools, techniques, and awareness developed here are applicable not only in academics but also in real-world healthcare settings.
With rapid advances in AI, personalized medicine and digital health are no longer future concepts; they are here. With continued development, this system can become a stepping stone toward accessible, AI-driven healthcare.
6. Design
• The core system is the Predictive Healthcare Analytics System.
• System → Machine Learning: Sends Model Training & Prediction Output for training the
model.
• Health Inputs help the system make personalized predictions for patients.
• Model Training & Prediction Output is used to train machine learning algorithms.
6.2 Data Flow Diagram
Fig: 10 Data Flow Diagram (Medical Data Repository, Patient, Historical Data, Symptoms, Collect and Preprocess Data, Predictive Health Care System, Trained ML Model, Generate Report, Prediction Result Repository)
External Entities (Green Boxes):
• Predictive Health Care System:
o Uses historical data to produce disease prediction reports.
• Generate Report:
o Based on the predicted disease, a final report is generated for the patient and
doctor.
Flow of Data:
• Cleaned Data ➔ used for Training Predictive Model and saving Patient Medical Records.
• Predicted Result ➔ stored in Prediction Result Repository before reaching the Doctor.
Final Deliverable:
• The Final Report reaches both the Patient and the Doctor for diagnosis and treatment
decisions.
6.3 Flowchart
Fig: Flowchart (Frontend Interface, Validation Module, Preprocessing Module, ML Prediction Engine, Result Formatter, Display Component)
Entities and Modules:
1. User:
2. Frontend Interface:
o The first point of interaction.
o Sends input to the Validation Module to check for errors.
o If validation fails, shows an error message such as "Please fill all required fields".
3. Validation Module:
4. Preprocessing Module:
5. ML Prediction Engine:
6. Result Formatter:
7. Display Component:
o Takes the formatted results and displays them to the user along with their confidence scores.
Flow of Data:
Error Handling:
• If user input is incomplete or invalid, the frontend shows an error without proceeding
further.
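The module chain in the flowchart can be sketched as a single request handler in plain Python. Everything here is hypothetical: the field names, the scaling statistics, the 0.5 decision threshold, and the `model` argument, which stands in for any trained classifier that returns a probability. As in the error-handling rule above, invalid input returns an error message before the model is called, and nothing is written to storage.

```python
REQUIRED_FIELDS = ("age", "glucose", "bmi")  # hypothetical input fields


def validate(form):
    """Validation Module: check required fields and numeric types."""
    missing = [f for f in REQUIRED_FIELDS if form.get(f) in (None, "")]
    if missing:
        return False, "Please fill all required fields: " + ", ".join(missing)
    try:
        return True, {f: float(form[f]) for f in REQUIRED_FIELDS}
    except ValueError:
        return False, "Inputs must be numeric."


def preprocess(values, means, stds):
    """Preprocessing Module: standardize with statistics saved from training."""
    return [(values[f] - means[f]) / stds[f] for f in REQUIRED_FIELDS]


def format_result(prob, threshold=0.5):
    """Result Formatter: turn a probability into a display string."""
    label = "At risk" if prob >= threshold else "Not at risk"
    return f"{label} (estimated risk {prob:.0%})"


def handle_submission(form, model, means, stds):
    """Frontend entry point: validate, preprocess, predict, format."""
    ok, payload = validate(form)
    if not ok:
        return payload  # error shown to the user; prediction is skipped
    features = preprocess(payload, means, stds)
    return format_result(model(features))  # ML Prediction Engine call


def dummy_model(features):
    """Stand-in for a trained model; always returns the same probability."""
    return 0.73


means = {"age": 50.0, "glucose": 120.0, "bmi": 27.0}  # hypothetical training stats
stds = {"age": 10.0, "glucose": 30.0, "bmi": 5.0}

print(handle_submission({"age": "45"}, dummy_model, means, stds))
# Please fill all required fields: glucose, bmi
print(handle_submission({"age": "60", "glucose": "150", "bmi": "32"}, dummy_model, means, stds))
# At risk (estimated risk 73%)
```

In the Streamlit interface, a function like this would run when the user submits the form, for example inside an `if st.button(...)` block.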
Fig:12 Home Page
This is the home page, which gives a short overview of the application.
The General Disease Prediction Page allows users to enter their symptoms and quickly receive a
list of the top 3 most probable diseases. By analyzing the input symptoms using a trained model,
the system generates predictions ranked by likelihood, providing probability scores for each
condition. This page is designed to be user-friendly, offering an intuitive interface for symptom entry.
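Ranking the top 3 diseases comes down to sorting the model's class probabilities. The sketch below assumes those probabilities are already available as a dictionary; the disease names and values are hypothetical. With a scikit-learn classifier, such a dictionary could be built by pairing the `classes_` attribute with the output of `predict_proba`.

```python
def top_k_predictions(probabilities, k=3):
    """Return the k (disease, probability) pairs with the highest probability."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical model output for one set of symptoms.
probs = {
    "Fungal infection": 0.62,
    "Allergy": 0.21,
    "Drug Reaction": 0.09,
    "GERD": 0.05,
    "Acne": 0.03,
}

for disease, p in top_k_predictions(probs):
    print(f"{disease}: {p:.0%}")
# Fungal infection: 62%
# Allergy: 21%
# Drug Reaction: 9%
```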
Fig:14 Heart Disease Prediction Page
The Heart Disease Prediction Page allows users to input their lab test results to predict the
likelihood of heart disease. By entering specific test values, such as cholesterol levels, blood
pressure, and other relevant lab data, the system analyzes the information and provides a
prediction based on medical research and algorithms. The results are displayed with a confidence
score, indicating the probability of heart disease, helping users understand their health status. The
page is designed to be simple and easy to navigate, ensuring users can quickly input their data and view the results.
Fig:15 Heart Disease Prediction Output Page
This is the Heart Disease Prediction Result page, where users can view the prediction outcome
based on their provided health inputs. Along with the result indicating whether the user is at risk
or not, the page also highlights specific input fields that are outside the advisable range, helping
users understand which health metrics may need attention. This added insight supports better-informed health decisions.
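Highlighting out-of-range inputs amounts to comparing each value with a reference interval. In the sketch below, the limits are illustrative assumptions (a resting systolic pressure of 90 to 120 mm Hg and total cholesterol up to 200 mg/dl, which are commonly cited reference points); they are not the thresholds used by the application and are not clinical advice.

```python
# Hypothetical advisable ranges: (lower limit, upper limit); None means no limit.
ADVISABLE_RANGES = {
    "trestbps": (90, 120),  # resting blood pressure, mm Hg
    "chol": (None, 200),    # serum cholesterol, mg/dl
}


def flag_out_of_range(inputs, ranges=ADVISABLE_RANGES):
    """Return {field: "low" | "high"} for every input outside its range."""
    flags = {}
    for field, (low, high) in ranges.items():
        value = inputs.get(field)
        if value is None:
            continue  # field not provided
        if low is not None and value < low:
            flags[field] = "low"
        elif high is not None and value > high:
            flags[field] = "high"
    return flags


print(flag_out_of_range({"trestbps": 145, "chol": 180}))  # {'trestbps': 'high'}
print(flag_out_of_range({"trestbps": 85, "chol": 240}))   # {'trestbps': 'low', 'chol': 'high'}
```

Keeping the ranges in one table makes them easy to adjust after medical review, and the same function could serve the diabetes result page with a different table.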
Fig:16 Diabetes Prediction Page
This is the Diabetes Prediction Page, where users can input their health-related data such as
glucose level, BMI, age, and more to check their risk of having diabetes. After submission, the system displays the predicted outcome.
Fig:17 Diabetes Prediction Output Page
This is the Diabetes Prediction Result Page, where users can view their prediction outcome—
whether they are diabetic, non-diabetic, or prediabetic. Along with the result, the page also
highlights specific input fields (such as glucose level, BMI) that fall outside the advisable range.
This additional feedback helps users understand which health metrics need attention, making the results more actionable.
6.5 Dataset and Tables
The dataset captures the essential relationship between symptoms and diseases in a structured, detailed, and expandable way, making it well suited for building a general disease prediction system.
• The data clearly maps each disease to its associated symptoms across multiple columns.
• This structure allows a prediction model to learn which symptom patterns are most strongly associated with each disease.
• Example: "Itching" + "Skin rash" + "Nodal skin eruptions" strongly point toward "Fungal
infection."
• Diseases in real life rarely present with just one symptom. The dataset captures up to 17 symptoms per record.
• This richness makes prediction more realistic because the model can match multiple symptoms at once.
• The dataset already contains different types of diseases (e.g., Fungal infection, Allergy,
GERD).
• This variety is important because it means the model can work across multiple kinds of conditions.
• Data is organized cleanly, with each disease consistently associated with a structured list
of symptoms.
• This is ideal for machine learning or rule-based algorithms, where consistency improves
the model’s ability to learn patterns without needing heavy data cleaning.
• Because many diseases share overlapping symptoms (e.g., fever, chills, body aches), the system predicts the top 3 most probable diseases, giving users more than one possible option.
• This format is easy to expand: new diseases and symptoms can be added without changing the overall structure.
• Future symptoms or emerging diseases can be integrated easily.
6.5.2 Dataset for Model 2
This dataset is highly valuable because it includes key biological, clinical, and diagnostic
features that are known to influence heart disease, making it well suited for building heart disease prediction models.
• It Contains Clinically Proven Risk Factors
Each column captures a medically validated factor that influences heart disease risk:
o Age: Older age → higher risk.
o Sex: Males tend to develop heart disease earlier in life than females.
o Chest pain (cp): Certain chest pain types (e.g., typical angina) are strong indicators of heart disease.
o Fasting blood sugar (fbs): Diabetes (high sugar) is linked to heart issues.
• The dataset has all the important parameters doctors use to diagnose heart disease.
• Exercise-related features, such as the slope of the peak exercise ST segment:
o Very important because sometimes heart issues only appear during exertion.
• This dataset captures both resting and exertion-based features, increasing prediction
accuracy.
• Thalassemia (thal):
• Visualized through fluoroscopy to check artery blockages.
• Real-world diagnostic tests are included here — not just self-reported symptoms.
Column    Description
age       Age of the patient (in years)
sex       Sex (1 = male, 0 = female)
cp        Chest pain type (0 = typical angina, 1 = atypical angina, 2 = non-anginal pain, 3 = asymptomatic)
trestbps  Resting blood pressure (in mm Hg)
chol      Serum cholesterol level (mg/dl)
fbs       Fasting blood sugar > 120 mg/dl (1 = true, 0 = false)
restecg   Resting electrocardiographic results (0 = normal, 1 = ST-T wave abnormality, 2 = left ventricular hypertrophy)
thalach   Maximum heart rate achieved
slope     Slope of the peak exercise ST segment (0 = upsloping, 1 = flat, 2 = downsloping)
thal      Thalassemia (1 = normal, 2 = fixed defect, 3 = reversible defect)
This dataset contains all critical clinical, biochemical, and demographic indicators needed to
detect diabetes early, predict its onset, monitor risk factors, and even flag potential complications
— making it highly practical for real-world machine learning and healthcare applications.
1. HbA1c (Glycated Hemoglobin)
o If HbA1c ≥ 6.5%, a person is usually classified as diabetic.
• This dataset includes HbA1c, so long-term blood sugar levels can be measured directly.
2. Lipid Profile
o High LDL ("bad cholesterol")
o Low HDL ("good cholesterol")
o High triglycerides (TG)
• This lipid pattern can cause cardiovascular complications, which are very common in diabetic patients.
• Analyzing these lipid levels therefore helps flag potential complications, not just diabetes itself.
3. Kidney Function (Urea and Creatinine)
• High urea and high creatinine (Cr) levels warn about kidney damage.
4. Body Mass Index (BMI)
• Higher BMI means more body fat, which increases insulin resistance.
• In this dataset, a BMI above 25 (overweight) or 30 (obese) can hint at obesity-related diabetes risk.
5. Demographic Information (Age and Gender)
• Age:
• Gender:
• Including age and gender helps create more personalized prediction models.
N → Normal (Non-diabetic)
D → Diabetic
P → Prediabetic
• This makes it a supervised learning problem — we have clear labels for training the
model.
Column  Description
AGE     Age of the patient (in years)
Urea    Urea level in blood (mg/dL); indicates kidney function
Cr      Creatinine level in blood (mg/dL); a kidney health marker
HbA1c   Glycated hemoglobin (%); the key marker of average blood sugar level over about 3 months
Chol    Total cholesterol (mg/dL); higher values can indicate risk of diabetes complications
BMI     Body Mass Index; measures obesity, a major risk factor for diabetes
Class labels: N = Non-diabetic, D = Diabetic, P = Prediabetic
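Because the dataset includes HbA1c, a simple rule-based labelling can serve as a sanity check against the model's N / P / D predictions. The sketch uses the widely cited American Diabetes Association cut-offs (below 5.7% normal, 5.7% to 6.4% prediabetes, 6.5% or above diabetes), consistent with the 6.5% threshold mentioned above. It is not the project's model: HbA1c alone is not a full diagnosis, and the Random Forest also uses the other features. The sample values and model labels are hypothetical.

```python
def hba1c_category(hba1c_percent):
    """Rule-based label from HbA1c (%): N, P (prediabetic) or D (diabetic)."""
    if hba1c_percent >= 6.5:
        return "D"
    if hba1c_percent >= 5.7:
        return "P"
    return "N"


def agreement(model_labels, hba1c_values):
    """Fraction of records where the model label matches the HbA1c rule."""
    matches = sum(
        label == hba1c_category(v) for label, v in zip(model_labels, hba1c_values)
    )
    return matches / len(model_labels)


print([hba1c_category(v) for v in (5.2, 5.7, 6.4, 6.5, 8.1)])  # ['N', 'P', 'P', 'D', 'D']

hba1c = [5.2, 5.9, 7.0, 6.6, 6.0]       # hypothetical records
model_out = ["N", "P", "D", "D", "N"]    # hypothetical model predictions
print(agreement(model_out, hba1c))       # 0.8
```

Records where the two disagree (here, the last one) are worth inspecting, since they show where the model relies on features other than HbA1c or where labels may be noisy.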
7. Conclusion & Future work
7.1 Overview
The evolution of healthcare analytics, driven by data science and machine learning, has opened new possibilities for early disease detection. This project, “Predictive Healthcare Analytics Using Machine Learning”, was undertaken to explore the potential of
intelligent systems in diagnosing common yet serious health conditions like diabetes and heart
disease.
The journey began by identifying the limitations of conventional diagnostic procedures. These
include time-consuming lab-based tests, variability in expert opinions, high cost, and the need
for immediate action in critical scenarios. This paved the way for developing a machine learning-based model capable of making preliminary predictions based on patient input data.
The project successfully implemented and evaluated machine learning algorithms such as:
• Logistic Regression
• Random Forest
Through rigorous training and testing on open-source datasets, these models demonstrated
promising performance. Metrics such as accuracy, precision, recall, and F1-score were used to
assess the models, with Random Forest often leading in terms of balanced performance.
In addition to model development, the project emphasized data preprocessing, exploratory
analysis, and performance visualization — all vital for the robustness of predictive analytics.
This project is not intended to replace healthcare professionals but to assist them by acting as a
preliminary screening or diagnostic support tool. It promotes the idea of "precision health", in which predictions are tailored to each individual's health data.
One of the primary accomplishments of this project is the deep technical learning it facilitated.
The technical depth of this project ensured a full-cycle development experience, from raw data to a working prediction interface.
Beyond the algorithms, this project enhanced our appreciation of healthcare challenges. It required understanding medical terminologies, lab parameters (like glucose levels, BMI, cholesterol, etc.), and their role in diagnosis.
This blend of healthcare and data science is foundational to digital health innovations.
7.2.3 Practicality
• Developing a full pipeline, from raw data to predictions and UI, offered practical, hands-on experience.
7.3 Limitations
Although the project met its core objectives, some limitations exist:
• Static Datasets: The models were trained on historical data and lack real-time
adaptability. They are effective for proof-of-concept but require retraining and refinement before real-world use.
• Limited Scope: Only two diseases—heart disease and diabetes—were targeted. These are
common conditions with widely available data, but real-world systems must address a much wider range of conditions.
• No Real-Time Data Input: In its current form, the system requires manual input of health parameters.
• Interpretability: While model outputs are shown as predictions, deeper explanations (e.g.,
which features contributed most to a prediction) are not yet incorporated. For sensitive
applications like healthcare, explainability is essential to build trust with both patients and
clinicians.
• Data Imbalance and Quality: Some of the datasets used had class imbalance, requiring resampling to balance the data. Such techniques can introduce bias or affect precision and recall.
• Ethical Oversight: Real-world healthcare systems require legal compliance, data privacy,
and medical validations, which go beyond the scope of this academic project.
• Clinical Decision Support: The system can serve as a decision support tool for physicians, helping them prioritize patients for further examination.
• Rural Healthcare: In areas with limited access to specialists, such systems can guide preliminary screening and referrals.
• Mobile and Telemedicine Integration: Predictive tools can be integrated into telemedicine
platforms, enhancing virtual consultations with risk assessment and health tracking.
• Insurance and Risk Management: Insurance providers can use predictive analytics to
assess risk, personalize plans, and incentivize healthier behaviors through data-backed
wellness programs.
• Public Health Planning: Predictive systems can also be used to monitor disease
prevalence and anticipate regional health trends, supporting early interventions and
resource allocation.
• Liver disease
• Kidney disease
• Stroke risk
• IoT Devices: Connect the system with fitness bands or wearable devices to fetch real-time
parameters.
• Electronic Health Records (EHR): Use anonymized hospital data for continuous model
retraining.
• Doctors’ notes
• Medical histories
Adaptive systems that learn from user interaction and feedback over time could further reinforce better predictions.
This project successfully demonstrates the transformative role of predictive analytics in modern
healthcare. It combines the precision of data science with the empathy of medical care. As AI
matures and becomes more accessible, intelligent healthcare systems can help detect diseases earlier and more accurately.
The current work forms a strong foundation. By continuing to develop it with wider datasets,
real-time inputs, explainability modules, and integration capabilities, we can pave the way toward more accessible, data-driven healthcare.
Machine learning won’t replace doctors — but doctors using machine learning will have a
tremendous edge in improving patient lives. This future is not decades away; it's being built now.
REFERENCES
engineering.vol-7,issue-11
2. Pahulpreet Singh Kohli and Shriya Arora. (2018). “Application of Machine Learning in
3. Sajeev S, et al. (2019) Deep learning to improve heart disease risk prediction. In:
Machine learning and medical engineering for cardiovascular health and intravascular
4. Aditi Gavhane, Geetha S (2019) Prediction of heart disease using machine learning
heart disease dataset,” Distrib. Parallel Databases, vol. 2021, pp. 1– 20, Mar. 2021
6. Obeagu, Ezeanya, Ogenyi, and Ifu “Big data analytics and machine learning in
7. Jiang Ping Li, Amin Ul Haq, Salah Ud Din, Jalaluddin Khan, Asif Khan, and Abdus
E-Healthcare”
8. Rishi Reddy Kothinti, “Artificial intelligence in healthcare: revolutionizing precision
9. Shahid Mohammad Ganie and Majid Bashir Malik, “An ensemble Machine Learning
10. Daniele Ravì, Charence Wong, Fani Deligianni, Melissa Berthelot, Javier Andreu-Perez,
Benny Lo, and Guang-Zhong Yang, “Deep Learning for Health Informatics”
11. Riccardo Miotto, Fei Wang, Shuang Wang, Xiaoqian Jiang and Joel T. Dudley, “Deep
12. Md. Monirul Islam, Shahriar Hassan, Sharmin Akter, Ferdaus Anam Jibon, Md.
13. Min Chen, Yixue Hao, Kai Hwang, Lu Wang, and Lin Wang, “Disease Prediction by
14. Stephen S. Johnston, John M. Morton, Iftekhar Kalsekar, Eric M. Ammann, Chia-Wen
Hsiao, Jenna Reps, “Using Machine Learning Applied to Real-World Healthcare Data for
15. Mohammed Badawy, Nagy Ramadan and Hesham Ahmed Hefny, “Healthcare predictive
• https://www.kaggle.com/datasets
• https://www.who.int/data/collections
ANJALI SINHA
+91-9934868033 | [email protected] | Lucknow - India | LinkedIn | GitHub | LeetCode
• Cloud Computing: Cloud Architecture, Virtualization, Migration to Cloud, Governance in Cloud (IaaS, PaaS, SaaS)
• Artificial Intelligence & Machine Learning: Machine Learning Models, Natural Language Processing, Bayesian Learning, Expert Systems
EXPERIENCE
Data Analyst Intern, PrepInsta | June 2024 – August 2024 | Remote
• Programming Languages: JAVA, Python, SQL
• Databases: MongoDB, MySQL
• Web Technologies: Angular, Spring Boot, HTML, CSS, Bootstrap
Personal Portfolio
• Angular | HTML | CSS
• Developed a fully responsive personal portfolio website to showcase my projects, skills, and experience. GitHub
TaskManager
• HTML | CSS | JavaScript | Spring Boot | MySQL
• A responsive task manager developed using HTML, CSS, JavaScript, Spring Boot and MySQL to perform the basic CRUD operations. GitHub
TouristGuide
• Angular | HTML5 | CSS | Bootstrap | Node.js
• A responsive tourist guide website developed using various frontend and backend technologies. GitHub
Ajay Kumar
[email protected] |+91 9580487404 | LinkedIn: Ajay Kumar| GitHub: ajayVerma333
EDUCATION
Integral University Lucknow, Uttar Pradesh Bachelor of Computer Application
Expected Graduation, July 2025 CGPA: 9.0
PROFILE SUMMARY
Targeting Full stack Developer roles with an organization of high repute with a scope of improving knowledge and
further career growth.
● MERN stack (MongoDB, Express.js, React.js, Node.js) with a strong background in building scalable web
applications.
● Demonstrated ability to manage and deliver projects efficiently, from initial planning and design to final
deployment and maintenance.
● Committed to staying updated with the latest industry trends and technologies, continuously enhancing
skills to provide innovative solutions.
PROJECTS
College Website June 2024- Current
Tools: HTML, CSS, TAILWIND, JAVASCRIPT
A full-featured blog application enabling users to register, authenticate, and manage their blog posts seamlessly.
Responsibilities:
● Designed and developed a visually appealing and responsive website layout using HTML, CSS, and Tailwind
CSS to enhance the user experience across devices.
● Implemented dynamic features and interactive elements using JavaScript to improve engagement and
functionality.
ACADEMIC ACHIEVEMENTS
● Consistently maintained a CGPA of 9.0 (out of 10) throughout the semesters, showcasing strong academic
performance and dedication.
● Actively participated in organizing a departmental event led by the Computer Science department,
contributing to event planning, coordination, and successful execution.
SKILLS
Programming: C++ | HTML5 | CSS3 | Tailwind | SQL | JavaScript | React JS | Express JS | Node JS | MongoDB
Tools: GIT, MySQL, VS Code, GitHub, MS Office, MS PowerPoint
CONTACT
+91 6352958701
[email protected]
Lucknow, Uttar Pradesh
https://amarchaurasiya.netlify.app/
PROFILE SUMMARY
Aspiring web developer with strong foundational skills in front-end and back-end technologies, including React.js, JavaScript, and databases like MySQL and MongoDB. Eager to contribute to innovative projects and grow my expertise in full-stack development.
EDUCATION
2022 - Present
INTEGRAL UNIVERSITY, LUCKNOW
Bachelor of Computer Applications (BCA)
Current CGPA: 8.8
2019 - 2021
[H/S DEWAPUR, GOPALGANJ, BR]
Senior Secondary (Class 12)
Percentage: 71.8%
PROJECTS
BLINKIT (CLONE) | DEC. 2024
Created a responsive web application clone with React and Tailwind CSS, emphasizing UI/UX design and functionality. Leveraged React features like JSX, functional components, and hooks (useState, useEffect) for dynamic interaction, while Tailwind CSS provided a clean, efficient, and customizable styling solution.
FITNESS TRACKING WEB APPLICATION
A MERN stack-based web app where users can register, log in, and track their fitness activities (like exercises, workouts, steps, calories, etc.). Users can create, update, and delete workouts, view their exercise history, and monitor their fitness goals. The app uses MongoDB for storing user and workout data, Express.js and Node.js for the backend API, React.js for the frontend interface, and optionally Tailwind CSS for fast and modern styling.
SKILLS
HTML | CSS | JAVASCRIPT | C++ | REACT.JS | TAILWIND | MONGODB | MYSQL | PYTHON | EXPRESS JS
LANGUAGES
English: Good
Hindi: Fluent
ABHISHEK TIWARI
Lucknow, India 226023 | +91 9118443467 | [email protected]
BCA graduate with a strong background in software development and problem-solving. Seeking to
leverage expertise in programming languages and development methodologies to contribute to
innovative projects and enhance system performance. Committed to continuous learning and
collaboration within a dynamic team environment.
EDUCATION
BACHELOR OF COMPUTER APPLICATIONS | INTEGRAL UNIVERSITY, LUCKNOW
Sept 2022 – July 2025
7.20 CGPA
INTERMEDIATE | ST. ANGELOES COLLEGE
2022
77.80%
HIGH SCHOOL | ST. ANGELOES COLLEGE
2020
74.75%
TECHNICAL SKILLS
Programming Languages: Python, HTML, CSS and C++.
Databases: MySQL and MongoDB.
Software Tools & Platforms: VS Code and Google Colab.
Operating System: Windows.
Soft Skills: Communication, teamwork, problem solving and project management.