The internship report details a project focused on disease prediction based on symptoms using data science techniques. It aims to improve diagnostic accuracy by analyzing relationships between symptoms and diseases through statistical methods and machine learning models. The project highlights the potential of data-driven approaches to enhance healthcare outcomes, particularly in resource-limited settings.

Uploaded by

omkarmagdum818

A
INTERNSHIP REPORT ON
“Disease Prediction Based on Symptoms Using Data Science Techniques”
SUBMITTED TO THE SAVITRIBAI PHULE PUNE UNIVERSITY IN THE
PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE AWARD
OF THE DEGREE
OF
THIRD YEAR OF ENGINEERING
(COMPUTER ENGINEERING)
SUBMITTED BY

Mr./Ms.: Doge Onkar Somling
Exam Seat No.: T1904304258
Area of Internship: Remote Mode
Sem: 5th Semester

DEPARTMENT OF COMPUTER ENGINEERING

STES’S SINHGAD ACADEMY OF ENGINEERING
KONDHWA, PUNE 411048
UNIVERSITY OF PUNE
2024-25

CERTIFICATE
This is to certify that the Internship report entitled
“Disease Prediction Based on Symptoms Using Data Science Techniques”

Submitted by

Mr./Ms. Doge Onkar Somling    Exam No: T1904304258

is a bona fide work carried out by him/her under the supervision of Prof. P.R. Dongare
and it is submitted towards the partial fulfillment of the requirements of Savitribai Phule Pune
University for the award of the degree of Third Year of Engineering (Computer Engineering).

Prof. P.R. Dongare
TG/Mentor,
Department of Computer Engineering

Prof. S. N. Shelke
Head,
Department of Computer Engineering

Dr. K.P. Patil

Principal,
Sinhgad Academy of Engineering, Pune – 48

Place: Pune
Date:

ACKNOWLEDGEMENT
It is with great pleasure and an immense sense of gratitude that I acknowledge the help
of the following individuals.
Working at Unified Mentor Pvt. Ltd. was interesting. During this one-month internship, I
learnt a lot about Data Science; completing the project in particular helped me understand
the core concepts of Data Science, Python, and Power BI.
I thank Unified Mentor Pvt. Ltd. for giving me the opportunity and platform for this
internship.
I am grateful to the people at Unified Mentor Pvt. Ltd. for the chance to undertake this
work and for the opportunity to build a project, which added greatly to my knowledge.
Further, I want to thank the students and interns at Unified Mentor Pvt. Ltd., who made
this demanding time joyful yet always efficient.
I am extremely grateful to my department Internship Coordinator, Prof. M.K. Nivangune,
and H.O.D. Prof. S.N. Shelke, who guided and helped me in the successful completion of
this internship.

(Student Name & Signature)

ABSTRACT

Accurate diagnosis is essential for effective healthcare, but the process is often
hampered by overlapping symptoms of various diseases and differences in patient
reporting. The Health, Symptom, and Disease Analytics project addresses these
issues by leveraging advanced data analysis techniques to identify patterns and
relationships between symptoms and diagnoses. The dataset of patient symptoms and
related diagnoses is pre-processed by handling missing data, calculating standard
deviations, and encoding categorical features for deeper insights.
The goal of this analysis is to demonstrate the relationship between symptoms and
diseases through statistical investigations using correlation and covariance
matrices, enabling healthcare professionals to identify key symptom clusters that
may be indicative of specific diseases.
This data-driven approach can help doctors better understand the diagnostic
process by identifying groups of symptoms commonly associated with particular
diseases, ultimately improving the accuracy and efficiency of diagnosis.
Furthermore, by clustering symptoms and using data analytics, patterns emerge
that can guide medical professionals in considering less obvious diagnoses. The
integration of statistical tools like the correlation matrix offers a robust means to
assess the strength and direction of relationships between symptoms and diagnoses.
Insights from this data analysis can provide decision support,
making diagnoses faster, more accurate, and less prone to human error. The
findings from this project highlight the potential for information technology to
reduce misdiagnosis, improve health outcomes, and enhance diagnostic
procedures, particularly in areas with limited healthcare resources. This type of
analysis is particularly relevant in remote or underserved regions where access to
specialists may be limited, offering decision-making support to general
practitioners through technology.
Overall, this internship has provided a deep understanding of how data science
methodologies can be harnessed to improve healthcare outcomes. The system
developed through this research represents a promising step toward more efficient,
accessible, and data-driven diagnostic practices in the medical field.

Keywords: Disease Diagnosis, Symptom Analysis, Correlation Matrix, Symptom Clustering,
Data-Driven Healthcare, Diagnostic Support Systems.

CONTENTS
Sr. No.  Title

1. Acknowledgement
2. Abstract
3. Internship Offer and Completion Certificate (Two Scanned Copies of each)
4. Internship Place Details: Company Background
5. Introduction
6. Title and Problem Statement
7. Objectives
8. Motivation and Scope
9. Design
10. Methodologies Used
11. Hardware and Software Used
12. Results
13. Future Scope
14. Conclusion
15. References

Internship Offer Letter

Internship Completion Certificate

Internship Company Details

Company Background / Organization: Unified Mentor Pvt. Ltd.
IT Services and IT Consulting
Gurugram, Haryana

Activities/Scope: Disease prediction based on symptoms using Data Science Techniques

Website:

Objectives of Study: Data Science and its libraries

Supervisor Details (Name, Designation, Company Name, Email ID, Contact Number):
Bhautik Khunt (Project Mentor), [email protected]

HR Details (Name, Designation, Company Name, Email ID, Contact Number):
Drishti Madaan (HR Manager), [email protected]

Director Details (Name, Designation, Company Name, Email ID, Contact Number):
Paras Grover (Director), [email protected]

Introduction

Accurate disease diagnosis is vital for effective healthcare, yet diagnosing based on
symptoms is often difficult due to the overlap of symptoms across various diseases.
Many illnesses present with similar or overlapping symptoms, which makes it
challenging for healthcare professionals to correctly identify the underlying condition.
This issue is further compounded by the variability in how patients report their
symptoms—some patients may exaggerate certain symptoms, while others may
downplay them. This inconsistency can lead to diagnostic uncertainty and delays in
treatment. Traditional diagnostic methods often fail to fully utilize the wealth of data
available from patient histories, symptom patterns, and medical records, resulting in
missed opportunities for more precise diagnosis.

This project addresses that gap by applying advanced data analysis techniques to
uncover relationships between symptoms and diseases. Through careful data cleaning,
normalization, and encoding, we prepare a comprehensive dataset of patient symptoms
and diagnoses for deeper statistical exploration. Missing data is handled to ensure
accuracy, while normalization and encoding make it easier to interpret and analyze the
data. With this processed data, we use correlation and covariance matrices to identify
key patterns and trends that link specific symptoms to particular diseases. These
statistical tools help reveal underlying symptom clusters that may not be immediately
obvious through traditional diagnostic approaches.

The goal of this project is to provide automated, data-driven insights that support
healthcare professionals in making faster, more accurate diagnoses. By offering real-
time decision support based on statistical analysis, this approach has the potential to
enhance both the efficiency and precision of the diagnostic process. Ultimately, this can
lead to improved patient care, reduce the risk of misdiagnosis, and allow healthcare
systems to leverage data more effectively, particularly in time-sensitive or resource-
constrained environments.

Diagnosing diseases based on symptoms alone is a challenging task due to symptom
overlap across various conditions. Traditional methods often fail to leverage large
amounts of available clinical data effectively. This project attempts to automate the
initial stages of diagnosis by using data science to identify symptom patterns and their
associations with diseases. A binary dataset consisting of symptoms and diagnoses is
pre-processed and analysed using statistical tools, enabling data-driven diagnostic
support. The goal is to assist healthcare providers in diagnosing diseases more
accurately and efficiently.

Title and Problem Statement

Title: Disease Prediction Based on Symptoms Using Data Science Techniques

Problem Statement: To create a system that uses patient-reported symptoms as input and
predicts possible diseases using data science and machine learning models, thereby
assisting healthcare professionals in faster and more accurate diagnosis.

Objectives

1. To study and analyze the existing challenges in symptom-based disease diagnosis
due to overlapping symptoms and inconsistent patient reporting.

2. To collect and preprocess a comprehensive dataset of symptoms and associated
diseases, ensuring quality by handling missing values, standardizing inputs, and
encoding categorical features.

3. To apply Exploratory Data Analysis (EDA) techniques to identify correlations
and clusters among symptoms that are commonly associated with specific
diseases.

4. To implement feature selection techniques to identify the most significant
symptoms that influence disease prediction accuracy.

5. To design and develop multiple machine learning models (Logistic Regression,
Random Forest, Decision Tree) for multi-class disease prediction based on
symptom data.

6. To compare and evaluate the performance of different ML models using metrics
like accuracy, precision, recall, F1-score, and cross-validation accuracy.

7. To reduce data dimensionality using Principal Component Analysis (PCA) for
better visualization and to discover hidden structures within the dataset.

8. To visualize co-occurrence networks of symptoms to better understand their
interrelationships and how they influence the presence of specific diseases.

Motivation of the Project

This project is motivated by the need to improve diagnostic accuracy, reduce
misdiagnosis, and provide healthcare assistance in resource-limited areas. The system
offers scalability for real-world use in clinics, telemedicine applications, and
AI-powered healthcare platforms.

In today's fast-paced world, healthcare systems face increasing pressure to provide
accurate, timely, and accessible diagnoses. Traditional diagnostic methods, though
effective, often rely heavily on the expertise of medical professionals and can be prone
to human error, especially in the early stages of disease when symptoms are vague or
overlapping. In rural or underdeveloped regions, access to experienced doctors and
specialists is limited, further complicating the diagnosis process. These challenges
form the core motivation behind this project—to create a system that can assist
healthcare professionals by offering intelligent, data-driven insights derived from
patient-reported symptoms.

The rise of data science and machine learning has opened new frontiers in healthcare
analytics. With the increasing availability of electronic health records and structured
datasets, there is a growing opportunity to leverage this data to improve diagnostic
processes. This project is driven by the idea that early disease prediction using
symptoms and machine learning models can significantly reduce diagnosis time, avoid
misdiagnoses, and improve patient outcomes. A robust, AI-based system can serve as a
digital assistant to clinicians, especially in overburdened or low-resource
environments.

Design of the Project

Step 1: Data Loading & Preprocessing

Binary symptom-disease dataset
Null value handling, irrelevant record removal

Step 2: Exploratory Data Analysis (EDA)

Heatmaps, symptom frequency plots

Step 3: Feature Selection

Retaining high-correlation features
Creating binary symptom clusters

Step 4: Model Training

Algorithms: Random Forest, Logistic Regression, Decision Tree

Step 5: Model Evaluation

Accuracy, Precision, Recall, Cross-validation scores

Step 6: Visual Interpretation

PCA plots, Symptom Co-occurrence Networks, Cluster visualizations

Methodology
Design and Implementation

The aim of this project is to build an AI-enabled system that takes a patient's symptoms
as input and predicts likely diseases. Given the symptom data, the model identifies
patterns and correlations and infers the conditions the patient may have. Such a
predictive system could support healthcare providers by offering faster and potentially
more accurate preliminary diagnoses, leading to more personalized and timely medical
interventions.

Step 1: Loading and Data Preparation

In the first step, the dataset is loaded into a pandas DataFrame, a tabular structure
well suited to examining and manipulating binary data. Each row represents an individual
case or patient, with columns for the symptoms and for the disease. In such a binary
dataset, each symptom or disease is encoded as either 0 (absent) or 1 (present). A quick
check after loading verifies the data types and confirms that there are no missing or
inconsistent entries. Because the dataset is binary, we look in particular for anomalies,
such as missing data points, that would affect the interpretability of the resulting
model. Preprocessing covers handling null values, deleting irrelevant records, and
converting categorical variables to the right format, although the binary format
minimizes the need for extensive encoding. This foundational step yields the clean,
well-formatted data that is essential for model training.
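
The loading and cleaning step described above can be sketched roughly as follows. This is a minimal illustration, not the report's actual code: the toy DataFrame, the column names, the `preprocess` helper, and the choice to treat a missing symptom flag as "absent" (0) are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the report's binary symptom-disease dataset.
raw = pd.DataFrame({
    "disease": ["flu", "flu", "migraine", None, "asthma"],
    "fever": [1, 1, 0, 1, 0],
    "headache": [1, np.nan, 1, 0, 0],
    "shortness of breath": [0, 0, 0, 0, 1],
})

def preprocess(df: pd.DataFrame, target: str = "disease") -> pd.DataFrame:
    """Drop unlabeled rows, fill missing symptom flags with 0, enforce 0/1 integers."""
    df = df.dropna(subset=[target]).copy()
    symptom_cols = [c for c in df.columns if c != target]
    # Assumption: a missing symptom flag is treated as "symptom absent".
    df[symptom_cols] = df[symptom_cols].fillna(0).astype(int)
    # Sanity check: every symptom column now holds only 0 or 1.
    assert df[symptom_cols].isin([0, 1]).all().all()
    return df.reset_index(drop=True)

clean = preprocess(raw)
print(clean.dtypes)
print(clean.isna().sum())
```

Whether a missing flag should be filled with 0 or the row dropped depends on how the data was collected; the sketch picks one option only to keep the example short.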

Step 2: Exploratory Data Analysis

Using the preprocessed dataset, EDA helps reveal patterns in symptom-disease
relationships. Visualizations such as heatmaps and bar charts are very useful at this
point, showing how the prevalence of certain symptoms cuts across multiple diseases. A
heatmap of symptom-disease correlations can show which symptoms align most closely with
which diseases. A simple bar chart displays the frequency of each symptom across the
dataset, which helps, for example, to identify common or rare symptoms particularly
associated with specific diseases. Analyzing the interrelations of symptoms may reveal
clusters of symptoms that frequently co-occur and thereby serve as possible predictors
for specific conditions.
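
The quantities behind such plots can be computed with pandas alone. The sketch below uses a small made-up table (the symptom names and values are assumptions) and computes symptom prevalence, the symptom-symptom correlation matrix, and the per-disease symptom rate.

```python
import pandas as pd

# Hypothetical toy data: one row per case, one 0/1 column per symptom.
df = pd.DataFrame({
    "disease": ["flu", "flu", "migraine", "migraine", "asthma", "asthma"],
    "fever": [1, 1, 0, 0, 0, 1],
    "headache": [1, 1, 1, 1, 0, 0],
    "wheezing": [0, 0, 0, 0, 1, 1],
})
symptoms = df.drop(columns="disease")

# Prevalence: share of cases in which each symptom is present, most common first.
prevalence = symptoms.mean().sort_values(ascending=False)

# Pearson correlation between 0/1 columns (for binary data this is the phi coefficient).
corr = symptoms.corr()

# Per-disease symptom rate: how often each symptom appears within each disease.
rate_by_disease = df.groupby("disease").mean()

print(prevalence)
print(corr.round(2))
print(rate_by_disease)
```

The `prevalence` series is what a symptom-frequency bar chart would display, and `corr` is the matrix a heatmap would render, for instance with Seaborn's `heatmap` function.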

Step 3: Feature Selection and Engineering

Using insights from EDA, we refine the dataset by selecting the most informative
symptoms for each condition, which streamlines the model. For instance, if certain
symptoms are consistently correlated with particular diseases, we retain those features.
This keeps the model relevant and improves its accuracy. Feature engineering may also
involve grouping similar symptoms or creating new binary variables from symptom
combinations to capture more complex patterns. Binary features do not need scaling;
where grouping or a binary interaction feature is needed, it is created at this stage.
This ensures that the training dataset stays focused on targeted, predictive symptoms.
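
As an illustration of this step, the sketch below builds one binary interaction feature and then keeps the two highest-scoring features. The report does not state which selection method was used; the chi-squared scoring via scikit-learn's `SelectKBest` is a stand-in chosen for the example, and the data, column names, and `k=2` are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)
n = 200
# Hypothetical data: 'fever' and 'cough' carry signal about y, 'noise' does not.
y = rng.integers(0, 2, size=n)
X = pd.DataFrame({
    "fever": np.where(rng.random(n) < 0.9, y, 1 - y),
    "cough": np.where(rng.random(n) < 0.8, y, 1 - y),
    "noise": rng.integers(0, 2, size=n),
})

# Engineered binary interaction: 1 only when both symptoms are present.
X["fever_and_cough"] = X["fever"] & X["cough"]

# Score each binary feature against the label and keep the top two.
selector = SelectKBest(score_func=chi2, k=2).fit(X, y)
selected = list(X.columns[selector.get_support()])
print(selected)
```

Chi-squared scoring suits non-negative features such as 0/1 symptom flags; a correlation threshold, as the report's design outline suggests, would be an equally reasonable choice.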

Step 4: Model Selection and Training

With the dataset streamlined, we set up a multi-class classification model on binary
symptom features. Such data can be handled efficiently by algorithms such as Logistic
Regression, Random Forest, and Decision Tree. On binary data, Random Forest tends to
outperform the others because it reduces overfitting by combining multiple decision
trees, which improves predictive performance. During training, the model learns which
combinations of symptoms predict specific diseases, allowing it to generalize those
patterns to new data.
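
A minimal sketch of training the three named models with scikit-learn is shown below. The synthetic data (three diseases, ten binary symptoms), the 80/20 stratified split, and all hyperparameters are assumptions for illustration, not the settings used in the report.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
# Hypothetical data: 3 diseases, each with its own 10-symptom probability profile.
n_per_class, n_symptoms = 100, 10
profiles = rng.random((3, n_symptoms))
X = np.vstack([(rng.random((n_per_class, n_symptoms)) < p).astype(int) for p in profiles])
y = np.repeat(np.arange(3), n_per_class)

# Stratify so every disease keeps the same share in the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}
fitted = {name: model.fit(X_train, y_train) for name, model in models.items()}

for name, model in fitted.items():
    print(name, model.predict(X_test[:5]))
```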

Step 5: Model Evaluation

Finally, we evaluate the performance of the model on the test set. Key metrics for
multi-disease prediction are accuracy, precision, recall, and F1-score, which indicate
how well the model classifies each disease from the symptoms present. Precision and
recall become particularly relevant when disease representation is imbalanced, since
they better assess how well the model finds less common diseases. If performance falls
below an acceptable level, the structure or hyperparameters of the model are adjusted to
improve outcomes. This final stage tests the robustness of the model and its capability
for symptom-based disease prediction in practical application scenarios.
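
The metrics named above can be computed along the following lines. The data is synthetic and the choices of macro averaging and 5-fold cross-validation are assumptions for the example; the resulting numbers are not the report's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(0)
# Hypothetical data: 3 diseases, 10 binary symptoms, 100 cases per disease.
profiles = rng.random((3, 10))
X = np.vstack([(rng.random((100, 10)) < p).astype(int) for p in profiles])
y = np.repeat(np.arange(3), 100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)

test_accuracy = accuracy_score(y_test, y_pred)
# Macro averaging weights every disease equally, so rare diseases still count.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="macro", zero_division=0
)
# 5-fold cross-validation accuracy as a second, less split-dependent estimate.
cv_accuracy = cross_val_score(model, X, y, cv=5).mean()

print(f"test accuracy={test_accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f} cv accuracy={cv_accuracy:.3f}")
```

Reporting test accuracy next to cross-validation accuracy, as the report does, helps show whether a good score depends on one particular train/test split.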

Hardware and Software Used

Hardware Requirements:

Computer/Laptop: Laptop
Processor: Intel i5 Processor
RAM: 16 GB

Software Requirements:

Tools: Python (Jupyter Notebook)
Libraries:
1. Pandas
2. NumPy
3. Scikit-learn
4. Matplotlib
5. Seaborn

Result & Discussion

Frequency Distribution of Diseases

The plot shows the frequency distribution of diseases in the dataset. The vast
majority of diseases appear only a few times, while the bar at 10 represents a single
disease frequency that has 677 occurrences. Each bar is labeled with its exact count.
The imbalance is strong: a few diseases are very frequent while most are rare. This may
affect the performance of the model, since it may become biased toward the common
diseases.

Symptom prevalence

The plot shows the top 50 symptoms by prevalence, with "sharp abdominal pain"
being the most common, followed by "headache" and "shortness of breath." The
prevalence of symptoms decreases as you move down the list, helping identify the
most frequent symptoms in the dataset.

Most common diseases

The bar chart shows the 10 most common diseases in the dataset, each with a count of
10 cases. These include Zenker's diverticulum, abdominal aortic aneurysm, and
abdominal hernia.
Correlation among symptoms

The heatmap displays the correlation between 10 selected symptoms, indicating how
they relate to each other. Each cell represents a correlation coefficient, where values
close to 1 (red) show a strong positive correlation, meaning the symptoms often
appear together, and values close to -1 (dark blue) show a strong negative correlation,
meaning they tend not to occur together. For example, symptoms like depression and
insomnia have a moderate positive correlation (0.37), suggesting they may co-occur.
In contrast, anxiety and nervousness show almost no correlation with palpitations
(-0.00014), indicating no linear association between them. This visualization helps in
identifying symptom patterns that could be useful for diagnostic purposes.

Principal Component Analysis (PCA)

This is a Principal Component Analysis plot of the symptom dataset, in which diseases
were encoded as integer values for the color mapping. PCA is a dimensionality reduction
technique that transforms data from a high-dimensional space into a lower-dimensional
one while retaining most of the important variance in the data. The plot has two
principal components: "PCA Component 1" on the x-axis and "PCA Component 2" on the
y-axis. Each point represents the combination of symptoms that occur in one instance,
while color indicates the disease type. The color bar shows the encoded disease labels,
helping us visualize how similar or distinct different diseases are in terms of symptom
patterns in the reduced feature space. Clusters or patterns in the distribution may
reflect relations among symptoms and types of diseases.
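
The projection behind such a plot can be sketched as below. The random symptom matrix, the three disease names, and the use of `LabelEncoder` for the integer color codes are assumptions for illustration; the report does not show how the labels were encoded.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import LabelEncoder

rng = np.random.default_rng(1)
# Hypothetical data: 150 cases, 20 binary symptoms, 3 disease labels.
X = (rng.random((150, 20)) < 0.3).astype(int)
diseases = rng.choice(["flu", "migraine", "asthma"], size=150)

# Integer codes are used only to color the points, not as model input.
codes = LabelEncoder().fit_transform(diseases)

pca = PCA(n_components=2)
components = pca.fit_transform(X)          # shape: (n_cases, 2)
explained = pca.explained_variance_ratio_  # variance share kept by each component

print(components[:3])
print(explained)
```

The two columns of `components` are the x and y coordinates of the scatter plot, and `codes` supplies the color value per point. Checking `explained` is worthwhile, because two components may keep only a small share of the variance in a wide binary dataset.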

Symptom Cluster Analysis

This section of the report examines a clustering analysis of the symptom data. The first
plot illustrates the Elbow Method. The intuition behind this method is that the inertia,
a measure of cluster tightness, decreases as the number of clusters increases. The point
at which adding another cluster yields only a small further reduction appears as a sharp
elbow in the curve; for this dataset, that elbow appears around three clusters.

This provides the basis for the rest of the analysis, which uses K-means clustering with
three clusters. The second plot displays the clustering result after the dataset has
been reduced to two principal components with PCA. Each data point is a set of symptoms,
colored by its assigned cluster. The separations between the groups (purple, teal, and
yellow) are very clear, indicating different symptom patterns for each group, which may
be related to different disease categories or symptom profiles.
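
A rough sketch of the elbow computation and the final three-cluster K-means fit follows. The synthetic data is built with three distinct symptom profiles so that an elbow near three is expected; the profiles, the range of k, and the `n_init` and `random_state` settings are assumptions for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
# Hypothetical data: 3 groups of 60 cases, each favoring a different block of 5 symptoms.
profiles = np.array([
    [0.9] * 5 + [0.1] * 10,
    [0.1] * 5 + [0.9] * 5 + [0.1] * 5,
    [0.1] * 10 + [0.9] * 5,
])
X = np.vstack([(rng.random((60, 15)) < p).astype(int) for p in profiles])

# Elbow method: record inertia (within-cluster sum of squares) for k = 1..6.
inertias = []
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

# Final clustering with the k suggested by the elbow.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print([round(v, 1) for v in inertias])
print(np.bincount(labels))
```

Plotting `inertias` against k gives the elbow curve: the drop from k=1 to k=3 is steep, and further clusters reduce inertia only slightly.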

Symptom Co-occurrence Network

The plot shows a co-occurrence network of symptoms. Each node represents a symptom, and
edges between nodes denote associations or correlations between symptoms. The thickness
and darkness of edges typically encode correlation strength, but all edges appear
uniformly gray in this plot. Nodes are colored light blue for aesthetic purposes and are
all the same size, so every symptom has the same visibility.
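
The data behind such a network can be derived from the binary symptom matrix as sketched below. The toy table and the `min_count` threshold are assumptions; the report does not say how its edges were selected.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one row per case, one 0/1 column per symptom.
symptoms = pd.DataFrame({
    "depression": [1, 1, 0, 1, 0],
    "insomnia": [1, 1, 0, 0, 0],
    "headache": [0, 1, 1, 0, 1],
})

# Co-occurrence counts: entry (i, j) is the number of cases showing both symptoms.
M = symptoms.to_numpy()
counts = M.T @ M
np.fill_diagonal(counts, 0)  # ignore a symptom co-occurring with itself
co = pd.DataFrame(counts, index=symptoms.columns, columns=symptoms.columns)

# Keep each unordered pair once, and only if it co-occurs at least min_count times.
min_count = 2
edges = [
    (a, b, int(co.loc[a, b]))
    for i, a in enumerate(co.columns)
    for b in co.columns[i + 1:]
    if co.loc[a, b] >= min_count
]
print(edges)
```

Each tuple in `edges` is a weighted edge that a graph library could draw, with the weight mapped to line thickness so that stronger co-occurrence is visible rather than uniformly gray.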

Comparative Analysis between Symptom Patterns of Disease Subgroups

Two bar plots are included here. They depict the occurrence of two symptoms,
"depression" and "shortness of breath", across different disease subgroups. Each bar
represents a disease and shows how severe or how probable that symptom is in patients
with that particular disease. The bar plot for depression is on one side and the bar
plot for shortness of breath is on the other, showing how each symptom varies between
diseases. Diseases in which these symptoms are very frequent appear as the taller bars.

Model Accuracy Comparison

The chart compares the accuracy of three machine learning models: Logistic Regression,
Random Forest, and Decision Tree. Both Test Accuracy and Cross-Validation Accuracy are
presented as percentages for each model. Based on these two measures, Random Forest
achieved 91.75%, making it the best of the three. The visualization helps determine
which model generalizes best to unseen data, identifying Random Forest as the most
reliable one for this dataset.

Future Scope

The integration of artificial intelligence and data science in healthcare is still in its
early stages, and this project opens several avenues for future enhancements and real-
world implementation. Building on the promising results achieved through this
research, the following future directions can be explored to further improve the
accuracy, usability, and scalability of the system:

1. Integration with Real-Time Healthcare Systems

The model can be integrated with existing Electronic Health Records (EHR) and
Hospital Information Systems (HIS) to automatically suggest possible diagnoses
to doctors based on patient-reported symptoms and medical history.

2. Development of a Mobile or Web-Based Application

Transforming the project into a user-friendly mobile app or online platform can
provide remote patients and primary healthcare workers with a quick diagnostic
tool, especially useful in rural or underdeveloped regions.

3. Expansion of Dataset and Inclusion of Demographic Details

Future models can be trained on larger, more diverse datasets that include
additional parameters such as age, gender, medical history, and geographical
factors, which may improve prediction accuracy and personalization.

4. Multilingual and Voice-Based Input Support

Implementing NLP (Natural Language Processing) capabilities will allow users
to input symptoms in regional languages or via voice commands, increasing
accessibility for non-English speakers and elderly users.

5. Collaboration with Healthcare Institutions and Startups

Collaborating with hospitals, clinics, or health-tech startups can help pilot and
deploy the system in real-world environments, making it a part of daily clinical
workflows.

Conclusion

This project, Health, Symptoms & Disease Analysis, demonstrates the power of data-driven
methods in disease diagnosis based on patient-reported symptoms. By analyzing
correlations between symptoms and diseases, we found notable patterns that can support
informed decision-making by healthcare practitioners for efficient and accurate
diagnosis.

The project consisted largely of rigorous data preprocessing, including cleaning,
handling missing values, and encoding categorical data, which produced a quality dataset
for meaningful statistical analysis. We then applied a set of EDA techniques, such as
correlation and covariance matrices, to find clusters of symptoms common to specific
diseases. This supports a more systematic diagnostic process that also helps limit
medical errors and streamline healthcare workflows.

This project also compared the train and test accuracy of machine learning models such
as Logistic Regression, Random Forest, and Decision Tree, along with their respective
cross-validation accuracies. Among these models, Random Forest performed best, with an
accuracy of 91.75%, which supports its reliability in generalizing predictions to unseen
data in this dataset.

Overall, the project demonstrates that data science adds value to healthcare, from basic
data processing to more sophisticated predictive models. The work presented here forms a
foundation for further work, such as integrating machine learning into real-time
diagnosis and model updating. This will improve diagnosis by creating enhanced,
data-driven diagnostic tools for the betterment of patient outcomes.


