HEART ATTACK RISK PREDICTION USING MACHINE LEARNING
A PROJECT REPORT
Submitted by
April 2024
VELAMMAL ENGINEERING COLLEGE
CHENNAI -66
BONAFIDE CERTIFICATE
Certified that this project report “Heart Attack Risk Prediction using Machine Learning” is the
bonafide work of “PRAMODH KUMAR P (113219031098), PUGAZHENDI B (113219031168),
SANTHOSH KUMAR S (113219031168)” who carried out the project work under my supervision.
CERTIFICATE OF EVALUATION
Sl. No. | Name of the students who have done the project | Title of the Project | Name of supervisor with designation

This project report, submitted by the above students in partial fulfillment of the requirements for the award
of the Bachelor of Engineering degree of Anna University, was evaluated and confirmed to be a report of
the work done by the above students and was then assessed.
Internal Examiner
External Examiner
ABSTRACT
Heart disease remains a leading cause of mortality worldwide, underscoring the need for effective
predictive tools to mitigate its impact. Machine learning (ML) techniques offer promising avenues
for predicting heart attacks by leveraging diverse patient data. This abstract presents the
development of a predictive model for heart attacks implemented through a user-friendly Streamlit
application. The proposed system integrates ML algorithms with an intuitive user interface, allowing
healthcare professionals and individuals to assess their risk of experiencing a heart attack.
Leveraging datasets comprising various demographic, clinical, and lifestyle factors, the model
employs feature selection, preprocessing techniques, and ensemble learning to optimize predictive
accuracy. The Streamlit application provides a seamless platform for users to input their personal
data, such as age, gender, blood pressure, cholesterol levels, and lifestyle habits. Through
interactive visualizations and user-friendly controls, individuals gain insights into their
cardiovascular health and potential risk factors. Furthermore, the application's backend utilizes ML
algorithms, including logistic regression, decision trees, and support vector machines, to analyze
input data and generate personalized risk assessments. By harnessing the power of ML, the model
continuously learns from new data, enhancing its predictive capabilities and adaptability over time.
ACKNOWLEDGEMENT
I wish to acknowledge with thanks the significant contribution given by the management of
our college: our Chairman, Dr. M.V. Muthuramalingam, and our Chief Executive Officer, Thiru.
M.V.M. Velmurugan, for their extensive support.
I would like to thank Dr. S. Satish Kumar, Principal of Velammal Engineering College,
for giving me this opportunity to do this project.
I express my thanks to our Project Coordinators, Dr. P. S. Smitha, Dr. P. Pritto Paul, and
Dr. S. Rajalakshmi, Department of Computer Science and Engineering for their invaluable
guidance in the shaping of this project.
I am grateful to all the staff members of the Department of Computer Science and
Engineering for providing the necessary facilities for carrying out the project. I would especially
like to thank my parents for providing me with the unique opportunity to work and for their
encouragement and support at all levels. Finally, my heartfelt thanks to The Almighty for guiding
me throughout my life.
TABLE OF CONTENTS

ABSTRACT
LIST OF FIGURES
1 INTRODUCTION
2 LITERATURE SURVEY
2.1 INTRODUCTION
2.3 APPLYING CODEBERT FOR AUTOMATED PROGRAM REPAIR OF JAVA SIMPLE BUGS
3 SYSTEM ANALYSIS
4 SYSTEM SPECIFICATION
4.1.3 PYTHON
4.1.5 STREAMLIT
5 SYSTEM DESIGN
6 SYSTEM IMPLEMENTATION
6.1 MODULES
7 TESTING
7.1 INTRODUCTION
7.2 TESTING OBJECTIVES
8.1 CONCLUSION
APPENDIX II - SNAPSHOTS
REFERENCES

LIST OF FIGURES

Fig 1.1 Machine Learning diagram
Fig 5.2.1 Use Case diagram
Fig 5.2.2 Class diagram
Fig 5.2.3 Sequence diagram
Fig 6.2.2 Machine Learning Language Model
Fig 6.2.3 Data analysis
Fig 6.2.4 Natural Language Processing
Fig 9.3 Person does not have a heart attack
LIST OF ABBREVIATIONS
AI Artificial Intelligence
ML Machine Learning
PL Programming Language
NL Natural Language
CHAPTER 1
INTRODUCTION
1.1 PURPOSE OF THE PROJECT
The purpose of this project extends beyond mere prediction; it encompasses several key objectives:
Early Detection: One of the primary purposes is to detect the risk factors associated with heart attacks at
an early stage. By leveraging machine learning algorithms, we aim to identify subtle patterns and
indicators that might go unnoticed through traditional diagnostic methods.
Risk Stratification: Not all individuals have the same risk profile for heart attacks. Through this project,
we seek to stratify individuals based on their risk levels, enabling healthcare professionals to allocate
resources efficiently and tailor interventions according to the specific needs of each group.
Personalized Medicine: Every individual possesses a unique set of characteristics and risk factors. By
incorporating personalized data such as medical history, lifestyle factors, and genetic predispositions, our
goal is to develop models that offer personalized risk assessments, empowering individuals to make
informed decisions about their health.
Prevention and Intervention: Ultimately, the overarching purpose is to prevent heart attacks and mitigate
their impact. By accurately predicting the likelihood of a heart attack, healthcare providers can intervene
proactively, implementing preventive measures such as lifestyle modifications, medication, and targeted
interventions to reduce the risk and severity of cardiovascular events.
Public Health Impact: Beyond individual-level interventions, this project aims to contribute to broader
public health initiatives. By identifying population-level trends and risk factors, policymakers and
healthcare authorities can implement preventive strategies, allocate resources efficiently, and design
targeted interventions to address the root causes of CVDs at a population level.
1.2 SCOPE OF THE PROJECT
Machine learning (ML) has emerged as a transformative technology with profound implications across
various domains, revolutionizing how we process data, derive insights, and make decisions. Rooted in the
field of artificial intelligence (AI), machine learning algorithms enable computers to learn from data,
identify patterns, and make predictions or decisions without being explicitly programmed. This
introduction aims to provide an overview of machine learning, exploring its fundamental concepts,
methodologies, applications, and societal impact.
At its core, machine learning is concerned with the development of algorithms that improve their
performance over time as they are exposed to more data. This learning process can be broadly categorized
into three main types:
Fig 1.1: Machine Learning diagram
Supervised Learning: In supervised learning, the algorithm is trained on labeled data, where each input
is associated with a corresponding output. The goal is to learn a mapping from inputs to outputs, enabling
the algorithm to make predictions on unseen data. Common applications include classification (e.g., spam
detection, image recognition) and regression (e.g., predicting house prices, forecasting sales).
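As a minimal illustration of supervised learning (a sketch, not code from this project), the snippet below fits a logistic regression classifier on a small synthetic dataset with scikit-learn; the interpretation of the features in the comments is an assumption chosen to echo the attributes used later in this report.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only; columns stand in for scaled clinical
# attributes such as age, blood pressure, and cholesterol (an assumption).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # synthetic "at risk" label

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)                      # learn a mapping from inputs to outputs
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))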
Unsupervised Learning: Unsupervised learning involves training algorithms on
unlabeled data, where the objective is to uncover hidden patterns or structures within the data.
Clustering algorithms, such as k-means and hierarchical clustering, group similar data points
together based on their features, while dimensionality reduction techniques like principal
component analysis (PCA) extract meaningful representations of the data.
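As a sketch of the two unsupervised techniques named above, the snippet below clusters unlabeled synthetic data with k-means and then compresses it with PCA; the data is invented purely for illustration.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Two hidden groups of points with no labels (synthetic, for illustration).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=0.0, size=(100, 5)),
               rng.normal(loc=3.0, size=(100, 5))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster sizes:", np.bincount(kmeans.labels_))     # groups found without labels

pca = PCA(n_components=2)                                # reduce 5 features to 2 components
X_2d = pca.fit_transform(X)
print("Explained variance ratio:", pca.explained_variance_ratio_)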
Healthcare: In healthcare, machine learning supports applications such as disease risk prediction, medical image analysis, and personalized treatment planning, helping clinicians detect conditions earlier and tailor interventions to individual patients.
Finance:
The finance industry leverages machine learning for fraud detection, risk assessment, algorithmic
trading, and customer service optimization. Fraud detection algorithms analyze transaction data
in real-time, flagging suspicious activities and preventing financial losses. Risk assessment
models utilize historical data to predict market trends, assess creditworthiness, and optimize
investment portfolios. Machine learning-driven chatbots provide personalized financial advice
and support, enhancing customer experience and engagement.
Manufacturing:
Machine learning plays a pivotal role in enhancing efficiency, quality control, and predictive
maintenance in manufacturing processes. Predictive maintenance algorithms analyze sensor data
from machinery to predict equipment failures before they occur, reducing downtime and
maintenance costs. Quality control systems employ machine learning techniques to detect defects
in products, ensuring compliance with quality standards. Additionally, supply chain optimization
models optimize inventory management, production scheduling, and logistics, improving overall
operational efficiency.
Big Data is a term used to describe collections of data that are huge in size and yet growing
exponentially with time. In short, such data is so large and complex that none of the traditional
data management tools can store or process it efficiently.
Big Data could be found in three forms:
1. Structured
2. Unstructured
3. Semi-structured
Structured: Any data that can be stored, accessed, and processed in the form of a fixed
format is termed 'structured' data. Over time, computer science has developed increasingly
successful techniques for working with such data (where the format is well known in advance)
and for deriving value from it. However, issues arise nowadays when the size of such data grows
to a huge extent, with typical sizes in the range of multiple zettabytes.
Unstructured: Any data with an unknown form or structure is classified as unstructured data,
for example text files, images, and videos.
Semi-structured: Semi-structured data can contain both forms of data. It appears structured in
form but is not defined with, for example, a table definition in a relational DBMS. An example of
semi-structured data is data represented in an XML file.
(i) Volume – The name Big Data itself is related to its enormous size. The size of data plays a very
crucial role in determining the value that can be derived from it. Also, whether particular data can
actually be considered Big Data or not depends upon the volume of data. Hence, 'Volume' is one
characteristic that needs to be considered while dealing with Big Data.
(ii) Variety – The next aspect of Big Data is its variety. Variety refers to heterogeneous sources
and the nature of data, both structured and unstructured. In earlier days, spreadsheets and databases
were the only sources of data considered by most applications.
(iii) Velocity – The term 'velocity' refers to the speed of the generation of data. How fast the data
is generated and processed to meet the demands, determines the real potential of the data. Big Data
Velocity deals with the speed at which data flows in from sources like business processes,
application logs, networks, social media sites, sensors, Mobile devices, etc. The flow of data is
massive and continuous.
(iv) Variability – This refers to the inconsistency which can be shown by the data at times.

Benefits of Big Data processing: The ability to process Big Data brings multiple benefits, such as:
● Businesses can utilize outside intelligence while taking decisions. Access to social data from
search engines and sites like Facebook and Twitter is enabling organizations to fine-tune their
business strategies.
● Improved customer service. Traditional customer feedback systems are being replaced by new
systems designed with Big Data technologies, in which Big Data and natural language processing
technologies are used to read and evaluate consumer responses.
● Early identification of risk to the product or services, if any.
● Better operational efficiency. Big Data technologies can be used to create a staging area or
landing zone for new data before identifying what data should be moved to the data warehouse.
In addition, such integration of Big Data technologies and the data warehouse helps an
organization offload infrequently accessed data.
CHAPTER 2
LITERATURE SURVEY
2.1 INTRODUCTION:
The development of software applications is a complex process that involves writing and testing
code, debugging errors, and ensuring that the final product meets the desired specifications. In
recent years, there has been growing interest in the use of machine learning (ML) techniques,
particularly natural language processing (NLP), to support developers during the coding process.
This chapter reviews existing research studies and articles. The survey provides an overview of the
current state of research in this field, including the strengths and limitations of existing approaches,
and identifies areas for future research and development.
Abstract:
Modern society relies on complex software applications that consist of millions of lines
written in many programming languages (PLs) by many teams of developers. PLs are difficult to
read and understand quickly so the developers must also document their programs to make them
more maintainable. Mistakes made during the coding phase lead to software bugs that can cost time
and money for the software creators and users. A popular technology used by software developers
is a “linter” which flags syntactic errors in code. Auto-formatters will add or remove whitespace
and “newline” characters to code to improve readability. Statement auto-complete tools can
suggest tokens that programmers might write next to improve their productivity. While these
traditional tools can be useful for programmers, most of them can’t help a developer with complex
tasks such as writing understandable code documentation or implementing algorithms.
2.3 Applying CodeBERT for Automated Program Repair of Java Simple Bugs
Authors: Ehsan Mashhadi, Hadi Hemmati
Abstract:
Software debugging and program repair are among the most time-consuming and labor-
intensive tasks in software engineering that would benefit a lot from automation. In this paper, the
authors propose a novel automated program repair approach based on CodeBERT, which is a
transformer-based neural architecture pre-trained on a large corpus of source code. The model is
fine-tuned on the ManySStuBs4J small and large datasets to automatically generate the fix codes.
The results show that their technique accurately predicts the fixed codes implemented by the
developers in 19-72% of the cases, depending on the type of datasets, in less than a second per
bug.
2.4 IntelliCode Compose: Code Generation Using Transformer
Abstract:
In software development through integrated development environments (IDEs), code completion
is one of the most widely used features. Nevertheless, the majority of integrated development
environments only support completion of methods and APIs, or arguments. In this paper, the
authors introduce IntelliCode Compose – a general purpose multilingual code completion tool
which is capable of predicting sequences of code tokens of arbitrary types, generating up to entire
lines of syntactically correct code. It leverages a state-of-the-art generative transformer model
trained on 1.2 billion lines of source code in Python, C#, JavaScript and TypeScript programming
languages. IntelliCode Compose is deployed as a cloud-based web service.
2.5 CodeBERT: A Pre-Trained Model for Programming and Natural Languages
Authors: Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun
Shou, Bing Qin, Ting Liu, Daxin Jiang, Ming Zhou
Abstract:
The authors present CodeBERT, a bimodal pre-trained model for programming language
(PL) and natural language (NL). CodeBERT learns general purpose representations that support
downstream NL-PL applications such as natural language code search, code documentation
generation, etc. CodeBERT was developed with a Transformer-based neural architecture and trained
with a hybrid objective function that incorporates the pre-training task of replaced token
detection, which is to detect plausible alternatives sampled from generators. This enables the model
to utilize both “bimodal” data of NL-PL pairs and “unimodal” data, where the former provides input
tokens for model training while the latter helps to learn better generators. The authors evaluate
CodeBERT on two NL-PL applications by fine-tuning model parameters. Results show that CodeBERT
achieves state-of-the-art performance on both natural language code search and code
documentation generation.
CHAPTER 3
SYSTEM ANALYSIS
Data Sources:
Existing systems leverage diverse data sources, including electronic health records (EHRs), medical
imaging, laboratory tests, genetic information, lifestyle factors, and demographic data. Clinical datasets
from hospitals, healthcare institutions, research databases, and public health repositories serve as valuable
sources for training and validating predictive models.
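As a sketch of how such a clinical dataset could be loaded and split for training and validation, the snippet below reads a CSV file with pandas; the file name heart.csv and the target column name are assumptions modeled on the public Kaggle heart-disease dataset rather than details confirmed by this report.

import pandas as pd
from sklearn.model_selection import train_test_split

# File name and column names are assumptions; adjust to the actual dataset.
df = pd.read_csv("heart.csv")
X = df.drop(columns=["target"])        # demographic, clinical, and lifestyle features
y = df["target"]                       # 1 = heart disease present, 0 = absent

# Stratified split so both sets keep the same class balance.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
print(X_train.shape, X_val.shape)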
CHAPTER 4
SYSTEM SPECIFICATION
4.1 SOFTWARE SPECIFICATION
4.1.1 VISUAL STUDIO CODE
Visual Studio Code (VS Code) is a free source-code editor developed by Microsoft. It is
designed to be lightweight, fast, and customizable, making it a popular choice for developers
across a variety of programming languages and platforms. One of the key features of VS Code is
its support for extensions, which allows developers to add functionality and customize the editor
to suit their needs. There are thousands of extensions available, ranging from language-specific
syntax highlighting and autocomplete to tools for debugging, testing, and version control. In
addition to its extensibility, VS Code also includes a range of built-in features to support efficient
coding, such as IntelliSense, which provides context-aware code completion and suggestion, and
a built-in terminal for running commands and scripts. VS Code also includes support for debugging
and profiling, with integrated tools for debugging Node.js, Python, and other languages, as well as
the ability to attach to remote processes for debugging and profiling in a distributed environment.
VS Code is a powerful and versatile code editor that provides a range of features and tools to
support efficient and effective coding. Its popularity among developers is a testament to its
usefulness and flexibility.
4.1.2 ANACONDA DISTRIBUTION
Anaconda Distribution stands as one of the most comprehensive and widely used platforms for data
science and machine learning. With its extensive collection of tools, libraries, and packages, Anaconda
simplifies the process of setting up and managing environments for data analysis, statistical modeling, and
AI development. Let's delve into the details of Anaconda Distribution, its components, features, and
benefits.
Components of Anaconda Distribution:
Conda Package Manager:
At the heart of Anaconda Distribution lies Conda, a powerful package manager and environment
manager.Conda facilitates package installation, dependency management, and environment creation
across different operating systems.It enables users to seamlessly install, update, and manage packages and
libraries without worrying about compatibility issues or conflicts.
Python Interpreter:
Anaconda Distribution comes bundled with the Python programming language, providing users with a
robust and versatile platform for data analysis, scripting, and application development. Python's rich
ecosystem of libraries, including NumPy, Pandas, SciPy, Matplotlib, and scikit-learn, makes it an ideal
choice for data science and machine learning projects.
Jupyter Notebook:
Anaconda includes Jupyter Notebook, an interactive computing environment that allows users to create
and share documents containing live code, visualizations, and explanatory text. Jupyter Notebooks support
multiple programming languages, including Python, R, and Julia, making it a versatile tool for data
exploration, prototyping, and collaborative research.
Spyder IDE:
Anaconda Distribution features Spyder, an Integrated Development Environment (IDE) tailored for
scientific computing and data analysis. Spyder provides a user-friendly interface with features such as
code editing, debugging, variable exploration, and integrated IPython console, enhancing productivity for
data scientists and researchers.
Libraries and Packages:
Anaconda Distribution comes pre-installed with a vast array of essential libraries and packages for
data science, machine learning, and scientific computing. These include popular libraries like NumPy,
Pandas, SciPy, Matplotlib, scikit-learn, TensorFlow, PyTorch, Keras, and many others.
4.1.3 PYTHON
Python is a versatile and powerful programming language renowned for its simplicity, readability, and
extensive ecosystem of libraries and frameworks. Developed by Guido van Rossum in the late 1980s,
Python has gained widespread adoption across various domains, including web development, data
science, machine learning, artificial intelligence, and scientific computing. Key features of Python include
its clean and intuitive syntax, dynamic typing, automatic memory management, and strong support for
object-oriented, functional, and procedural programming paradigms. Python's extensive standard library
provides modules for tasks ranging from file I/O and networking to web development and GUI
programming. Moreover, Python's vibrant community and open-source ethos have led to the creation of
numerous third-party libraries and frameworks, such as NumPy, Pandas, Matplotlib, TensorFlow, Django,
Flask, and scikit-learn, which further enhance its capabilities and make it a preferred choice for
developers worldwide.
4.1.4 KAGGLE DATASETS
Kaggle datasets offer a treasure trove of valuable data resources for data scientists, machine learning
practitioners, and researchers worldwide. As one of the largest platforms for data science competitions
and collaborative projects, Kaggle hosts a diverse collection of datasets spanning multiple domains,
including healthcare, finance, social sciences, biology, and more. Kaggle datasets range from small,
curated datasets suitable for learning and experimentation to large-scale, real-world datasets sourced from
industry partners, government agencies, and research institutions. These datasets cover a wide range of
topics, such as image classification, natural language processing, time series analysis, predictive
modeling, and anomaly detection. One of the key strengths of Kaggle datasets is their accessibility and
usability. Users can easily explore, download, and analyze datasets directly from the Kaggle platform,
leveraging tools like Jupyter notebooks and Kaggle Kernels for data exploration, visualization, and
modeling. Additionally, Kaggle fosters a collaborative community where users can share insights, discuss
methodologies, and collaborate on projects, further enriching the learning and discovery experience.
4.1.5 STREAMLIT
Streamlit is a powerful Python library that simplifies the process of building interactive web applications
for data science and machine learning projects. With its intuitive and user-friendly interface, Streamlit
allows developers to create data-driven applications with minimal code, enabling rapid prototyping and
deployment. Streamlit seamlessly integrates with popular data science libraries such as Pandas,
Matplotlib, and scikit-learn, enabling users to create dynamic visualizations, interactive dashboards, and
machine learning models with ease. Whether you're a data scientist, researcher, or developer, Streamlit
empowers you to share insights, engage stakeholders, and deploy your projects seamlessly, making it an
invaluable tool for the data science community.
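To show how little code an interactive Streamlit page needs, here is a minimal sketch; it is not the project's application (which appears in Appendix 1), and the placeholder risk rule is an assumption used only for illustration.

# minimal_app.py  (run with: streamlit run minimal_app.py)
import streamlit as st

st.title("Heart Attack Risk Prediction")
age = st.number_input("Age", min_value=1, max_value=120, value=45)
chol = st.number_input("Cholesterol (mg/dl)", min_value=100, max_value=600, value=200)

if st.button("Check risk"):
    # Placeholder rule for illustration only; the real application calls a trained ML model.
    st.success("Higher risk" if age > 55 and chol > 240 else "Lower risk")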
A machine learning language model is a type of machine learning algorithm that is specifically
designed to process and analyze natural language data, such as text or speech. These models are
typically trained on large datasets of natural language data, and use statistical and probabilistic
methods to learn patterns and relationships in the data.
One common type of machine learning language model is the neural language model, which uses
artificial neural networks to process and analyze language data. These models are typically trained
on large amounts of text data, such as books, articles, or web pages, and are designed to predict
the probability of a particular word or sequence of words given the context of the surrounding text.
Neural language models are commonly used for a variety of natural language processing tasks,
such as language translation, sentiment analysis, and speech recognition. They are also
increasingly being used for coding assistance and support, with developers using ML language
models to provide suggestions, recommendations, and other forms of assistance during the coding
process.
4.2 HARDWARE SPECIFICATION
Hardware environment refers to the physical components that make up a computer system,
including the processor, memory, storage, input/output devices, and other components. The
hardware environment can have a significant impact on the performance and capabilities of a
computer system and is an important consideration when choosing a system for a specific
application.
CHAPTER 5
SYSTEM DESIGN
User interface: This is the interface that developers use to interact with the coding companion. It
typically includes features such as code suggestions, error detection and correction, and code
completion.
The back-end component is responsible for processing the code being written by the developer
and providing suggestions and feedback. It uses ML algorithms and techniques to understand the
context of the code and provide relevant suggestions.
The ML model is the core component of the coding companion and is responsible for
understanding the context of the code being written by the developer. This component is trained
on a large corpus of programming language code and is optimized for specific programming
languages and frameworks.
An API layer can be used to facilitate communication between the front-end and back-end
components of the coding companion. This layer can be implemented using a variety of
technologies, such as REST APIs or WebSockets.
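A hypothetical sketch of such an API layer as a REST endpoint in Flask follows; the route name, request fields, and stubbed suggestion logic are all assumptions, not the project's actual interface.

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/suggestions", methods=["POST"])
def suggestions():
    # The front end posts the code being edited; the field name is an assumption.
    code = request.get_json().get("code", "")
    # A real back end would pass `code` to the ML model; here it is stubbed.
    return jsonify({"suggestions": [f"received {len(code)} characters of code"]})

if __name__ == "__main__":
    app.run(port=5000)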
5.2 UML DIAGRAMS
CodingCompanion: The main class that encapsulates the system and holds references to its
components.
Input Processor: A class responsible for processing the user input, performing code analysis, and
extracting relevant information.
Output Processor: A class responsible for processing the output generated by the ML model,
filtering, and formatting the suggestions for display.
MLModel: A class responsible for training and making predictions using the ML algorithm.
Each class has its own set of methods that are specific to its responsibilities. The Coding
Companion class has methods to execute the system, retrieve user input, display suggestions, train
the model, and update user preferences. The Input Processor class has a method for performing
code analysis, and the Output Processor class has a method for filtering the suggestions. The ML
Model class has methods for training the model using training data and making predictions using
the trained model.
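A skeleton sketch of how these classes could be laid out in Python follows; the method names mirror the responsibilities listed above but are assumptions rather than the project's actual code.

class InputProcessor:
    def analyze_code(self, code):
        # Perform code analysis and extract relevant information.
        return {"language": "python", "tokens": code.split()}

class MLModel:
    def train(self, training_data):
        # Fit the model on (input, output) training examples.
        self.training_data = training_data

    def predict(self, analysis):
        # Return raw suggestions for the analyzed input (stubbed here).
        return ["suggestion A", "suggestion B"]

class OutputProcessor:
    def filter_suggestions(self, suggestions):
        # Filter and format suggestions for display.
        return suggestions[:1]

class CodingCompanion:
    # Main class that holds references to the other components.
    def __init__(self):
        self.input_processor = InputProcessor()
        self.ml_model = MLModel()
        self.output_processor = OutputProcessor()

    def suggest(self, code):
        analysis = self.input_processor.analyze_code(code)
        return self.output_processor.filter_suggestions(self.ml_model.predict(analysis))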
5.2.1 USE CASE DIAGRAMS:
Additionally, we have identified four main user actions that interact with the system:
1. Enter code: The user enters code to receive suggestions.
2. View suggestions: The user views the suggestions generated by the system.
3. Train model: The user provides training data to train the ML model.
4. Update preferences: The user updates their preferences to personalize the suggestions.
Finally, we have identified four main system actions that are performed by the system to support
the use cases:
1. Process user input: The system processes the user input to generate suggestions.
2. Retrieve suggestions: The system retrieves relevant suggestions based on the user input.
3. Train ML model: The system uses the training data provided by the user to train the ML
model.
4. Update preferences: The system uses the user preferences to personalize the suggestions
generated for the user.
Fig 5.2.1: Use Case diagram
1. Code: This class represents the code entered by the user. It has a code string, a language,
and a list of code suggestions generated by the system.
2. CodeSuggestion: This class represents a code suggestion generated by the system. It has
a suggestion string and a confidence score.
3. MLModel: This class represents the ML model used to generate code suggestions. It has
a list of training data and methods to train the model and generate code suggestions.
4. TrainingData: This class represents the training data used to train the ML model. It has
an input and an output.
Fig 5.2.2: Class diagram
Fig 5.2.3: Sequence diagram
The Coding Companion updates its internal state with the updated Code object and displays the
updated code to the user.
6. The ML Model updates its internal state with the new information and returns the updated
output object. The machine learning model updates its internal state with the updated data object
and the output is displayed to the user.
5.2.6 DATA FLOW DIAGRAM:
5.2.6.2 LEVEL 1 DATA FLOW DIAGRAM
In this level 1 DFD, we have added more detail to each component of the coding companion
system:
User Interface: The user interface component captures user input and displays suggestions and
feedback.
Input Processor: The input processor component receives user input, preprocesses it, and
performs code analysis to extract relevant information.
Code Analysis: This process analyzes the code to extract relevant information that can be used to
improve the suggestions and feedback generated by the ML model.
ML Model: This component performs the prediction task based on the training data.
Model Training: This process trains the ML model using relevant data.
Output Processor: The output processor component receives the suggestions and feedback
generated by the ML model, filters, prioritizes and formats them for display.
Data Store: This component stores data related to the coding companion system, such as user
preferences, code history, and ML model parameters.
CHAPTER 6
SYSTEM IMPLEMENTATION
6.1 MODULES
● User Interface
● Machine Learning Language Model
● Data Analysis
● Natural Language Processing
The user interface (UI) in a coding companion using an ML language model is necessary to provide
a seamless and user-friendly experience for the user. A well-designed UI can make it easier for the
user to interact with the application and make use of its features. The UI can be designed to guide
the user through the various functions and tools of the application, allowing users to easily move
between sections of the application and access the features they need. This can improve the user
experience and increase the efficiency of the coding process.
Machine learning (ML) plays a crucial role in a coding companion using ML language model. The
ML model can analyze the user's code and suggest improvements or alternative solutions. This can
help the user to write more efficient and effective code. The ML model can detect errors in the
user's code and suggest corrections. This can save the user time and effort in debugging their code.
The ML model can use contextual information to provide more relevant suggestions. For example,
it can suggest code snippets that are relevant to the programming language, library or framework
being used. The ML model can learn from the user's coding style and preferences, and provide
personalized suggestions that are tailored to their needs. The ML model can use natural language
processing (NLP) techniques to understand the user's code comments and provide relevant
suggestions.
Fig 6.2.2: Machine Learning Language Model
The data analysis component is responsible for analyzing the data given by the user to identify
any issues such as syntax errors, logical errors, and other potential problems. The data analysis
process consists of three main stages: lexical analysis, syntax analysis, and semantic analysis. The
Lexical Analysis component is responsible for breaking the code down into its fundamental
building blocks, or tokens, such as keywords, identifiers, and literals. This stage involves removing
any whitespace or comments from the code. The Syntax Analysis component is responsible for
analyzing the structure of the code and checking that it conforms to the rules of the programming
language being used. This involves checking that the code is syntactically correct and free from
any syntax errors. The Semantic Analysis component is responsible for analyzing the meaning of
the code and checking that it conforms to the rules of the programming language being used. This
involves checking that the code is semantically correct and free from any logical errors.
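To make the lexical and syntax stages concrete, the sketch below runs Python's standard tokenize and ast modules over a one-line snippet; this is an illustration of the idea, not the project's own analyzer.

import ast
import io
import tokenize

source = "total = price * quantity\n"

# Lexical analysis: break the source into tokens (identifiers, operators, ...).
tokens = tokenize.generate_tokens(io.StringIO(source).readline)
print([(token.type, token.string) for token in tokens if token.string.strip()])

# Syntax analysis: build a parse tree; a SyntaxError means the code does not
# conform to the grammar of the language.
try:
    tree = ast.parse(source)
    print(ast.dump(tree))
except SyntaxError as error:
    print("Syntax error:", error)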
Fig 6.2.3: Data analysis
The NLP component is responsible for processing the natural language input of the user to identify
the intent and provide appropriate responses. The NLP process consists of several stages, including
text preprocessing, text representation, model training, and model inference. The Text
Preprocessing component is responsible for cleaning and normalizing the user's natural language
input. This involves removing any irrelevant information, such as stopwords and punctuation, and
converting the text to a standardized format. The Text Representation component is responsible
for converting the preprocessed text into a format that can be understood by the machine learning
models. This typically involves vectorizing the text into a numerical format, such as a bag-of-
words or word embeddings. The Model Training component is responsible for training the
machine learning models on a dataset of labeled examples. The models can be trained using various
algorithms, such as decision trees, random forests, or neural networks. The Model Inference
component is responsible for applying the trained machine learning models to the user's natural
language input and providing appropriate responses. This involves predicting the intent of the
user's input and generating a response based on the predicted intent.
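A compact sketch of that preprocessing, representation, training, and inference flow using a bag-of-words representation follows; the tiny labeled intent dataset is hypothetical and exists only to make the pipeline runnable.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled examples mapping user text to an intent.
texts = ["explain this function", "fix the bug in my loop",
         "explain the error message", "fix this syntax error"]
intents = ["explain", "fix", "explain", "fix"]

# Text representation (bag-of-words) plus model training in one pipeline.
pipeline = make_pipeline(CountVectorizer(stop_words="english"), LogisticRegression())
pipeline.fit(texts, intents)

# Model inference: predict the intent of new input.
print(pipeline.predict(["please fix my code"]))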
Fig 6.2.4: Natural Language Processing
CHAPTER 7
TESTING
7.1 INTRODUCTION
● The primary objective of testing is to find defects or errors in the software, product, or
system being tested. This includes identifying bugs, issues, and other problems that can
affect the performance, functionality, and user experience of the software.
● Testing is also used to ensure that the software or system being tested performs the
functions and operations it is designed to do. This includes verifying that all features and
components work as expected and meet the specified requirements.
● Testing can also help improve the overall quality of the software or system by identifying
areas for improvement and suggesting ways to optimize performance, usability, and
security.
● Testing can help reduce the risk of errors, failures, and security breaches by identifying
potential issues before they become critical problems. This can help prevent downtime,
data loss, and other negative consequences.
7.3 TYPES OF TESTING
Unit testing is a type of software testing that focuses on testing individual units or components of
code. In unit testing, each unit of code is tested in isolation to verify that it behaves as expected
and meets the specified requirements. The main objective of unit testing is to ensure that each unit
of code works correctly and integrates smoothly with other units of code. This helps to identify
any defects or errors early in the development cycle, before they become more difficult and
expensive to fix.
Features to be tested:
● Verify that the entries are of the correct format.
● No duplicate entries should be allowed.
● All links should take the user to the correct page.
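As a hedged illustration of how such checks could be written as unit tests with pytest, the sketch below uses a hypothetical validate_age helper standing in for the application's input-format validation.

import pytest

def validate_age(value):
    # Hypothetical helper: accept an entry only if it is a number in a sensible range.
    age = int(value)                       # raises ValueError for non-numeric input
    if not 1 <= age <= 120:
        raise ValueError("age out of range")
    return age

def test_valid_entry_is_accepted():
    assert validate_age("45") == 45

def test_wrong_format_is_rejected():
    with pytest.raises(ValueError):
        validate_age("forty-five")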
Integration testing is a type of software testing that focuses on verifying the interfaces and
interactions between different modules or components of a software system. The goal of
integration testing is to identify defects and issues that may arise when different modules or
components are combined and to ensure that they work together as expected. In integration testing,
individual units or modules are combined and tested as a group to verify that they work together
correctly. This involves testing the interfaces and interactions between modules, as well as the
functionality of the system as a whole.
Integration testing can be performed using a variety of techniques, including top-down, bottom-
up, or a combination of both. In top-down integration testing, higher-level modules are tested first,
followed by lower-level modules. In bottom-up integration testing, lower-level modules are tested
first, followed by higher-level modules. In a combination approach, integration testing is
performed in stages, with some modules tested together and others tested individually. Integration
testing helps to identify defects early in the development cycle, when they are less expensive and
easier to fix. Integration testing helps to ensure that the software components or modules work
correctly and integrate smoothly, improving overall software quality. Integration testing helps to
increase confidence in the software by verifying that it meets the specified requirements and
performs as expected.
Test Results: All the test cases mentioned above passed successfully. No defects were
encountered.
CHAPTER 8
8.1 CONCLUSION
It is concluded that the system works well and fulfills the end user's requirements. The system has
been tested and errors have been removed. Heart disease is one of the leading causes of death
worldwide, and early prediction of heart disease is important. The project aims at predicting heart
disease using the KNN algorithm with the help of an Android application. The probability of
disease is found using certain datasets and the input given by the user; the system also provides
nearby hospital details and notifies the patient about the disease through a messaging application.
APPENDIX-1
SOURCE CODE
App.py
import os
import pickle

import streamlit as st
from streamlit_option_menu import option_menu

# Load the pre-trained models. The file names and the 'saved_models' folder are
# assumptions; adjust them to wherever the pickled models are actually stored.
working_dir = os.path.dirname(os.path.abspath(__file__))
diabetes_model = pickle.load(open(os.path.join(working_dir, 'saved_models', 'diabetes_model.sav'), 'rb'))
heart_disease_model = pickle.load(open(os.path.join(working_dir, 'saved_models', 'heart_disease_model.sav'), 'rb'))
parkinsons_model = pickle.load(open(os.path.join(working_dir, 'saved_models', 'parkinsons_model.sav'), 'rb'))

# sidebar for navigation between the three prediction pages
# (the menu title string is an assumption)
with st.sidebar:
    selected = option_menu('Multiple Disease Prediction System',
                           ['Diabetes Prediction',
                            'Heart Disease Prediction',
                            'Parkinsons Prediction'],
                           menu_icon='hospital-fill',
                           icons=['activity', 'heart', 'person'],
                           default_index=0)

# Diabetes Prediction Page
if selected == 'Diabetes Prediction':

    # page title
    st.title('Diabetes Prediction using ML')

    # getting the input data from the user
    col1, col2, col3 = st.columns(3)

    with col1:
        Pregnancies = st.text_input('Number of Pregnancies')
    with col2:
        Glucose = st.text_input('Glucose Level')
    with col3:
        BloodPressure = st.text_input('Blood Pressure value')
    with col1:
        SkinThickness = st.text_input('Skin Thickness value')
    with col2:
        Insulin = st.text_input('Insulin Level')
    with col3:
        BMI = st.text_input('BMI value')
    with col1:
        DiabetesPedigreeFunction = st.text_input('Diabetes Pedigree Function value')
    with col2:
        Age = st.text_input('Age of the Person')

    # code for Prediction (the button label is an assumption)
    diab_diagnosis = ''
    if st.button('Diabetes Test Result'):
        user_input = [Pregnancies, Glucose, BloodPressure, SkinThickness,
                      Insulin, BMI, DiabetesPedigreeFunction, Age]
        user_input = [float(x) for x in user_input]

        diab_prediction = diabetes_model.predict([user_input])

        if diab_prediction[0] == 1:
            diab_diagnosis = 'The person is diabetic'
        else:
            diab_diagnosis = 'The person is not diabetic'

        st.success(diab_diagnosis)
# Heart Disease Prediction Page
if selected == 'Heart Disease Prediction':

    # page title
    st.title('Heart Disease Prediction using ML')

    col1, col2, col3 = st.columns(3)

    with col1:
        age = st.text_input('Age')
    with col2:
        sex = st.text_input('Sex')
    with col3:
        cp = st.text_input('Chest Pain types')
    with col1:
        trestbps = st.text_input('Resting Blood Pressure')
    with col2:
        chol = st.text_input('Serum Cholestoral in mg/dl')
    with col3:
        fbs = st.text_input('Fasting Blood Sugar > 120 mg/dl')
    with col1:
        restecg = st.text_input('Resting Electrocardiographic results')
    with col2:
        thalach = st.text_input('Maximum Heart Rate achieved')
    with col3:
        exang = st.text_input('Exercise Induced Angina')
    with col1:
        oldpeak = st.text_input('ST depression induced by exercise')
    with col2:
        slope = st.text_input('Slope of the peak exercise ST segment')
    with col3:
        ca = st.text_input('Major vessels colored by flourosopy')
    with col1:
        thal = st.text_input('thal: 0 = normal; 1 = fixed defect; 2 = reversable defect')

    # code for Prediction (the button label is an assumption)
    heart_diagnosis = ''
    if st.button('Heart Disease Test Result'):
        user_input = [age, sex, cp, trestbps, chol, fbs, restecg, thalach,
                      exang, oldpeak, slope, ca, thal]
        user_input = [float(x) for x in user_input]

        heart_prediction = heart_disease_model.predict([user_input])

        if heart_prediction[0] == 1:
            heart_diagnosis = 'The person is having heart disease'
        else:
            heart_diagnosis = 'The person does not have any heart disease'

        st.success(heart_diagnosis)
# Parkinson's Prediction Page
if selected == 'Parkinsons Prediction':

    # page title
    st.title("Parkinson's Disease Prediction using ML")

    col1, col2, col3, col4, col5 = st.columns(5)

    with col1:
        fo = st.text_input('MDVP:Fo(Hz)')
    with col2:
        fhi = st.text_input('MDVP:Fhi(Hz)')
    with col3:
        flo = st.text_input('MDVP:Flo(Hz)')
    with col4:
        Jitter_percent = st.text_input('MDVP:Jitter(%)')
    with col5:
        Jitter_Abs = st.text_input('MDVP:Jitter(Abs)')
    with col1:
        RAP = st.text_input('MDVP:RAP')
    with col2:
        PPQ = st.text_input('MDVP:PPQ')
    with col3:
        DDP = st.text_input('Jitter:DDP')
    with col4:
        Shimmer = st.text_input('MDVP:Shimmer')
    with col5:
        Shimmer_dB = st.text_input('MDVP:Shimmer(dB)')
    with col1:
        APQ3 = st.text_input('Shimmer:APQ3')
    with col2:
        APQ5 = st.text_input('Shimmer:APQ5')
    with col3:
        APQ = st.text_input('MDVP:APQ')
    with col4:
        DDA = st.text_input('Shimmer:DDA')
    with col5:
        NHR = st.text_input('NHR')
    with col1:
        HNR = st.text_input('HNR')
    with col2:
        RPDE = st.text_input('RPDE')
    with col3:
        DFA = st.text_input('DFA')
    with col4:
        spread1 = st.text_input('spread1')
    with col5:
        spread2 = st.text_input('spread2')
    with col1:
        D2 = st.text_input('D2')
    with col2:
        PPE = st.text_input('PPE')

    # code for Prediction (the button label is an assumption)
    parkinsons_diagnosis = ''
    if st.button("Parkinson's Test Result"):
        user_input = [fo, fhi, flo, Jitter_percent, Jitter_Abs, RAP, PPQ, DDP,
                      Shimmer, Shimmer_dB, APQ3, APQ5, APQ, DDA, NHR, HNR,
                      RPDE, DFA, spread1, spread2, D2, PPE]
        user_input = [float(x) for x in user_input]

        parkinsons_prediction = parkinsons_model.predict([user_input])

        if parkinsons_prediction[0] == 1:
            parkinsons_diagnosis = "The person has Parkinson's disease"
        else:
            parkinsons_diagnosis = "The person does not have Parkinson's disease"

        st.success(parkinsons_diagnosis)
APPENDIX-2
SNAPSHOTS
Fig 9.3: Person does not have a heart attack
REFERENCES
2. https://www.researchgate.net/publication/350311935_Applying_CodeBERT_for_Automated_Program_Repair_of_Java_Simple_Bugs
3. Kuhail, M.A., Alturki, N., Alramlawi, S. et al. Interacting with educational chatbots:
A systematic review. Educ Inf Technol 28, 973–1018 (2023).
https://doi.org/10.1007/s10639-022-11177-3
PROGRAM OUTCOMES

PO1 (Engineering knowledge): Apply the knowledge of mathematics, science, engineering
fundamentals, and an engineering specialization for the solution of complex engineering problems.

PO2 (Problem analysis): Identify, formulate, research literature, and analyze complex engineering
problems reaching substantiated conclusions using first principles of mathematics, natural sciences,
and engineering sciences.

PO3 (Design/development of solutions): Design solutions for complex engineering problems and
design system components or processes that meet the specified needs with appropriate
consideration for public health and safety, and cultural, societal, and environmental considerations.

PO4 (Conduct investigations of complex problems): Use research-based knowledge and research
methods including design of experiments, analysis and interpretation of data, and synthesis of the
information to provide valid conclusions.

PO5 (Modern tool usage): Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools, including prediction and modeling, to complex engineering activities,
with an understanding of the limitations.

PO6 (The engineer and society): Apply reasoning informed by the contextual knowledge to assess
societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the
professional engineering practice.

PO7 (Environment and sustainability): Understand the impact of the professional engineering
solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for,
sustainable development.

PO8 (Ethics): Apply ethical principles and commit to professional ethics and responsibilities and
norms of the engineering practice.

PO9 (Individual and team work): Function effectively as an individual, and as a member or leader
in diverse teams, and in multidisciplinary settings.

PO10 (Communication): Communicate effectively on complex engineering activities with the
engineering community and with society at large, such as being able to comprehend and write
effective reports and design documentation, make effective presentations, and give and receive
clear instructions.

PO11 (Project management and finance): Demonstrate knowledge and understanding of the
engineering and management principles and apply these to one's own work, as a member and
leader in a team, to manage projects and in multidisciplinary environments.

PO12 (Life-long learning): Recognize the need for, and have the preparation and ability to engage
in, independent and life-long learning in the broadest context of technological change.
Mapping of Program outcomes with the Project titled “HEART ATTACK RISK
PREDICTION USING MACHINE LEARNING.”
PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
PROGRAM SPECIFIC OUTCOMES
B.E COMPUTER SCIENCE AND ENGINEERING
Signature of Guide