HEART

This document presents a project on predicting heart disease using machine learning techniques, focusing on the accuracy of various algorithms like k-NN, Decision Tree, Logistic Regression, and SVM. It emphasizes the importance of early detection and aims to provide a reliable model for real-life medical applications, allowing users to input health data for instant predictions. The project includes an analysis of model performance, system requirements, and source code for implementation.

Uploaded by

Jaideep A
HEART DISEASE PREDICTION

USING MACHINE LEARNING

SUBMITTED BY
K. PRAVEEN
(227Z1A0582)
INDEX

S.NO  TOPIC

1.    Introduction
2.    Algorithm Descriptions
3.    Benefits & Objectives
4.    High- & Low-level Requirements
5.    System Requirements
6.    Source Code
7.    Result
8.    Conclusion

Introduction

Heart disease is one of the leading causes of death across the world,
affecting millions of people every year. According to the World
Health Organization, it is responsible for a large number of health
complications and fatalities. Early detection and timely diagnosis play
a major role in reducing the risk and planning proper treatment. In
recent years, machine learning (ML) has gained attention in the
healthcare field for its ability to find patterns in medical data and
support better decision-making.

This project focuses on using machine learning techniques to predict
whether a person has heart disease or not. Using a real-world heart
disease dataset that includes key medical information like age, blood
pressure, cholesterol levels, and more, the goal is to test how accurate
different ML models are at predicting heart conditions. We compare
the results of four popular classification algorithms—k-Nearest
Neighbours (k-NN), Decision Tree, Logistic Regression, and Support
Vector Machine (SVM)—with a special focus on reducing false
negatives, which are dangerous in situations where early diagnosis is
critical.

The aim of this study is not only to measure accuracy but also to find
the most reliable model that could be used in real-life clinical
settings, helping doctors diagnose heart conditions more quickly,
safely, and affordably.

Algorithm Descriptions
k-Nearest Neighbours (k-NN)
k-Nearest Neighbours is a simple, intuitive algorithm that classifies a
data point based on how its neighbours are labelled. It works by
calculating the distance between the input point and other points in
the dataset, then selecting the "k" closest neighbours. The majority
class among these neighbours determines the prediction. It doesn't
make assumptions about the data, which makes it flexible, but it can
become slow with large datasets and is sensitive to the scale of
features.
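
The voting procedure described above can be sketched in a few lines of
plain Python. This is a minimal illustration, not the project's
implementation (the source code uses scikit-learn's
KNeighborsClassifier); the function name knn_predict and the toy data
are assumptions made up for this example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and count how often each label appears
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up toy data: (age, resting blood pressure); 1 = disease, 0 = healthy
points = [(45, 120), (50, 130), (62, 160), (67, 170), (58, 150)]
labels = [0, 0, 1, 1, 1]

print(knn_predict(points, labels, query=(60, 155), k=3))
```

Because the distance adds up raw feature differences, a feature measured
on a larger scale dominates the result, which is why the features are
standardized before k-NN is applied.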
Logistic Regression
Logistic Regression is a statistical algorithm used for binary
classification problems. It calculates the probability that a given input
belongs to a particular class using the logistic (sigmoid) function.
Despite its name, it is a classification algorithm, not a regression one.
It performs well when the log-odds of the outcome are approximately a
linear function of the features, and it is widely used for medical and
risk prediction tasks because it is simple, efficient, and easy to
interpret.
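
As a rough sketch of how the sigmoid turns a weighted sum of features
into a probability, consider the snippet below. The weights, bias, and
feature values are hypothetical numbers chosen for illustration; a real
model learns them from the training data.

```python
import math


def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical parameters and one standardized sample (assumed values)
weights = [0.8, 1.2]
bias = -0.5
p = predict_proba([1.0, 0.5], weights, bias)

# The usual decision rule: predict class 1 when the probability is at least 0.5
label = 1 if p >= 0.5 else 0
print(round(p, 3), label)
```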
Support Vector Machine (SVM)
Support Vector Machine is a powerful supervised learning algorithm
that finds the boundary (or hyperplane) that best separates data points
of different classes. It tries to maximize the margin between the two
classes, which tends to improve generalization to unseen data. SVM is
effective in high-dimensional spaces and handles both linear and
non-linear classification, the latter through the kernel trick.
However, it can be computationally expensive and harder to interpret
than simpler models.
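
To make the idea of a margin concrete, the sketch below measures how far
the closest point lies from two candidate hyperplanes. It only evaluates
given boundaries; it does not train an SVM, which would search for the
boundary with the largest such margin. The function name
geometric_margin, the toy points, and both candidate hyperplanes are
assumptions made up for this example.

```python
import math


def geometric_margin(points, labels, w, b):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    Labels are +1 or -1. A positive result means every point lies on its
    correct side of the boundary.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )


# Made-up, linearly separable toy data
points = [(1.0, 1.0), (2.0, 1.5), (4.0, 4.0), (5.0, 4.5)]
labels = [-1, -1, 1, 1]

# Two candidate boundaries that both separate the classes
margin_a = geometric_margin(points, labels, w=(1.0, 1.0), b=-5.5)  # x1 + x2 = 5.5
margin_b = geometric_margin(points, labels, w=(1.0, 0.0), b=-2.5)  # x1 = 2.5

# The boundary with the larger margin is the one an SVM would prefer
print(margin_a, margin_b)
```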

Benefits & Objectives
Benefits

• Predicts the likelihood of heart disease with high accuracy, aiding in
  early diagnosis and intervention.
• Helps identify individuals at risk, enabling timely medical attention
  and lifestyle changes.
• Provides a simple interface for users to input health data and receive
  instant predictions.
• Compares multiple machine learning models using key metrics and visual
  tools to determine the best-performing model.

Objectives
• Apply machine learning algorithms for binary classification of heart
  disease data.
• Preprocess and split the dataset into training and testing sets to
  ensure robust evaluation.
• Evaluate model performance using metrics such as accuracy, confusion
  matrix, F1-score, and ROC-AUC.
• Focus on minimizing false negatives, so that as few cases of heart
  disease as possible are misclassified as healthy.
• Create a user-friendly interface for real-time prediction based on
  user input.
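
The evaluation metrics named in the objectives can be computed by hand
from the four confusion-matrix counts, as the sketch below shows. The
labels and predictions are made-up values, not results from this
project; in the source code the same quantities come from scikit-learn's
confusion_matrix and classification_report.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = heart disease."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


# Made-up labels and predictions for ten patients
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
# Recall is the share of true disease cases that were caught;
# every false negative lowers it.
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"FN={fn} accuracy={accuracy:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Recall is the metric most directly tied to the false-negative objective:
a model can have high accuracy and still miss many disease cases when
the classes are imbalanced.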

High- & Low-level Requirements
High-level Requirements
• Input Interface
• Model Processing
• Prediction Output
• Performance Comparison
• Visual Representation

Low-level Requirements
• Dataset Handling
• Data Preprocessing
• Model Implementation
• Model Evaluation
• User Input Validation
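
One possible way to meet the user input validation requirement is
sketched below. The helper names parse_int_in_range and ask_int are
hypothetical, and only the chest pain range (0-3) comes from the
prompts in the source code. The `read` parameter exists so the demo can
run without a keyboard; by default it is the built-in input.

```python
def parse_int_in_range(raw, low, high):
    """Convert text to an int and check that low <= value <= high."""
    value = int(raw.strip())  # raises ValueError for non-numeric text
    if not low <= value <= high:
        raise ValueError(f"expected a value between {low} and {high}, got {value}")
    return value


def ask_int(prompt, low, high, read=input):
    """Keep asking until the user supplies a valid integer in range."""
    while True:
        try:
            return parse_int_in_range(read(prompt), low, high)
        except ValueError as err:
            print(f"Invalid input: {err}")


# Demo with scripted answers instead of real keyboard input:
# "abc" is not a number, "7" is out of range, "2" is accepted.
answers = iter(["abc", "7", "2"])
cp = ask_int("Chest pain type (0-3): ", 0, 3, read=lambda _prompt: next(answers))
print(cp)
```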

SYSTEM REQUIREMENTS
Hardware
• Minimum 1 GHz processor
• 1 GB RAM or more
• 100 MB free storage

Software

• OS: Windows/Linux/macOS
• IDE: Jupyter, VS Code, or PyCharm
• Libraries: scikit-learn, matplotlib, NumPy, pandas

Features

• Input interface for user values (via console or web-based)
• Classification using multiple machine learning algorithms (e.g.,
  k-NN, Decision Tree, Logistic Regression, SVM)
• Accuracy and ROC-AUC score reporting
• ROC curve plotting for visual performance comparison
• Real-time prediction for custom input values (health data)
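
The ROC-AUC score reported by the system has a simple interpretation: it
is the probability that a randomly chosen patient with heart disease
receives a higher score than a randomly chosen healthy patient. The
sketch below computes it directly from that definition. The function
name roc_auc and the scores are made up for illustration; the project
itself relies on scikit-learn's roc_curve and auc.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels and predicted probabilities for six patients
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

print(round(roc_auc(y_true, scores), 3))
```

A value of 0.5 corresponds to random guessing and 1.0 to a perfect
ranking.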

SOURCE CODE
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (classification_report, confusion_matrix,
                             accuracy_score, roc_curve, auc)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Load dataset (expects heart.csv in the working directory)
df = pd.read_csv("heart.csv")

# Features and target
X = df.drop("target", axis=1)
y = df["target"]

# Train-test split (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Scale features: fit on the training set only, then apply to both sets
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Define the four models compared in this project
models = {
    "KNN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(probability=True, random_state=42),
}

# Train and evaluate models
roc_data = {}
for name, model in models.items():
    model.fit(X_train_scaled, y_train)
    y_pred = model.predict(X_test_scaled)
    # Probability of the positive class (heart disease), used for the ROC curve
    y_prob = model.predict_proba(X_test_scaled)[:, 1]

    print(f"\n=== {name} ===")
    print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
    print("Classification Report:\n",
          classification_report(y_test, y_pred))
    print("Accuracy:", accuracy_score(y_test, y_pred))

    fpr, tpr, _ = roc_curve(y_test, y_prob)
    roc_auc = auc(fpr, tpr)
    roc_data[name] = (fpr, tpr, roc_auc)

# Plot ROC curves
plt.figure(figsize=(10, 6))
for name, (fpr, tpr, roc_auc) in roc_data.items():
    plt.plot(fpr, tpr, label=f"{name} (AUC = {roc_auc:.2f})")
plt.plot([0, 1], [0, 1], 'k--', label='Random Guess')
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve Comparison")
plt.legend()
plt.grid(True)
plt.show()

# Ask input from the user
print("\nPlease enter the following details for prediction:")

age = int(input("Age: "))
sex = int(input("Sex (1 = male, 0 = female): "))
cp = int(input("Chest pain type (0-3): "))
resting_blood_pressure = int(input("Resting blood pressure: "))
serum_cholesterol = int(input("Serum cholesterol (mg/dl): "))
fasting_blood_sugar = int(input(
    "Fasting blood sugar > 120 mg/dl (1 = yes, 0 = no): "))

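# Prepare input for prediction. This assumes the first six columns of
# heart.csv are age, sex, chest pain type, resting blood pressure, serum
# cholesterol, and fasting blood sugar, in that order. The remaining
# features are not asked for, so they default to the training-set median
# rather than 0, which is not a realistic value for every feature.
sample_values = X_train.median().tolist()
sample_values[:6] = [age, sex, cp, resting_blood_pressure,
                     serum_cholesterol, fasting_blood_sugar]
sample_input = pd.DataFrame([sample_values], columns=X.columns)

# Scale the input using the same scaler as training data
sample_input_scaled = scaler.transform(sample_input)

# Predict with all models
print("\n--- Sample Prediction from All Models ---")
for name, model in models.items():
    prediction = model.predict(sample_input_scaled)[0]
    result = ("Heart Disease Detected" if prediction == 1
              else "No Heart Disease Detected")
    print(f"{name}: {result}")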
OUTPUT

CONCLUSION
The heart disease prediction system demonstrates how machine
learning can assist in the early diagnosis of cardiovascular
conditions. By applying multiple classification algorithms and
comparing their performance on the same test set, the project
identifies the most reliable model for prediction. The system allows
users to input basic medical data and receive an immediate risk
assessment, making it a potentially valuable aid for both patients and
healthcare providers. With its focus on reducing false negatives, the
project aims to ensure that at-risk individuals are not overlooked,
ultimately supporting timely intervention and better health outcomes.
