ML practice - Week 6:
Logistic Regression for binary classification
Introduction
Logistic Regression (also called Logit Regression) is commonly used to estimate the probability that an instance belongs to a particular class. If the estimated probability is greater than 50%, the model predicts that the instance belongs to that class (called the positive class, labeled "1"); otherwise it predicts that the instance belongs to the negative class (labeled "0"). This makes it a binary classifier.
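Under the hood, the model applies the logistic (sigmoid) function to a weighted sum of the input features and thresholds the result at 0.5. Below is a minimal sketch of this decision rule, using made-up weights and inputs purely for illustration:

import numpy as np

def sigmoid(z):
    # logistic function: squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative (made-up) weights, bias, and feature vector
w = np.array([0.8, -1.2, 0.3])
b = 0.1
x = np.array([1.5, 0.4, 2.0])

p = sigmoid(np.dot(w, x) + b)     # estimated probability of the positive class
prediction = 1 if p > 0.5 else 0  # threshold at 50%
print(p, prediction)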
Loading the dataset
Cleveland Heart-disease dataset
Attribute Information:
1. Age (in years)
2. Sex (1 = male; 0 = female)
3. cp - chest pain type
4. trestbps - resting blood pressure (anything above 130-140 is typically cause for concern)
5. chol - serum cholesterol in mg/dl (above 200 is cause for concern)
6. fbs - fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
7. restecg - resting electrocardiographic results (0 = normal; 1 = having ST-T wave abnormality; 2 = showing probable or definite left ventricular hypertrophy by Estes' criteria)
8. thalach - maximum heart rate achieved
9. exang - exercise induced angina (1 = yes; 0 = no)
10. oldpeak - ST depression induced by exercise relative to rest
11. slope - slope of the peak exercise ST segment (1 = upsloping; 2 = flat; 3 = downsloping)
12. ca - number of major vessels (0-3) colored by fluoroscopy
13. thal - (3 = normal; 6 = fixed defect; 7 = reversible defect)
14. num (target) - diagnosis of heart disease (angiographic disease status) (0: < 50% diameter narrowing; 1: > 50% diameter narrowing)
Import basic libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from pandas.plotting import scatter_matrix
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
Download Heart disease dataset
#Define the column names
cols = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal', 'num']
# Load the dataset
heart_data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data', header=None, names=cols)
heart_data
(Output: the loaded DataFrame, 303 rows × 14 columns; the rightmost columns are truncated in this export.)
# to check the type of data variable
type(heart_data)
pandas.core.frame.DataFrame
# Display first five rows of the dataset
heart_data.head() #head is first 5 rows
(Output: the first five rows of the DataFrame; the rightmost columns are truncated in this export.)
# Display last five rows of the dataset
heart_data.tail() #tail is last 5 rows
(Output: the last five rows of the DataFrame; the rightmost columns are truncated in this export.)
Visualizing dataset and features
for feature in cols:
    plt.hist(heart_data[feature])
    plt.title(feature)
    # display the histogram for each feature
    plt.show()
Preprocessing : class labels
Experiments with the database have concentrated on simply attempting to distinguish presence (values 1, 2, 3, 4) from absence (value 0). Let us therefore change instances with labels 2, 3, and 4 to 1.
# converting class labels 2,3, and 4 into label 1
heart_data = heart_data.replace({"num": {2:1,3:1, 4:1}})
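As a quick check (not part of the original output), the class balance after relabeling can be inspected:

# Quick check: how many instances fall into class 0 and class 1 after relabeling
heart_data['num'].value_counts()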
# Visualize the label
heart_data
(Output: the DataFrame after relabeling, 303 rows × 14 columns; the rightmost columns are truncated in this export.)
Preprocessing : Replacing missing values
The features 'ca' and 'thal' have missing values, coded as '?'. Let us replace each '?' with NaN and then fill the missing values using the 'mean' imputation strategy.
heart_data.replace('?',np.nan, inplace=True)
imputer = SimpleImputer(missing_values = np.nan, strategy ='mean')
imputer = imputer.fit(heart_data)
heart_imputed = imputer.transform(heart_data)
heart_data_imputed = pd.DataFrame(heart_imputed, columns = cols)
heart_data_imputed
(Output: the imputed DataFrame heart_data_imputed, 303 rows × 14 columns; the rightmost columns are truncated in this export.)
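An optional sanity check (not shown in the original output) is to count the missing values per column before and after imputation:

# Optional check: remaining missing values per column
print(heart_data.isna().sum())          # 'ca' and 'thal' should report the '?' entries replaced by NaN
print(heart_data_imputed.isna().sum())  # every column should now report zero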
Let us first separate the input attributes and target attribute.
# Assign the target column to a new variable y
y = heart_data_imputed['num']
y = np.array(y)
# display the label array
y
array([0., 1., 1., 0., 0., 0., 1., 0., 1., 1., 0., 0., 1., 0., 0., 0., 1.,
0., 0., 0., 0., 0., 1., 1., 1., 0., 0., 0., 0., 1., 0., 1., 1., 0.,
0., 0., 1., 1., 1., 0., 1., 0., 0., 0., 1., 1., 0., 1., 0., 0., 0.,
0., 1., 0., 1., 1., 1., 1., 0., 0., 1., 0., 1., 0., 1., 1., 1., 0.,
1., 1., 0., 1., 1., 1., 1., 0., 1., 0., 0., 1., 0., 0., 0., 1., 0.,
0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1., 1., 1., 0., 0., 0., 0.,
0., 0., 1., 0., 1., 1., 1., 1., 1., 1., 0., 1., 1., 0., 0., 0., 1.,
1., 1., 1., 0., 1., 1., 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
1., 1., 1., 0., 0., 1., 0., 1., 0., 1., 1., 0., 0., 0., 0., 0., 0.,
1., 1., 1., 1., 1., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 1., 0.,
1., 0., 1., 0., 1., 1., 0., 1., 0., 0., 1., 1., 0., 0., 1., 0., 0.,
1., 1., 1., 0., 1., 1., 1., 0., 1., 0., 0., 0., 1., 0., 0., 0., 0.,
0., 1., 1., 1., 0., 1., 0., 1., 0., 1., 1., 0., 0., 0., 0., 0., 0.,
0., 0., 1., 1., 0., 0., 0., 1., 1., 0., 1., 1., 0., 0., 1., 1., 1.,
0., 0., 0., 0., 0., 1., 0., 1., 1., 1., 1., 0., 0., 1., 0., 0., 0.,
0., 0., 0., 0., 1., 0., 1., 0., 0., 1., 1., 1., 1., 1., 0., 1., 0.,
1., 0., 1., 0., 0., 0., 1., 0., 1., 0., 1., 0., 1., 1., 1., 0., 0.,
0., 1., 0., 1., 1., 1., 0., 1., 1., 1., 1., 1., 1., 0.])
# Remove the target variable from heart_data
del heart_data_imputed['num']
heart_data_imputed.describe()
        age         sex         cp          trestbps    chol        fbs
count   303.000000  303.000000  303.000000  303.000000  303.000000  303.000000
mean    54.438944   0.679868    3.158416    131.689769  246.693069  0.148515
std     9.038662    0.467299    0.960126    17.599748   51.776918   0.356198
min     29.000000   0.000000    1.000000    94.000000   126.000000  0.000000
25%     48.000000   0.000000    3.000000    120.000000  211.000000  0.000000
50%     56.000000   1.000000    3.000000    130.000000  241.000000  0.000000
75%     61.000000   1.000000    4.000000    140.000000  275.000000  0.000000
max     77.000000   1.000000    4.000000    200.000000  564.000000  1.000000
(remaining columns truncated in this export)
Understanding the correlation between Input features
plt.figure(figsize=[15,15])
sns.heatmap(heart_data_imputed.corr(),annot = True, square = True)
plt.show()
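Besides reading the heatmap visually, the strongest pairwise correlations can be listed directly; the helper below is an illustrative addition, not part of the original notebook:

# Illustrative: list the feature pairs with the largest absolute correlation
corr = heart_data_imputed.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))  # keep each pair once
print(upper.unstack().dropna().sort_values(ascending=False).head(5))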
Train and Test data split
# Let us split the data for training and testing
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(heart_data_imputed, y, test_size = 0.25)  # 0.25 gives the 227/76 split shown below
print('Shape of training data',X_train.shape)
print('Shape of training labels', y_train.shape)
print('Shape of testing data', X_test.shape)
print('Shape of testing labels',y_test.shape)
Shape of training data (227, 13)
Shape of training labels (227,)
Shape of testing data (76, 13)
Shape of testing labels (76,)
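Since the two classes are not perfectly balanced, passing stratify=y keeps the class proportions similar in both splits. The variant below is only a sketch (with an arbitrarily chosen random_state), not the split used above:

# Alternative split that preserves the class ratio (random_state chosen arbitrarily)
X_tr, X_te, y_tr, y_te = train_test_split(heart_data_imputed, y, test_size=0.25,
                                           stratify=y, random_state=42)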
X_train
(Output: the training split X_train; the rightmost columns are truncated in this export.)
227 rows × 13 columns

As there is a wide variation among the numerical values between features, it is a best practice to normalize the features before training.
Normalizing features for training
# Instantiate the scaler, fit it on the training data, and transform both the train and test data
ss = StandardScaler()
X_train_norm = ss.fit_transform(X_train)
X_test_norm = ss.transform(X_test)
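Note that the scaler is fit on the training data only and merely applied to the test data, so no test-set information leaks into the scaling. A quick optional check that the scaled training features are roughly zero-mean and unit-variance:

# Optional check: scaled training features should have mean ~0 and std ~1
print(np.round(X_train_norm.mean(axis=0), 3))
print(np.round(X_train_norm.std(axis=0), 3))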
Perform Classification
LR = LogisticRegression()
classifier=LR.fit(X_train_norm, y_train)
score = LR.score(X_train_norm, y_train)
print("Training score: ", score)
#Make the prediction
y_pred = LR.predict(X_test_norm)
Training score: 0.8634361233480177
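The training score alone can be optimistic. The accuracy on the held-out test set can be checked the same way (an extra step, not shown in the original output):

# Optional: accuracy on the held-out test data
print("Testing score: ", LR.score(X_test_norm, y_test))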
# Import the libraries
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
Confusion Matrix
A confusion matrix is a summary of prediction results on a classification problem.
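For a binary problem it is a 2x2 table of true negatives, false positives, false negatives, and true positives. It can be computed directly from the predictions made above with the confusion_matrix function already imported:

# Rows are the true classes (0, 1); columns are the predicted classes (0, 1)
cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)
print("TN:", tn, " FP:", fp, " FN:", fn, " TP:", tp)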
# visualizing the confusion matrix
from sklearn.metrics import plot_confusion_matrix
from sklearn.metrics import plot_roc_curve
class_names=["0","1"]
plot_confusion_matrix(classifier, X_test_norm, y_test, display_labels=class_names, cmap=plt.cm.Blues)  # cmap value assumed; truncated in the export
tick_marks = np.arange(len(class_names))
plt.xticks(tick_marks, class_names, rotation=0)
plt.yticks(tick_marks, class_names)
plt.title('Confusion matrix')
plt.show()
/usr/local/lib/python3.7/dist-packages/sklearn/utils/deprecation.py:87: FutureWarning
warnings.warn(msg, category=FutureWarning)
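The FutureWarning appears because plot_confusion_matrix (and plot_roc_curve, used below) were deprecated and later removed from scikit-learn. On scikit-learn 1.0 or newer, an equivalent plot can be produced with the display classes; a sketch assuming that version:

# Equivalent on newer scikit-learn (>= 1.0), where plot_confusion_matrix no longer exists
from sklearn.metrics import ConfusionMatrixDisplay
ConfusionMatrixDisplay.from_estimator(classifier, X_test_norm, y_test,
                                      display_labels=class_names, cmap=plt.cm.Blues)
plt.title('Confusion matrix')
plt.show()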
CR = classification_report(y_test, y_pred)
print('Classification report \n')
print(CR)
Classification report
precision recall f1-score support
0.0 0.78 0.90 0.84 40
1.0 0.87 0.72 0.79 36
accuracy 0.82 76
macro avg 0.82 0.81 0.81 76
weighted avg 0.82 0.82 0.81 76
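The per-class figures in the report can be reproduced by hand from the confusion matrix: for the positive class, precision = TP / (TP + FP) and recall = TP / (TP + FN). A small illustrative check:

# Reproduce precision and recall for the positive class from the confusion matrix
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print("Precision (class 1):", round(tp / (tp + fp), 2))
print("Recall (class 1):", round(tp / (tp + fn), 2))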
Hyperparameter tuning with RandomizedSearchCV and GridSearchCV
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
RandomizedSearchCV
# Create a hyperparameter grid for LogisticRegression (C is the inverse of regularization strength)
log_reg_grid_rs = {"C": np.logspace(-4, 4, 20),
                   "solver": ["liblinear"]}
# Tune LogisticRegression
np.random.seed(42)
# Setup random hyperparameter search for LogisticRegression with cross-validation
RS_log_reg = RandomizedSearchCV(LogisticRegression(),
param_distributions=log_reg_grid_rs,
cv=5,
n_iter=20,
verbose=True)
# Fit random hyperparameter search model for LogisticRegression
RS_log_reg.fit(X_train_norm, y_train)
Fitting 5 folds for each of 20 candidates, totalling 100 fits
RandomizedSearchCV(cv=5, estimator=LogisticRegression(), n_iter=20,
param_distributions={'C': array([1.00000000e-04, 2.63665090e-04,
6.95192796e-04, 1.83298071e-03,
4.83293024e-03, 1.27427499e-02, 3.35981829e-02, 8.85866790e-02,
2.33572147e-01, 6.15848211e-01, 1.62377674e+00, 4.28133240e+00,
1.12883789e+01, 2.97635144e+01, 7.84759970e+01, 2.06913808e+02,
5.45559478e+02, 1.43844989e+03, 3.79269019e+03, 1.00000000e+04]),
'solver': ['liblinear']},
verbose=True)
?LogisticRegression
np.logspace(-4, 4, 20)
array([1.00000000e-04, 2.63665090e-04, 6.95192796e-04, 1.83298071e-03,
4.83293024e-03, 1.27427499e-02, 3.35981829e-02, 8.85866790e-02,
2.33572147e-01, 6.15848211e-01, 1.62377674e+00, 4.28133240e+00,
1.12883789e+01, 2.97635144e+01, 7.84759970e+01, 2.06913808e+02,
5.45559478e+02, 1.43844989e+03, 3.79269019e+03, 1.00000000e+04])
# Find the best hyperparameters
RS_log_reg.best_params_
{'C': 0.08858667904100823, 'solver': 'liblinear'}
RS_log_reg.score(X_train_norm, y_train)
0.8678414096916299
# Make predictions with tuned model
y_preds = RS_log_reg.predict(X_test_norm)
# Confusion matrix
print(confusion_matrix(y_test, y_preds))
[[36 4]
[10 26]]
?plot_roc_curve
# Plot ROC curve and calculate AUC metric
plot_roc_curve(RS_log_reg, X_test_norm, y_test)
/usr/local/lib/python3.7/dist-packages/sklearn/utils/deprecation.py:87: FutureWarning
warnings.warn(msg, category=FutureWarning)
<sklearn.metrics._plot.roc_curve.RocCurveDisplay at 0x7f8a13f6d190>
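plot_roc_curve triggers the same deprecation warning. On scikit-learn 1.0 or newer, the ROC curve and the AUC can be obtained as follows (a sketch assuming that version):

# Equivalent on newer scikit-learn: plot the ROC curve and report the AUC separately
from sklearn.metrics import RocCurveDisplay, roc_auc_score
RocCurveDisplay.from_estimator(RS_log_reg, X_test_norm, y_test)
plt.show()
auc = roc_auc_score(y_test, RS_log_reg.predict_proba(X_test_norm)[:, 1])
print("AUC:", round(auc, 3))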
GridSearchCV
# Setup grid hyperparameter search for LogisticRegression
log_reg_grid_gs = {"C": np.logspace(-4, 4, 30),
"solver": ["liblinear"]}
GS_log_reg = GridSearchCV(LogisticRegression(),
param_grid=log_reg_grid_gs,
cv=5,
verbose=True)
# Fit grid hyperparameter search model
GS_log_reg.fit(X_train_norm, y_train);
Fitting 5 folds for each of 30 candidates, totalling 150 fits
np.logspace(-4, 4, 30)
array([1.00000000e-04, 1.88739182e-04, 3.56224789e-04, 6.72335754e-04,
1.26896100e-03, 2.39502662e-03, 4.52035366e-03, 8.53167852e-03,
1.61026203e-02, 3.03919538e-02, 5.73615251e-02, 1.08263673e-01,
2.04335972e-01, 3.85662042e-01, 7.27895384e-01, 1.37382380e+00,
2.59294380e+00, 4.89390092e+00, 9.23670857e+00, 1.74332882e+01,
3.29034456e+01, 6.21016942e+01, 1.17210230e+02, 2.21221629e+02,
4.17531894e+02, 7.88046282e+02, 1.48735211e+03, 2.80721620e+03,
5.29831691e+03, 1.00000000e+04])
# Check the best hyperparameters
GS_log_reg.best_params_
{'C': 0.1082636733874054, 'solver': 'liblinear'}
# Evaluate the grid search LogisticRegression model
GS_log_reg.score(X_train_norm, y_train)
0.8634361233480177
# Make predictions with tuned model
y_preds = GS_log_reg.predict(X_test_norm)
# Confusion matrix
print(confusion_matrix(y_test, y_preds))
[[36 4]
[10 26]]
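As an optional wrap-up (not part of the original notebook), the baseline model and the two tuned models can be compared on the same test set:

# Optional summary: test accuracy of the baseline and the two tuned models
print("Baseline LogisticRegression:", round(LR.score(X_test_norm, y_test), 3))
print("RandomizedSearchCV best model:", round(RS_log_reg.score(X_test_norm, y_test), 3))
print("GridSearchCV best model:", round(GS_log_reg.score(X_test_norm, y_test), 3))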