Logistic Regression

This document explores and analyzes the Framingham cardiovascular disease dataset (framingham.csv) using Python. It loads the dataset, inspects data types and variable distributions, visualizes missing values and pairwise correlations with heatmaps, and examines relationships between variables with count plots and box plots. Missing values are filled either with the column mean or with values derived from other variables (for example, glucose from diabetes status). The cleaned data are then used to train a logistic regression model that predicts ten-year coronary heart disease risk (TenYearCHD), evaluated with a confusion matrix and an accuracy score.

Uploaded by Kagade Ajinkya

In [1]: # importing libraries

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

In [2]: # import dataset

df = pd.read_csv('framingham.csv')

Data Inspection

In [3]: df.head()

Out[3]:
   male  age  education  currentSmoker  cigsPerDay  BPMeds  prevalentStroke  prevalentHyp  diabetes  totChol
0     1   39        4.0              0         0.0     0.0                0             0         0    195.0
1     0   46        2.0              0         0.0     0.0                0             0         0    250.0
2     1   48        1.0              1        20.0     0.0                0             0         0    245.0
3     0   61        3.0              1        30.0     0.0                0             1         0    225.0
4     0   46        3.0              1        23.0     0.0                0             0         0    285.0

In [4]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4238 entries, 0 to 4237
Data columns (total 16 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 male 4238 non-null int64
1 age 4238 non-null int64
2 education 4133 non-null float64
3 currentSmoker 4238 non-null int64
4 cigsPerDay 4209 non-null float64
5 BPMeds 4185 non-null float64
6 prevalentStroke 4238 non-null int64
7 prevalentHyp 4238 non-null int64
8 diabetes 4238 non-null int64
9 totChol 4188 non-null float64
10 sysBP 4238 non-null float64
11 diaBP 4238 non-null float64
12 BMI 4219 non-null float64
13 heartRate 4237 non-null float64
14 glucose 3850 non-null float64
15 TenYearCHD 4238 non-null int64
dtypes: float64(9), int64(7)
memory usage: 529.9 KB

In [5]: df.describe()

Out[5]:
      male            age             education       currentSmoker   cigsPerDay      BPMeds          prevalentStroke preva
count 4238.000000     4238.000000     4133.000000     4238.000000     4209.000000     4185.000000     4238.000000     4238
mean  0.429212        49.584946       1.978950        0.494101        9.003089        0.029630        0.005899        0
std   0.495022        8.572160        1.019791        0.500024        11.920094       0.169584        0.076587        0
min   0.000000        32.000000       1.000000        0.000000        0.000000        0.000000        0.000000        0
25%   0.000000        42.000000       1.000000        0.000000        0.000000        0.000000        0.000000        0
50%   0.000000        49.000000       2.000000        0.000000        0.000000        0.000000        0.000000        0
75%   1.000000        56.000000       3.000000        1.000000        20.000000       0.000000        0.000000        1
max   1.000000        70.000000       4.000000        1.000000        70.000000       1.000000        1.000000        1

In [6]: df.isnull().sum()

Out[6]:
male 0
age 0
education 105
currentSmoker 0
cigsPerDay 29
BPMeds 53
prevalentStroke 0
prevalentHyp 0
diabetes 0
totChol 50
sysBP 0
diaBP 0
BMI 19
heartRate 1
glucose 388
TenYearCHD 0
dtype: int64
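Raw counts are easier to compare as a share of the 4,238 rows; glucose, for example, is missing in 388 rows, roughly 9.2%. A minimal sketch of such a summary follows. `missing_summary` is a hypothetical helper, and the small frame is synthetic stand-in data, not framingham.csv.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, largest first.

    Only columns with at least one missing value are kept.
    """
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Synthetic stand-in for framingham.csv (illustrative values only)
toy = pd.DataFrame({
    "age": [39, 46, 48, 61],
    "glucose": [77.0, np.nan, 70.0, np.nan],
    "BMI": [26.97, 28.73, np.nan, 28.58],
})
print(missing_summary(toy))
```

On the real data, `missing_summary(df)` would list glucose first, followed by education, BPMeds, totChol, cigsPerDay, BMI and heartRate.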

Exploratory Data Analysis (EDA)

In [7]: plt.figure(figsize=(15,10))
sns.heatmap(df.isnull(), yticklabels=False, cbar=False, cmap='viridis')

Out[7]: <AxesSubplot:>

In [8]: plt.figure(figsize=(15,15))
sns.heatmap(df.corr(), annot=True)

Out[8]: <AxesSubplot:>

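With 16 columns, the annotated heatmap is dense. The row that matters most for modeling is the correlation of each feature with TenYearCHD, which can be pulled out and ranked by absolute strength. A minimal sketch follows, assuming pandas 1.5 or later (for `numeric_only`). `target_correlations` is a hypothetical helper, and the toy data are synthetic.

```python
import pandas as pd


def target_correlations(df: pd.DataFrame, target: str = "TenYearCHD") -> pd.Series:
    """Pearson correlation of every numeric column with `target`,
    ordered by absolute strength (strongest first)."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Synthetic stand-in data (not from framingham.csv)
toy = pd.DataFrame({
    "age": [40, 50, 60, 70],
    "noise": [1, 0, 0, 1],
    "TenYearCHD": [0, 0, 1, 1],
})
print(target_correlations(toy))
```

Pearson correlation with a binary target is only a rough screen. It says nothing about non-linear effects or about features that matter only in combination.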
In [9]: df['prevalentStroke'].value_counts()

Out[9]:
0 4213
1 25
Name: prevalentStroke, dtype: int64

In [10]: sns.set_style('whitegrid')
sns.countplot(x='TenYearCHD', data=df)

Out[10]: <AxesSubplot:xlabel='TenYearCHD', ylabel='count'>

In [11]: sns.set_style('whitegrid')
sns.countplot(x='TenYearCHD', hue='male', data=df)

Out[11]: <AxesSubplot:xlabel='TenYearCHD', ylabel='count'>

In [12]: sns.set_style('whitegrid')
sns.countplot(x='TenYearCHD', hue='diabetes', data=df)

Out[12]: <AxesSubplot:xlabel='TenYearCHD', ylabel='count'>
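The count plots show absolute numbers. Because far fewer patients have TenYearCHD = 1, the within-group rate is often easier to compare across groups such as male/female or diabetic/non-diabetic. A minimal sketch using `pd.crosstab` with `normalize="index"` (each row sums to 1) follows. `chd_rate_by` is a hypothetical helper, and the data are synthetic.

```python
import pandas as pd


def chd_rate_by(df: pd.DataFrame, group: str, target: str = "TenYearCHD") -> pd.DataFrame:
    """Share of each target class within every level of `group` (rows sum to 1)."""
    return pd.crosstab(df[group], df[target], normalize="index")


# Synthetic stand-in data (not from framingham.csv)
toy = pd.DataFrame({
    "male":       [1, 1, 0, 0, 0],
    "TenYearCHD": [1, 0, 0, 0, 1],
})
print(chd_rate_by(toy, "male"))
```

On the real data, `chd_rate_by(df, "male")` or `chd_rate_by(df, "diabetes")` would give the CHD rate per group directly.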

In [13]: df['education'].value_counts()

Out[13]:
1.0 1720
2.0 1253
3.0 687
4.0 473
Name: education, dtype: int64

In [14]: sns.countplot(x='education', data=df)

Out[14]: <AxesSubplot:xlabel='education', ylabel='count'>

In [15]: sns.countplot(x='prevalentStroke', data=df)

Out[15]: <AxesSubplot:xlabel='prevalentStroke', ylabel='count'>

In [16]: sns.histplot(df['totChol'].dropna(), kde=True, stat='density')

Out[16]: <AxesSubplot:xlabel='totChol', ylabel='Density'>

In [17]: sns.histplot(df['glucose'].dropna(), kde=True, stat='density')

Out[17]: <AxesSubplot:xlabel='glucose', ylabel='Density'>

In [18]: plt.figure(figsize=(12,7))
sns.boxplot(y='glucose', x='diabetes', data=df)

Out[18]: <AxesSubplot:xlabel='diabetes', ylabel='glucose'>

In [19]: sns.boxplot(x='education', y='age', data=df)

Out[19]: <AxesSubplot:xlabel='education', ylabel='age'>

In [20]: sns.set_style('whitegrid')
sns.countplot(x='prevalentStroke', hue='BPMeds', data=df)

Out[20]: <AxesSubplot:xlabel='prevalentStroke', ylabel='count'>

In [21]: sns.boxplot(x='currentSmoker', y='cigsPerDay', data=df)

Out[21]: <AxesSubplot:xlabel='currentSmoker', ylabel='cigsPerDay'>

Filling Null Values

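In [22]: def input_glucose(cols):
    # Impute a missing glucose value from diabetes status:
    # 75 for non-diabetic patients, 149 for diabetic patients
    diabetes = cols['diabetes']
    glucose = cols['glucose']
    if pd.isnull(glucose):
        return 149 if diabetes == 1 else 75
    return glucose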

In [23]: df['glucose']=df[['diabetes','glucose']].apply(input_glucose,axis=1)
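The constants 75 and 149 in `input_glucose` look like typical glucose levels for non-diabetic and diabetic patients, presumably read off the box plot in In [18]. That reading is an assumption. If the intent is group medians, pandas can compute and apply them directly with `groupby(...).transform("median")`, which avoids hard-coded values. Below is a sketch on synthetic data. `impute_by_group_median` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def impute_by_group_median(df: pd.DataFrame, col: str, by: str) -> pd.Series:
    """Fill NaNs in `col` with the median of `col` within each level of `by`."""
    group_medians = df.groupby(by)[col].transform("median")  # NaNs are skipped
    return df[col].fillna(group_medians)


# Synthetic stand-in data (not from framingham.csv)
toy = pd.DataFrame({
    "diabetes": [0, 0, 0, 1, 1, 1],
    "glucose":  [70.0, 80.0, np.nan, 140.0, 160.0, np.nan],
})
toy["glucose"] = impute_by_group_median(toy, "glucose", by="diabetes")
print(toy["glucose"].tolist())  # [70.0, 80.0, 75.0, 140.0, 160.0, 150.0]
```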

In [24]: df['heartRate']=df['heartRate'].fillna(df['heartRate'].mean())

In [25]: df['BMI']=df['BMI'].fillna(df['BMI'].mean())

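In [26]: df['totChol']=df['totChol'].fillna(df['totChol'].mean())

In [27]: # Missing BPMeds are set to 1 (on medication); note that only about 3% of recorded
# values are 1 (column mean 0.029630 in Out[5]), so 0 is by far the more common value
df['BPMeds']=df['BPMeds'].fillna(1)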
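Cells In [24] to In [26] take each mean over the whole dataset, including rows that later end up in the test set. As a result, a little information from the test rows leaks into training. One common alternative is to fit scikit-learn's `SimpleImputer` on the training split only and reuse its learned means for the test split. Below is a sketch on synthetic data (not the author's code).

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with a few missing values
toy = pd.DataFrame({
    "BMI":     [20.0, 22.0, np.nan, 30.0, 28.0, np.nan, 24.0, 26.0],
    "totChol": [180.0, np.nan, 200.0, 240.0, 220.0, 210.0, np.nan, 190.0],
})
train, test = train_test_split(toy, test_size=0.25, random_state=0)

imputer = SimpleImputer(strategy="mean")
# Means are learned from the training rows only...
train_filled = pd.DataFrame(imputer.fit_transform(train),
                            columns=train.columns, index=train.index)
# ...and the same means are reused for the test rows
test_filled = pd.DataFrame(imputer.transform(test),
                           columns=test.columns, index=test.index)
print(imputer.statistics_)
```

With this dataset the leak is small, but the pattern matters more once imputation is tuned or cross-validated.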

In [28]: cigs_mean=df['cigsPerDay'].mean()

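In [29]: def fill_cigsPerDay(cols):
    # Missing cigsPerDay: non-smokers get 0, smokers get the overall column mean
    smoker = cols['currentSmoker']
    cigsPerDay = cols['cigsPerDay']
    if pd.isnull(cigsPerDay):
        return cigs_mean if smoker == 1 else 0
    return cigsPerDay

In [30]: df['cigsPerDay']=df[['currentSmoker','cigsPerDay']].apply(fill_cigsPerDay,axis=1)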
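The name `fill_cigsPerDay` and the `cigs_mean` variable suggest the intended rule: a missing cigsPerDay becomes 0 for non-smokers and a mean value for smokers. Here is a vectorized sketch of that rule that avoids a row-wise `apply`. `fill_cigs_per_day` is a hypothetical name, and the data are synthetic. One alternative to `cigs_mean` is the mean over smokers only, `df.loc[df['currentSmoker'] == 1, 'cigsPerDay'].mean()`, because the overall mean (about 9.0 per day in Out[5]) is pulled down by the zeros of non-smokers.

```python
import numpy as np
import pandas as pd


def fill_cigs_per_day(df: pd.DataFrame, smoker_fill: float) -> pd.Series:
    """Fill missing cigsPerDay: 0 for non-smokers, `smoker_fill` for smokers."""
    fallback = pd.Series(
        np.where(df["currentSmoker"] == 1, smoker_fill, 0.0),
        index=df.index,
    )
    return df["cigsPerDay"].fillna(fallback)  # existing values are left untouched


# Synthetic stand-in data (not from framingham.csv)
toy = pd.DataFrame({
    "currentSmoker": [0, 1, 1, 0],
    "cigsPerDay":    [0.0, 20.0, np.nan, np.nan],
})
print(fill_cigs_per_day(toy, smoker_fill=9.0).tolist())  # [0.0, 20.0, 9.0, 0.0]
```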

In [31]: def fill_education(cols):
    # Impute a missing education level from age bands
    age = cols['age']
    education = cols['education']
    if pd.isnull(education):
        if age > 53:
            return 1
        if age < 46:
            return 2
        return 3
    return education

In [32]: df['education']=df[['age','education']].apply(fill_education,axis=1)
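The same age bands (over 53 → 1, under 46 → 2, otherwise 3) can be applied without a Python-level loop using `np.select`. The thresholds are the author's and were presumably informed by the education-versus-age box plot in In [19]. The helper name below is hypothetical, and the data are synthetic.

```python
import numpy as np
import pandas as pd


def fill_education_by_age(df: pd.DataFrame) -> pd.Series:
    """Fill missing education from age bands: >53 -> 1, <46 -> 2, else 3."""
    by_age = np.select(
        [df["age"] > 53, df["age"] < 46],  # conditions are checked in order
        [1.0, 2.0],
        default=3.0,
    )
    return df["education"].fillna(pd.Series(by_age, index=df.index))


# Synthetic stand-in data (not from framingham.csv)
toy = pd.DataFrame({
    "age":       [60, 40, 50, 30],
    "education": [np.nan, np.nan, np.nan, 4.0],
})
print(fill_education_by_age(toy).tolist())  # [1.0, 2.0, 3.0, 4.0]
```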

In [33]: plt.figure(figsize=(15,10))
sns.heatmap(df.isnull(), yticklabels=False, cbar=False, cmap='viridis')

Out[33]: <AxesSubplot:>

In [34]: df.isnull().sum()

Out[34]:
male 0
age 0
education 0
currentSmoker 0
cigsPerDay 0
BPMeds 0
prevalentStroke 0
prevalentHyp 0
diabetes 0
totChol 0
sysBP 0
diaBP 0
BMI 0
heartRate 0
glucose 0
TenYearCHD 0
dtype: int64

Data Preparation
In [35]: list1=list(df.columns)
list1.remove('TenYearCHD')

In [36]: X = df[list1]
y=df['TenYearCHD']

In [37]: from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=42)

In [38]: y_train.value_counts(normalize=True)

Out[38]:
0 0.846608
1 0.153392
Name: TenYearCHD, dtype: float64
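Only about 15% of the training labels are positive (Out[38]). Passing `stratify=y` to `train_test_split` keeps that proportion nearly identical in both splits, which matters when the positive class is small. Below is a sketch on synthetic labels with a similar imbalance. The feature and labels are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1000
X = pd.DataFrame({"age": rng.integers(32, 71, size=n)})               # synthetic feature
y = pd.Series((rng.random(n) < 0.15).astype(int), name="TenYearCHD")  # ~15% positives

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
# Positive rate overall, in train, and in test
print(round(y.mean(), 3), round(y_train.mean(), 3), round(y_test.mean(), 3))
```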

Implementing Logistic Regression

In [39]: from sklearn.linear_model import LogisticRegression
model = LogisticRegression()

In [40]: model.fit(X_train,y_train)

C:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\linear_model\_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

Out[40]: LogisticRegression()
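The ConvergenceWarning above means lbfgs hit its default iteration limit (`max_iter=100`) before converging. The most likely cause is that features such as totChol and glucose are on much larger scales than the binary columns. As the warning itself suggests, standardizing the features usually fixes this. Below is a sketch using a scikit-learn pipeline on synthetic data from `make_classification`. `max_iter=1000` is an arbitrary safety margin. The `simplefilter` line turns any remaining ConvergenceWarning into an error, so a silent failure cannot slip through.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced binary data standing in for the Framingham features
X, y = make_classification(
    n_samples=2000, n_features=15, n_informative=5, weights=[0.85], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scaling happens inside the pipeline, so the scaler is fit on training data only
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

with warnings.catch_warnings():
    warnings.simplefilter("error", ConvergenceWarning)  # fail loudly instead of warning
    pipe.fit(X_train, y_train)

print(pipe.score(X_test, y_test))
```

On the notebook's data, the equivalent would be `make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X_train, y_train)` in place of In [39]–In [40].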

In [41]: model.score(X_train,y_train)

Out[41]: 0.848377581120944

In [42]: y_pred = model.predict(X_test)

Accuracy
In [43]: from sklearn.metrics import confusion_matrix,accuracy_score

In [44]: cm = confusion_matrix(y_test,y_pred)

In [45]: cm

Out[45]:
array([[719,   5],
       [117,   7]], dtype=int64)

In [46]: accuracy = accuracy_score(y_test,y_pred)

In [47]: accuracy

Out[47]: 0.8561320754716981
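An accuracy of 0.856 looks good, but 724 of the 848 test patients are in class 0. A model that always predicts 0 would therefore already score about 0.854. The confusion matrix in Out[45] has rows as true labels and columns as predictions, as in scikit-learn's `confusion_matrix`. Working the rates out of it shows that the model identifies only 7 of the 124 patients who went on to develop CHD:

```python
import numpy as np

# Confusion matrix from Out[45]: rows = true class (0, 1), columns = predicted class (0, 1)
cm = np.array([[719, 5],
               [117, 7]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()       # share of all predictions that are correct
baseline = cm[0].sum() / cm.sum()     # accuracy of always predicting class 0
recall = tp / (tp + fn)               # sensitivity: positives that were caught
precision = tp / (tp + fp)            # predicted positives that were correct
specificity = tn / (tn + fp)          # negatives correctly left as negative

for name, value in [("accuracy", accuracy), ("baseline", baseline),
                    ("recall", recall), ("precision", precision),
                    ("specificity", specificity)]:
    print(f"{name:<12}{value:.4f}")
# accuracy    0.8561
# baseline    0.8538
# recall      0.0565
# precision   0.5833
# specificity 0.9931
```

Two common ways to trade some specificity for recall are lowering the decision threshold on `model.predict_proba(X_test)[:, 1]` and fitting with `LogisticRegression(class_weight='balanced')`. Which trade-off is acceptable depends on how the predictions would be used.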
