# Stroke-prediction notebook: load and explore the Kaggle healthcare stroke dataset.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
# NOTE(review): blanket-suppressing ALL warnings also hides pandas/sklearn
# deprecation and convergence warnings relevant below — consider narrowing.
warnings.filterwarnings('ignore')
# Load the raw data; the CSV is expected in the working directory.
data = pd.read_csv('healthcare-dataset-stroke-data.csv')
# Notebook-style echo of the full DataFrame (5110 rows x 12 columns).
data
id gender age hypertension heart_disease ever_married work_type Residence_type avg_glucose_level bmi smoking_status stroke
0 9046 Male 67.0 0 1 Yes Private Urban 228.69 36.6 formerly smoked
Self-
1 51676 Female 61.0 0 0 Yes Rural 202.21 NaN never smoked
employed
2 31112 Male 80.0 0 1 Yes Private Rural 105.92 32.5 never smoked
3 60182 Female 49.0 0 0 Yes Private Urban 171.23 34.4 smokes
Self-
4 1665 Female 79.0 1 0 Yes Rural 174.12 24.0 never smoked
employed
... ... ... ... ... ... ... ... ... ... ... ...
5105 18234 Female 80.0 1 0 Yes Private Urban 83.75 NaN never smoked
Self-
5106 44873 Female 81.0 0 0 Yes Urban 125.20 40.0 never smoked
employed
Self-
5107 19723 Female 35.0 0 0 Yes Rural 82.99 30.6 never smoked
employed
5108 37544 Male 51.0 0 0 Yes Private Rural 166.29 25.6 formerly smoked
5109 44679 Female 44.0 0 0 Yes Govt_job Urban 85.28 26.2 Unknown
5110 rows × 12 columns
Data Preprocessing
# Column dtypes and non-null counts: 'bmi' is the only column with missing values.
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5110 entries, 0 to 5109
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 5110 non-null int64
1 gender 5110 non-null object
2 age 5110 non-null float64
3 hypertension 5110 non-null int64
4 heart_disease 5110 non-null int64
5 ever_married 5110 non-null object
6 work_type 5110 non-null object
7 Residence_type 5110 non-null object
8 avg_glucose_level 5110 non-null float64
9 bmi 4909 non-null float64
10 smoking_status 5110 non-null object
11 stroke 5110 non-null int64
dtypes: float64(3), int64(4), object(5)
memory usage: 479.2+ KB
# Summary statistics (count/mean/std/quartiles) for the numeric columns.
data.describe()
id age hypertension heart_disease avg_glucose_level bmi stroke
count 5110.000000 5110.000000 5110.000000 5110.000000 5110.000000 4909.000000 5110.000000
mean 36517.829354 43.226614 0.097456 0.054012 106.147677 28.893237 0.048728
std 21161.721625 22.612647 0.296607 0.226063 45.283560 7.854067 0.215320
min 67.000000 0.080000 0.000000 0.000000 55.120000 10.300000 0.000000
25% 17741.250000 25.000000 0.000000 0.000000 77.245000 23.500000 0.000000
50% 36932.000000 45.000000 0.000000 0.000000 91.885000 28.100000 0.000000
75% 54682.000000 61.000000 0.000000 0.000000 114.090000 33.100000 0.000000
max 72940.000000 82.000000 1.000000 1.000000 271.740000 97.600000 1.000000
# Count missing values per column (201 NaNs in 'bmi', none elsewhere).
data.isnull().sum()
id 0
gender 0
age 0
hypertension 0
heart_disease 0
ever_married 0
work_type 0
Residence_type 0
avg_glucose_level 0
bmi 201
smoking_status 0
stroke 0
dtype: int64
# Checking the distribution of the missing data column.
# KDE of 'bmi' to decide how to impute its missing values; if the
# distribution is strongly skewed, the median would be an alternative
# to the mean used below.
plt.figure(figsize=(8,5))
data['bmi'].plot(kind='kde')
plt.show()
Checking the distribution of the column with missing data, i.e. bmi.
Missing value Treatment
# Impute missing 'bmi' values with the column mean.
# Fix: assign the result back instead of calling inplace=True on a column
# selection — that is chained-assignment inplace, which emits a
# FutureWarning and stops working under pandas copy-on-write (pandas 3.x).
data['bmi'] = data['bmi'].fillna(data['bmi'].mean())
# re-checking missing value
data.isnull().sum()
id 0
gender 0
age 0
hypertension 0
heart_disease 0
ever_married 0
work_type 0
Residence_type 0
avg_glucose_level 0
bmi 0
smoking_status 0
stroke 0
dtype: int64
Dropping unnecessary columns
# The 'id' column is a row identifier with no predictive value — remove it.
data.drop(columns='id', inplace=True)
# Peek at the first rows after the drop.
data.head()
gender age hypertension heart_disease ever_married work_type Residence_type avg_glucose_level bmi smoking_status stroke
0 Male 67.0 0 1 Yes Private Urban 228.69 36.600000 formerly smoked 1
Self-
1 Female 61.0 0 0 Yes Rural 202.21 28.893237 never smoked 1
employed
2 Male 80.0 0 1 Yes Private Rural 105.92 32.500000 never smoked 1
3 Female 49.0 0 0 Yes Private Urban 171.23 34.400000 smokes 1
Self-
4 Female 79.0 1 0 Yes Rural 174.12 24.000000 never smoked 1
employed
EDA
Target variable (Stroke)
# Bar chart of the target distribution: strokes (~5%) are heavily outnumbered
# by non-strokes, i.e. the classes are strongly imbalanced.
data['stroke'].value_counts().plot(kind='bar')
plt.show()
Checking outliers in our dataset (Numerical columns)
# Boxplot every numeric (non-object) column to eyeball outliers.
# Fix: the loop body lost its indentation in the notebook export, which is a
# SyntaxError in plain Python — restored here.
num = data.select_dtypes(exclude='object')
for i in num.columns:
    sns.boxplot(data=num, x=i)
    plt.show()
Gender
# Frequency of each gender; note the single 'Other' row.
data['gender'].value_counts()
Female 2994
Male 2115
Other 1
Name: gender, dtype: int64
# Overall gender distribution.
sns.countplot(data=data,x='gender')
plt.show()
# Gender split within each stroke outcome.
sns.countplot(data=data,x='gender',hue='stroke')
plt.show()
# Pie chart of the target classes (same information as the earlier bar
# chart, shown as percentages).
data['stroke'].value_counts().plot(kind='pie',autopct='%0.2f%%')
plt.show()
Age
# Mean age and stroke rate per gender: men show a slightly higher stroke
# rate (5.1%) than women (4.7%).
# Fix: select the numeric columns BEFORE aggregating — calling .mean() on
# the whole frame raises TypeError for the object columns in pandas >= 2.0
# (older pandas silently dropped them).
data.groupby('gender')[['age', 'stroke']].mean()
age stroke
gender
Female 43.757395 0.047094
Male 42.483385 0.051064
Other 26.000000 0.000000
Men had a slightly higher stroke rate than women.
Ever married
# Marriage-status counts.
data['ever_married'].value_counts()
Yes 3353
No 1757
Name: ever_married, dtype: int64
# Stroke outcome split by marriage status.
sns.countplot(data=data,x='ever_married',hue='stroke')
plt.show()
Work Type
# Distinct work-type categories.
# Fix: the identical .unique() call was accidentally duplicated — run it once.
data['work_type'].unique()
array(['Private', 'Self-employed', 'Govt_job', 'children', 'Never_worked'],
dtype=object)
# Frequency of each work type.
data['work_type'].value_counts()
Private 2925
Self-employed 819
children 687
Govt_job 657
Never_worked 22
Name: work_type, dtype: int64
# Stroke outcome split by work type.
sns.countplot(data=data,x='work_type',hue='stroke')
plt.show()
Residence Type
# Distinct residence categories.
data['Residence_type'].unique()
array(['Urban', 'Rural'], dtype=object)
# Urban/rural counts are nearly balanced.
data['Residence_type'].value_counts()
Urban 2596
Rural 2514
Name: Residence_type, dtype: int64
# Stroke outcome split by residence type.
sns.countplot(data=data,x='Residence_type',hue='stroke')
plt.show()
Smoking Features
# Smoking-status counts; 'Unknown' covers roughly a third of the rows.
data['smoking_status'].value_counts()
never smoked 1892
Unknown 1544
formerly smoked 885
smokes 789
Name: smoking_status, dtype: int64
# Stroke outcome split by smoking status.
sns.countplot(data=data,x='smoking_status',hue='stroke')
plt.show()
Heatmap
# Correlation heatmap of the numeric features.
# Fix: pass numeric_only=True — at this point the categorical columns are
# still object dtype, and DataFrame.corr() raises TypeError on them in
# pandas >= 2.0 (older pandas silently dropped them).
sns.heatmap(data.corr(numeric_only=True),annot=True,fmt='.2f')
plt.show()
Encoding the categorical variables
# List column dtypes to see which object columns still need encoding.
data.dtypes
gender object
age float64
hypertension int64
heart_disease int64
ever_married object
work_type object
Residence_type object
avg_glucose_level float64
bmi float64
smoking_status object
stroke int64
dtype: object
from sklearn.preprocessing import LabelEncoder

# Integer-encode every remaining categorical column. fit_transform refits
# the encoder on each column, so one encoder instance can be reused safely.
encoder = LabelEncoder()
for column in ('gender', 'ever_married', 'work_type', 'Residence_type', 'smoking_status'):
    data[column] = encoder.fit_transform(data[column])
Splitting data into independent and dependent variables
# Feature matrix: every column except the target, as a NumPy array.
X = data.drop(columns='stroke').values
# Notebook-style echo of the array.
X
array([[ 1. , 67. , 0. , ..., 228.69 ,
36.6 , 1. ],
[ 0. , 61. , 0. , ..., 202.21 ,
28.89323691, 2. ],
[ 1. , 80. , 0. , ..., 105.92 ,
32.5 , 2. ],
...,
[ 0. , 35. , 0. , ..., 82.99 ,
30.6 , 2. ],
[ 1. , 51. , 0. , ..., 166.29 ,
25.6 , 1. ],
[ 0. , 44. , 0. , ..., 85.28 ,
26.2 , 0. ]])
# Target vector as a NumPy array.
Y=data['stroke'].values
Y
array([1, 1, 1, ..., 0, 0, 0], dtype=int64)
# splitting
# 80/20 train/test split with a fixed seed for reproducibility.
# NOTE(review): the classes are ~95/5 imbalanced; stratify=Y would keep the
# class ratio identical in both splits — confirm before changing, since it
# alters the reported scores below.
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.20, random_state=0)
Logistic Regression
# Baseline logistic-regression classifier with default hyperparameters.
# NOTE(review): the features are unscaled, and the warning filter above
# would hide a ConvergenceWarning if the solver hits its iteration limit —
# verify convergence.
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()
classifier.fit(X_train, Y_train)
LogisticRegression()
# Predictions on the held-out test set.
predict = classifier.predict(X_test)
predict
array([0, 0, 0, ..., 0, 0, 0], dtype=int64)
# Ground-truth labels for comparison.
Y_test
array([1, 0, 0, ..., 0, 1, 0], dtype=int64)
Evaluation for Logistic Regression
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
# Confusion matrix: the model predicts only the majority class — all 54
# true strokes in the test set are missed.
print(confusion_matrix(Y_test, predict))
[[968 0]
[ 54 0]]
# Per-class precision/recall/F1: recall for class 1 is 0.00, so the 95%
# accuracy below just reflects the class imbalance.
print(classification_report(Y_test, predict))
precision recall f1-score support
0 0.95 1.00 0.97 968
1 0.00 0.00 0.00 54
accuracy 0.95 1022
macro avg 0.47 0.50 0.49 1022
weighted avg 0.90 0.95 0.92 1022
print('Accuracy score :',accuracy_score(Y_test, predict))
Accuracy score : 0.9471624266144814
KNN Classifier
# K-nearest-neighbours classifier with default hyperparameters.
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier()
knn.fit(X_train, Y_train)
KNeighborsClassifier()
# Test-set predictions.
pred = knn.predict(X_test)
pred
array([0, 0, 0, ..., 0, 0, 0], dtype=int64)
# Ground-truth labels for comparison.
Y_test
array([1, 0, 0, ..., 0, 1, 0], dtype=int64)
Evaluation for KNN Classifier
# Accuracy only — this stays near the ~95% majority-class baseline, so the
# per-class recall should be checked as well before comparing models.
print('Accuracy:',accuracy_score(Y_test, pred))
Accuracy: 0.9422700587084148
Decision Tree Classifier
# Shallow decision tree (max_depth=3) to keep the model interpretable.
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(max_depth=3)
classifier.fit(X_train, Y_train)
DecisionTreeClassifier(max_depth=3)
# Test-set predictions.
Y_pred = classifier.predict(X_test)
Y_pred
array([0, 0, 0, ..., 0, 0, 0], dtype=int64)
# Ground-truth labels for comparison.
Y_test
array([1, 0, 0, ..., 0, 1, 0], dtype=int64)
Evaluation for Decision Tree Classifier
# Test accuracy of the decision tree.
print('Accuracy:',accuracy_score(Y_test, Y_pred))
Accuracy: 0.9461839530332681
Plotting the tree with plot_tree
# Visualize the fitted tree; filled=True colors nodes by majority class and
# node_ids=True labels each node with its index.
from sklearn import tree
fig = plt.figure(figsize=(15,10))
tree.plot_tree(classifier,filled=True,class_names=True,node_ids=True)
plt.show()
Random Forest Classifier
# Random-forest classifier with default hyperparameters.
# Note: this rebinds `classifier`, replacing the decision tree above.
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier()
classifier.fit(X_train,Y_train)
RandomForestClassifier()
# Test-set predictions.
Y_pred1 = classifier.predict(X_test)
Y_pred1
array([0, 0, 0, ..., 0, 0, 0], dtype=int64)
# Ground-truth labels for comparison.
Y_test
array([1, 0, 0, ..., 0, 1, 0], dtype=int64)
Evaluation for Random Forest Classifier
# Test accuracy of the random forest.
# Fix: sklearn's convention is accuracy_score(y_true, y_pred) — the
# arguments were swapped. Accuracy is symmetric so the value is unchanged,
# but the swapped order is inconsistent with the other evaluations and
# would be wrong for any asymmetric metric.
print('Accuracy:', accuracy_score(Y_test, Y_pred1))
Accuracy: 0.9461839530332681
Loading [MathJax]/jax/output/CommonHTML/fonts/TeX/fontdata.js