Mock Part1.ipynb - Colab
from google.colab import files
uploaded = files.upload()
Saving T124OPPE2 Preprocessing V1.csv to T124OPPE2 Preprocessing V1 (2).csv
import numpy as np
import pandas as pd
df = pd.read_csv('T124OPPE2_Preprocessing_V1.csv')
df
     Gender   Age HasTension AnyHeartDisease NeverMarried     Occupation  LivesIn GlucoseLevel   BMI   SmokingStatus HeartAttack
0    Female  75.0        Yes              No          Yes  Self-employed     City         54.6  35.1    never smoked          No
1    Female  49.0         No              No          Yes        Private  Village        108.8  26.7          smokes          No
2      Male  32.0         No              No          Yes        Private     City         64.1  23.4          smokes          No
3      Male  78.0         No              No          Yes  Self-employed     City        219.2  27.4         Unknown         Yes
4      Male  39.0         No              No          Yes        Private     City         55.4  41.6 formerly smoked          No
...     ...   ...        ...             ...          ...            ...      ...          ...   ...             ...         ...
3995 Female  40.0         No              No          Yes        Private     City         88.4  36.5          smokes          No
3996 Female  18.0         No              No           No        Private  Village        168.5  48.2    never smoked          No
3997   Male  27.0         No              No          Yes        Private     City         76.5  21.0    never smoked          No
3998 Female  28.0         No              No           No        Private     City         80.0  27.1          smokes          No
4000 rows × 11 columns
df.Gender.value_counts()
count
Gender
Female 2366
Male 1627
Unknown 7
dtype: int64
print(df['Gender'].unique())
['Female' 'Male' 'Unknown']
df.Age.value_counts()
count
Age
78.00 79
45.00 71
57.00 71
53.00 69
54.00 69
... ...
1.72 3
1.16 2
0.40 2
1.40 1
0.08 1
105 rows × 1 columns
dtype: int64
(df.Age<1).sum()
np.int64(40)
print(df['Age'].unique())
[ 7.50e+01 4.90e+01 3.20e+01 7.80e+01 3.90e+01 -3.00e+00 6.30e+01
4.00e+00 4.50e+01 5.20e+01 3.10e+01 5.70e+01 5.60e+01 2.00e+01
2.40e-01 3.80e+01 8.20e+01 3.40e+01 2.90e+01 1.60e+01 7.00e+00
3.70e+01 5.10e+01 2.60e+01 5.30e+01 5.00e+01 2.00e+00 5.40e+01
7.90e+01 6.00e+00 5.80e+01 6.50e+01 1.30e+01 4.70e+01 1.90e+01
7.40e+01 7.30e+01 1.20e+01 9.00e+00 2.70e+01 3.30e+01 8.10e+01
3.60e+01 7.60e+01 7.10e+01 4.60e+01 2.50e+01 1.70e+01 2.20e+01
1.10e+01 5.50e+01 6.10e+01 6.90e+01 7.70e+01 4.20e+01 2.40e+01
7.20e+01 3.50e+01 8.00e+01 1.64e+00 5.90e+01 6.00e+01 4.10e+01
4.30e+01 6.20e+01 6.40e+01 5.00e+00 7.00e+01 2.10e+01 1.00e+01
1.80e+01 1.40e+01 4.00e+01 1.50e+01 3.00e+00 4.80e+01 8.00e+00
1.32e+00 6.70e+01 4.40e+01 6.80e+01 1.56e+00 2.30e+01 6.60e+01
8.00e-01 3.00e+01 1.88e+00 2.80e+01 5.60e-01 6.40e-01 3.20e-01
1.80e+00 1.60e-01 7.20e-01 1.40e+00 8.80e-01 1.08e+00 1.72e+00
1.24e+00 4.00e-01 1.00e+00 1.48e+00 4.80e-01 8.00e-02 1.16e+00]
(df.Age==-3.00e+00).sum()
np.int64(8)
df[df.GlucoseLevel < 0]['GlucoseLevel']
GlucoseLevel
98 -2.0
697 -2.0
829 -2.0
1460 -2.0
2428 -2.0
2543 -2.0
3095 -2.0
3370 -2.0
3986 -2.0
dtype: float64
(df.GlucoseLevel<0).sum()
np.int64(9)
df.LivesIn.value_counts()
count
LivesIn
City 2030
Village 1965
Unknown 5
dtype: int64
df.BMI.unique()
array([35.1, 26.7, 23.4, 27.4, 41.6, 29.3, 37.1, 16.1, 40.5, 15.8, 29.9,
29.5, 40.7, 36.6, 31.5, nan, 12.1, 25.5, 22.3, 27.1, 44.7, 26.1,
18.8, 18.7, 24.8, 17. , 37.6, 20.6, 29. , 56.2, 30.7, 25.3, 23. ,
27.2, 19.2, 31.6, 24.6, 27. , 24.5, 18.2, 52. , 32.3, 42.7, 30. ,
24.3, 24.2, 25.7, 36.7, 46.4, 48.3, 20.9, 24.7, 23.6, 26.5, 39.4,
18.4, 25.6, 25.9, 54.6, 31.9, 14.6, 38.7, 23.7, 27.3, 29.2, 39.7,
30.1, 28.1, 35.7, 14.3, 30.4, 22.2, 35. , 44.5, 36.3, 25.2, 26.6,
31.4, 36.8, 25.8, 38.4, 43.2, 20.4, 30.6, 33.8, 34. , 26.2, 29.6,
30.2, 22.9, 38.9, 16.3, 23.3, 25.1, 34.1, 45.7, 37.3, 26.4, 40.9,
31.1, 17.7, 27.5, 19.9, 32. , 35.9, 32.1, 24.9, 23.8, 18. , 20.7,
27.7, 22.6, 13.1, 19.4, 28.5, 28.8, 21.7, 19.6, 27.8, 41. , 41.8,
35.2, 44.4, 42.6, 15.7, 52.8, 23.1, 38.5, 22.7, 18.3, 42.3, 43.4,
51.5, 24. , 28.7, 23.9, 37.9, 32.6, 35.6, 34.7, 28.3, 33.2, 32.5,
44.1, 34.2, 22. , 33.3, 16.8, 35.4, 20.1, 26.3, 37.5, 33.1, 21.2,
33. , 33.6, 30.3, 26. , 34.8, 31.8, 42.4, 25.4, 23.2, 19.3, 27.9,
36.1, 43. , 16.9, 20.5, 33.9, 28.2, 24.1, 41.1, 32.2, 26.8, 30.8,
22.5, 29.7, 40. , 34.5, 28. , 37.7, 19.8, 28.4, 34.3, 20.8, 16.2,
17.5, 36.9, 19.5, 31.7, 34.4, 29.1, 39.3, 35.5, 21. , 31. , 25. ,
20.3, 17.9, 36.5, 47.5, 19. , 23.5, 38.8, 39.5, 22.1, 30.5, 29.4,
32.4, 16.7, 22.8, 16.4, 24.4, 54.8, 19.1, 39.2, 18.5, 28.6, 48.2,
41.2, 20. , 34.6, 36.4, 29.8, 59.7, 14.4, 28.9, 27.6, 19.7, 32.8,
44.8, 21.1, 16.6, 14.9, 18.9, 17.1, 20.2, 33.7, 38.2, 55.6, 21.5,
32.9, 40.3, 18.1, 38.6, 15.1, 41.5, 21.9, 39.6, 42. , 13.7, 38. ,
41.3, 35.3, 48.6, 41.7, 21.3, 47.6, 30.9, 61.1, 31.3, 38.3, 37.2,
35.8, 49.1, 33.5, 18.6, 44.3, 26.9, 17.2, 31.2, 44. , 49.8, 39.1,
39.8, 39.9, 17.3, 22.4, 21.4, 40.2, 33.4, 56.5, 37.4, 17.4, 16.5,
21.6, 41.4, 37. , 36. , 43.7, 13.3, 17.8, 14.8, 39. , 40.8, 48.4,
43.8, 63.6, 36.2, 42.5, 40.1, 43.9, 15.4, 13.2, 43.5, 58.7, 14. ,
46.8, 43.6, 37.8, 46.3, 45.4, 15.2, 17.6, 32.7, 46.1, 42.1, 58.5,
45.9, 41.9, 50.7, 54. , 21.8, 47. , 66.1, 45.3, 34.9, 42.2, 55.7,
55.1, 45.1, 52.7, 16. , 54.2, 40.6, 49.9, 13.8, 53.4, 46.9, 55.8,
45.6, 43.3, 15.3, 77.9, 47.3, 38.1, 57.9, 53.6, 9.6, 15.5, 49.2,
45. , 49.5, 12.8, 62.6, 43.1, 12.2, 56.8, 45.2, 51.9, 52.1, 62.5,
47.8, 51.3, 45.5, 10.9, 47.4, 49.7, 49.4, 48. , 50.8, 52.4, 45.8,
97. , 13.6, 48.9, 57.3, 14.5, 40.4, 47.9, 50.5, 12.6, 14.1, 57.4,
42.8, 49.3, 46.2, 48.7, 58.4, 53.3, 55. , 46.5, 53.1, 51.8, 50.3,
53.9, 58.1, 13. , 52.2, 15.9, 13.9, 51. , 51.4, 57.6, 46.7, 53. ,
15. , 14.7])
df.BMI.isnull().sum()
np.int64(149)
df.SmokingStatus.value_counts()
count
SmokingStatus
never smoked 1502
Unknown 1204
formerly smoked 697
smokes 597
dtype: int64
df.BMI.mean()
np.float64(28.857958971695663)
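The checks above turn up several placeholder codes rather than true NaNs. A small sketch that collects them in one place, assuming the sentinels identified so far (-3 in Age, -2 in GlucoseLevel, 'Unknown' in the categorical columns, NaN in BMI):
# Sketch: tally the placeholder values spotted during exploration.
sentinel_counts = {
    'Age == -3': (df.Age == -3).sum(),
    'GlucoseLevel == -2': (df.GlucoseLevel == -2).sum(),
    "Gender == 'Unknown'": (df.Gender == 'Unknown').sum(),
    "LivesIn == 'Unknown'": (df.LivesIn == 'Unknown').sum(),
    "SmokingStatus == 'Unknown'": (df.SmokingStatus == 'Unknown').sum(),
    'BMI is NaN': df.BMI.isnull().sum(),
}
for name, count in sentinel_counts.items():
    print(name, int(count))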
df[(df.LivesIn=='City')&(df.SmokingStatus.isin(['formerly smoked','smokes'])) &(df.HeartAttack=='Yes')].shape[0]
52
df.head(5)
Gender Age HasTension AnyHeartDisease NeverMarried Occupation LivesIn GlucoseLevel BMI SmokingStatus HeartAttack
0 Female 75.0 Yes No Yes Self-employed City 54.6 35.1 never smoked No
1 Female 49.0 No No Yes Private Village 108.8 26.7 smokes No
2 Male 32.0 No No Yes Private City 64.1 23.4 smokes No
3 Male 78.0 No No Yes Self-employed City 219.2 27.4 Unknown Yes
4 Male 39.0 No No Yes Private City 55.4 41.6 formerly smoked No
df.NeverMarried.value_counts()
count
NeverMarried
Yes 2626
No 1374
dtype: int64
#Which of the following categories has the highest frequency? Ignore rows with missing values.
#female patients without tension, without any heart disease and never married
#female patients without tension, without any heart disease and either currently married or married before
#male patients without tension, without any heart disease and never married
#male patients with tension, with a heart disease and never married
#There is a tie between 2 or more options.
#(A compact groupby cross-check is sketched after the four counts below.)
df[(df.Gender=='Female')&(df.HasTension=='No')&(df.AnyHeartDisease=='No')&(df.NeverMarried=='Yes')].shape[0]
1335
df[(df.Gender=='Female')&(df.HasTension=='No')&(df.AnyHeartDisease=='No')&(df.NeverMarried=='No')].shape[0]
754
df[(df.Gender=='Male')&(df.HasTension=='No')&(df.AnyHeartDisease=='No')&(df.NeverMarried=='Yes')].shape[0]
795
df[(df.Gender=='Male')&(df.HasTension=='Yes')&(df.AnyHeartDisease=='Yes')&(df.NeverMarried=='Yes')].shape[0]
24
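The four boolean filters above can be cross-checked in one call; a sketch, assuming 'Unknown' labels are simply left as their own category for this quick count:
# Sketch: frequency of every Gender/HasTension/AnyHeartDisease/NeverMarried combination.
(df.groupby(['Gender', 'HasTension', 'AnyHeartDisease', 'NeverMarried'])
   .size()
   .sort_values(ascending=False)
   .head())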
df.select_dtypes(include=['object'])
Gender HasTension AnyHeartDisease NeverMarried Occupation LivesIn SmokingStatus HeartAttack
0 Female Yes No Yes Self-employed City never smoked No
1 Female No No Yes Private Village smokes No
2 Male No No Yes Private City smokes No
3 Male No No Yes Self-employed City Unknown Yes
4 Male No No Yes Private City formerly smoked No
... ... ... ... ... ... ... ... ...
3995 Female No No Yes Private City smokes No
3996 Female No No No Private Village never smoked No
3997 Male No No Yes Private City never smoked No
3998 Female No No No Private City smokes No
3999 Female No No Yes Private Village formerly smoked Yes
4000 rows × 8 columns
df.HeartAttack.value_counts()
count
HeartAttack
No 3806
Yes 194
dtype: int64
from sklearn.model_selection import train_test_split
df.HeartAttack=df.HeartAttack.map({'Yes':1,'No':0})
df.HeartAttack
HeartAttack
0 0
1 0
2 0
3 1
4 0
... ...
3995 0
3996 0
3997 0
3998 0
3999 1
4000 rows × 1 columns
dtype: int64
X= df.drop(columns='HeartAttack')
y = df.HeartAttack
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3,random_state=0,stratify=y)
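A quick sanity check, sketched here, that stratify=y kept the class ratio (194/4000 ≈ 4.85% positives, per the HeartAttack counts above) in both splits:
# Sketch: class proportions should match in the train and test splits.
print(y_train.value_counts(normalize=True))
print(y_test.value_counts(normalize=True))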
from sklearn.preprocessing import OneHotEncoder,OrdinalEncoder,StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
gender_pipe = Pipeline([('imputer',SimpleImputer(strategy='most_frequent',missing_values='Unknown')),('ordinal',OrdinalEncoder())])
age_pipe = Pipeline([('imputer',SimpleImputer(strategy='mean',missing_values=-3)),('scaler',StandardScaler())])
tension_pipe = Pipeline([('ordinal',OrdinalEncoder())])
any_pipe = Pipeline([('ordinal',OrdinalEncoder())])
never_pipe = Pipeline([('ordinal',OrdinalEncoder())])
occ_pipe = Pipeline([('onehot',OneHotEncoder(sparse_output=False))])
livesin_pipe = Pipeline([('imputer',SimpleImputer(strategy='most_frequent',missing_values='Unknown')),('ordinal',OrdinalEncoder())])
from sklearn.preprocessing import MinMaxScaler
glucose_pipe = Pipeline([('imputer',SimpleImputer(strategy='mean',missing_values=-2)),('minmax',MinMaxScaler())])
bmi_pipe = Pipeline([('imputer',SimpleImputer(strategy='mean',missing_values=np.nan)),('scaler',StandardScaler())])
status_pipe = Pipeline([('imputer',SimpleImputer(strategy='most_frequent',missing_values='Unknown')),('onehot',OneHotEncoder(sparse_output=False))])
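A small check, only as a sketch, that one of the sentinel-aware imputers behaves as intended: 'Unknown' genders should be replaced by the most frequent category before ordinal encoding, leaving just two codes.
# Sketch: fit one column pipeline on its own for inspection.
# (The real fit happens inside the ColumnTransformer on X_train below.)
encoded_gender = gender_pipe.fit_transform(df[['Gender']])
print(np.unique(encoded_gender))   # expected [0. 1.] once 'Unknown' is imputed away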
df.columns
Index(['Gender', 'Age', 'HasTension', 'AnyHeartDisease', 'NeverMarried',
'Occupation', 'LivesIn', 'GlucoseLevel', 'BMI', 'SmokingStatus',
'HeartAttack'],
dtype='object')
pre = ColumnTransformer([('gender',gender_pipe,['Gender']),('age',age_pipe,['Age']),('tension',tension_pipe,['HasTension']),
                         ('any',any_pipe,['AnyHeartDisease']),('never',never_pipe,['NeverMarried']),
                         ('occ',occ_pipe,['Occupation']),('lives',livesin_pipe,['LivesIn']),('glucose',glucose_pipe,['GlucoseLevel']),
                         ('bmi',bmi_pipe,['BMI']),('status',status_pipe,['SmokingStatus'])],
                        verbose_feature_names_out=False,remainder='drop').set_output(transform='pandas')
pre
X_train.columns
Index(['Gender', 'Age', 'HasTension', 'AnyHeartDisease', 'NeverMarried',
'Occupation_Govt_job', 'Occupation_Never_worked', 'Occupation_Private',
'Occupation_Self-employed', 'Occupation_children', 'LivesIn',
'GlucoseLevel', 'BMI', 'SmokingStatus_formerly smoked',
'SmokingStatus_never smoked', 'SmokingStatus_smokes'],
dtype='object')
X_train= pre.fit_transform(X_train)
X_test = pre.transform(X_test)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/usr/local/lib/python3.11/dist-packages/pandas/core/indexes/base.py in get_loc(self, key)
3804 try:
-> 3805 return self._engine.get_loc(casted_key)
3806 except KeyError as err:
index.pyx in pandas._libs.index.IndexEngine.get_loc()
index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Occupation'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
7 frames
KeyError: 'Occupation'
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
/usr/local/lib/python3.11/dist-packages/sklearn/utils/_indexing.py in _get_column_indices(X, key)
370
371 except KeyError as e:
--> 372 raise ValueError("A given column is not a column of the dataframe") from e
373
374 return column_indices
ValueError: A given column is not a column of the dataframe
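The ValueError most likely reflects notebook state: this X_train already carries the one-hot column names shown above, so the raw 'Occupation' column the preprocessor expects is gone. A sketch of a clean rerun, assuming the raw dataframe df is still loaded (not executed in this notebook):
# Sketch: rebuild the split from the raw columns, then fit the preprocessor once.
from sklearn.model_selection import train_test_split
X = df.drop(columns='HeartAttack')
y = df.HeartAttack
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
X_train = pre.fit_transform(X_train)
X_test = pre.transform(X_test)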
X_train.shape
(2800, 16)
X_train.columns
Index(['Gender', 'Age', 'HasTension', 'AnyHeartDisease', 'NeverMarried',
'Occupation', 'LivesIn', 'GlucoseLevel', 'BMI', 'SmokingStatus',
'HeartAttack'],
dtype='object')
X_test.mean().mean()
np.float64(0.24624313115075144)
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(random_state=1729)
rfe = RFE(estimator= model , n_features_to_select= X_train.shape[1]-1)
rfe.fit(X_train,y_train)
index = list(rfe.support_).index(False)
print(index)
15
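The printed index alone does not say which feature was eliminated. A sketch that maps the support mask back to names, assuming X_train is still a dataframe whose columns line up with rfe.support_ (as the pandas-output preprocessor would give):
# Sketch: list kept and eliminated features by name.
support = pd.Series(rfe.support_, index=X_train.columns)
print("eliminated:", list(support[~support].index))
print("kept:", list(support[support].index))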
from google.colab import files
uploaded = files.upload()
df = pd.read_csv('T124OPPE2_ModelBuilding_V1.csv')
df
Gender Age HasTension AnyHeartDisease NeverMarried Occupation_Govt_job Occupation_Never_worked Occupation_Private
0 0.0 0.433901 0.0 0.0 1.0 0.0 0.0 0.0
1 1.0 -1.840435 0.0 0.0 0.0 0.0 0.0 0.0
2 1.0 -1.160260 0.0 0.0 0.0 1.0 0.0 0.0
3 1.0 -0.806002 0.0 0.0 1.0 0.0 0.0 1.0
4 0.0 0.743876 0.0 0.0 1.0 0.0 0.0 1.0
... ... ... ... ... ... ... ... ...
3995 0.0 0.389618 0.0 0.0 1.0 0.0 0.0 1.0
3996 1.0 1.452392 1.0 0.0 1.0 0.0 0.0 0.0
3997 0.0 0.433901 1.0 0.0 1.0 0.0 0.0 1.0
3998 1.0 0.921005 0.0 1.0 1.0 0.0 0.0 1.0
3999 0.0 -1.558800 0.0 0.0 0.0 0.0 0.0 1.0
4000 rows × 17 columns
X = df.drop(columns='HeartAttack')
y = df.HeartAttack
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3,random_state=0,shuffle=False)
from sklearn.linear_model import Perceptron
from sklearn.metrics import precision_score
model = Perceptron(
random_state=1729,
eta0=1,
max_iter=1,
shuffle=False,
validation_fraction=0.1,
alpha=0
)
for i in range(5):
    model.partial_fit(X_train, y_train, [0, 1])
    y_pred = model.predict(X_train)
    print(precision_score(y_train, y_pred))
    print(model.intercept_)
0.3333333333333333
[-4.]
0.13333333333333333
[-3.]
0.15384615384615385
[-3.]
0.0
[-4.]
0.6666666666666666
[-3.]
/usr/local/lib/python3.11/dist-packages/sklearn/metrics/_classification.py:1565: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
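The warning fires on the epoch where the perceptron predicts no positives at all. A sketch that makes that case explicit via the zero_division argument instead of relying on the warning:
# Sketch: report precision as 0.0 explicitly when there are no positive predictions.
from sklearn.metrics import precision_score
y_pred = model.predict(X_train)
print(precision_score(y_train, y_pred, zero_division=0))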
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import log_loss
clf = SGDClassifier(
    loss="log_loss",
    penalty="l2",
    eta0=0.001,
    alpha=0,
    learning_rate="constant",
    random_state=1729,
    warm_start=True,
    max_iter=1
)
for i in range(5):
    clf.fit(X_train, y_train)
    y_pred = clf.predict_proba(X_train)
    print(log_loss(y_train, y_pred))
0.2529904609012919
0.20828682141739835
0.19406901833322654
0.18699850891012404
0.18255077295024025
/usr/local/lib/python3.11/dist-packages/sklearn/linear_model/_stochastic_gradient.py:738: ConvergenceWarning: Maximum number of iteration reached before convergence. Consider increasing max_iter to improve the fit.
  warnings.warn(
(the same ConvergenceWarning is emitted once per fit call, five times in total)
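The warning is expected because each fit call is capped at max_iter=1. A sketch of the same five-pass schedule using partial_fit, which performs one pass per call and does not raise the convergence warning; the losses may differ slightly from the warm_start run:
# Sketch: five single passes via partial_fit instead of warm_start + max_iter=1.
clf2 = SGDClassifier(loss="log_loss", penalty="l2", eta0=0.001, alpha=0,
                     learning_rate="constant", random_state=1729)
for i in range(5):
    clf2.partial_fit(X_train, y_train, classes=[0, 1])
    print(log_loss(y_train, clf2.predict_proba(X_train)))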
sgd = SGDClassifier(loss='log_loss',learning_rate='constant',random_state=1729)
from sklearn.model_selection import GridSearchCV
params = {
    'alpha': [0.0001, 0.0005, 0.001, 0.005],
    'eta0':  [0.01, 0.05, 0.1, 0.5]
}
grid = GridSearchCV(estimator= sgd ,param_grid=params)
grid.fit(X_train,y_train)
grid.best_params_
{'alpha': 0.0001, 'eta0': 0.01}
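To put the winning combination in context, a sketch that inspects the search beyond best_params_: best_score_ is the mean cross-validated score (accuracy, the default classifier scorer), and cv_results_ holds every combination tried.
# Sketch: mean CV score of the best combination, plus the top few candidates.
print(grid.best_score_)
results = pd.DataFrame(grid.cv_results_)
print(results[['param_alpha', 'param_eta0', 'mean_test_score']]
      .sort_values('mean_test_score', ascending=False)
      .head())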
sgd = SGDClassifier(learning_rate='constant',
random_state=1729,
loss='log_loss',
alpha=0.0001,
eta0=0.01,
class_weight={0: 0.1, 1: 2})
sgd.fit(X_train,y_train)
y_pred = sgd.predict(X_test)
correct = ((y_test==1)&(y_pred==1)).sum()
print(correct)
47
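The count of 47 is only the true positives. A sketch of the full confusion matrix for the class-weighted model shows the false positives that the reweighting costs:
# Sketch: full confusion matrix for the class-weighted SGDClassifier.
from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_test, y_pred))   # rows: true 0/1, columns: predicted 0/1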
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix
model = SVC( kernel='rbf',
decision_function_shape='ovr',
random_state=1729,
C=1)
model.fit(X_train,y_train)
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print (cm)
[[1142 0]
[ 58 0]]
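The matrix shows the RBF SVC predicting only the majority class. One common adjustment, shown purely as a sketch and not part of the original run, is class_weight='balanced':
# Sketch: reweight classes so the minority class is not ignored.
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix
svc_bal = SVC(kernel='rbf', decision_function_shape='ovr',
              random_state=1729, C=1, class_weight='balanced')
svc_bal.fit(X_train, y_train)
print(confusion_matrix(y_test, svc_bal.predict(X_test)))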
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier(criterion='entropy',
                               splitter='random',
                               min_samples_split=4,
                               min_impurity_decrease=0.0001,
                               random_state=1729)
model.fit(X_train, y_train)
DecisionTreeClassifier(criterion='entropy', min_impurity_decrease=0.0001,
                       min_samples_split=4, random_state=1729,
                       splitter='random')
model.tree_.max_depth
20
model.tree_.node_count
515
model.tree_.impurity[1]
np.float64(0.024564134553940277)
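The same depth and size information is also exposed through public accessors, sketched below; tree_.impurity[1] above is the entropy at node 1, the left child of the root.
# Sketch: public counterparts of the tree_ attributes used above.
print(model.get_depth())      # same value as model.tree_.max_depth
print(model.get_n_leaves())   # number of leaf nodes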
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
dt = DecisionTreeClassifier(random_state=1729)
kn = KNeighborsClassifier()
lg = LogisticRegression(random_state=1729)
bagging = BaggingClassifier(estimator=dt,
n_estimators=20,random_state=1729)
bagging.fit(X_train,y_train)
y_pred = bagging.predict(X_test)
print(accuracy_score(y_test,y_pred))
0.9441666666666667
bagging = BaggingClassifier(estimator=kn,
n_estimators=20,random_state=1729)
bagging.fit(X_train,y_train)
y_pred = bagging.predict(X_test)
print(accuracy_score(y_test,y_pred))
0.9508333333333333
bagging = BaggingClassifier(estimator=lg,
n_estimators=20,random_state=1729)
bagging.fit(X_train,y_train)
y_pred = bagging.predict(X_test)
print(accuracy_score(y_test,y_pred))
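The three bagging runs follow the same pattern, so they can be collapsed into one loop; a sketch, assuming the same 20 estimators and seed for every base model:
# Sketch: compare the three base estimators under identical bagging settings.
for name, base in [('tree', dt), ('knn', kn), ('logreg', lg)]:
    bag = BaggingClassifier(estimator=base, n_estimators=20, random_state=1729)
    bag.fit(X_train, y_train)
    print(name, accuracy_score(y_test, bag.predict(X_test)))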