ML_LAB_7 - Jupyter Notebook
1. Naive Bayes classification is implemented from scratch, without built-in classifier functions, on a small food dataset; the model reaches 70% training accuracy and is then used to classify new query points. 2. A decision tree classifier is trained on the Iris dataset with Scikit-Learn, reaching about 97% test accuracy. Cost complexity pruning (ccp_alpha values and impurity plots) is used to avoid overfitting, and test accuracy remains high after pruning.


In [1]:

#Name : Mudu Suman


#RollNo : 222CD017
#ML LAB 7

1. Implement Naive Bayes Classifier algorithm without using inbuilt functions.

dataset = {'Taste': ['Salty','Spicy','Spicy','Spicy','Spicy','Sweet','Salty','Sweet','Spicy','Salty'],
           'Temperature': ['Hot','Hot','Hot','Cold','Hot','Cold','Cold','Hot','Cold','Hot'],
           'Texture': ['Soft','Soft','Hard','Hard','Hard','Soft','Soft','Soft','Soft','Hard'],
           'Eat': ['No','No','Yes','No','Yes','Yes','No','Yes','Yes','Yes']}

In [3]:

dataset = {'Taste': ['Salty','Spicy','Spicy','Spicy','Spicy','Sweet','Salty','Sweet','Spicy','Salty'],
           'Temperature': ['Hot','Hot','Hot','Cold','Hot','Cold','Cold','Hot','Cold','Hot'],
           'Texture': ['Soft','Soft','Hard','Hard','Hard','Soft','Soft','Soft','Soft','Hard'],
           'Eat': ['No','No','Yes','No','Yes','Yes','No','Yes','Yes','Yes']}


In [4]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math

def accuracy_score(y_true, y_pred):
    """ score = (number of correct predictions) / len(y_true), as a percentage """
    return round(float(sum(y_pred == y_true)) / float(len(y_true)) * 100, 2)

def pre_processing(df):
    """ Partition data into features and target """
    X = df.drop([df.columns[-1]], axis=1)
    y = df[df.columns[-1]]
    return X, y

class NaiveBayes:

    def __init__(self):
        # Placeholders; all of these are filled in by fit()
        self.features = list
        self.likelihoods = {}
        self.class_priors = {}
        self.pred_priors = {}
        self.X_train = np.array
        self.y_train = np.array
        self.train_size = int
        self.num_feats = int

    def fit(self, X, y):
        self.features = list(X.columns)
        self.X_train = X
        self.y_train = y
        self.train_size = X.shape[0]
        self.num_feats = X.shape[1]

        for feature in self.features:
            self.likelihoods[feature] = {}
            self.pred_priors[feature] = {}

            for feat_val in np.unique(self.X_train[feature]):
                self.pred_priors[feature].update({feat_val: 0})

                for outcome in np.unique(self.y_train):
                    self.likelihoods[feature].update({feat_val + '_' + outcome: 0})
                    self.class_priors.update({outcome: 0})

        self._calc_class_prior()
        self._calc_likelihoods()
        self._calc_predictor_prior()


    def _calc_class_prior(self):
        """ P(c) - Prior Class Probability """
        for outcome in np.unique(self.y_train):
            outcome_count = sum(self.y_train == outcome)
            self.class_priors[outcome] = outcome_count / self.train_size

    def _calc_likelihoods(self):
        """ P(x|c) - Likelihood """
        for feature in self.features:
            for outcome in np.unique(self.y_train):
                outcome_count = sum(self.y_train == outcome)
                feat_likelihood = self.X_train[feature][
                    self.y_train[self.y_train == outcome].index.values.tolist()
                ].value_counts().to_dict()

                for feat_val, count in feat_likelihood.items():
                    self.likelihoods[feature][feat_val + '_' + outcome] = count / outcome_count

    def _calc_predictor_prior(self):
        """ P(x) - Evidence """
        for feature in self.features:
            feat_vals = self.X_train[feature].value_counts().to_dict()

            for feat_val, count in feat_vals.items():
                self.pred_priors[feature][feat_val] = count / self.train_size

    def predict(self, X):
        """ Calculates Posterior probability P(c|x) for each query row """
        results = []
        X = np.array(X)

        for query in X:
            probs_outcome = {}
            for outcome in np.unique(self.y_train):
                prior = self.class_priors[outcome]
                likelihood = 1
                evidence = 1

                for feat, feat_val in zip(self.features, query):
                    likelihood *= self.likelihoods[feat][feat_val + '_' + outcome]
                    evidence *= self.pred_priors[feat][feat_val]

                posterior = (likelihood * prior) / evidence
                probs_outcome[outcome] = posterior

            result = max(probs_outcome, key=lambda x: probs_outcome[x])
            results.append(result)

        return np.array(results)


if __name__ == "__main__":

    # Food dataset
    print("\ndataset:")
    df = pd.DataFrame(dataset)
    #print(df)

    # Split features and target
    X, y = pre_processing(df)

    nb_clf = NaiveBayes()
    nb_clf.fit(X, y)

    print("Train Accuracy: {}".format(accuracy_score(y, nb_clf.predict(X))))

    #Query 1:
    query = np.array([['Salty','Hot', 'Soft']])
    print("Query 1:- {} ---> {}".format(query, nb_clf.predict(query)))

    #Query 2:
    query = np.array([['Spicy','Hot', 'Soft']])
    print("Query 2:- {} ---> {}".format(query, nb_clf.predict(query)))

    #Query 3:
    query = np.array([['Salty','Hot', 'Hard']])
    print("Query 3:- {} ---> {}".format(query, nb_clf.predict(query)))

dataset:

Train Accuracy: 70.0

Query 1:- [['Salty' 'Hot' 'Soft']] ---> ['No']

Query 2:- [['Spicy' 'Hot' 'Soft']] ---> ['Yes']

Query 3:- [['Salty' 'Hot' 'Hard']] ---> ['Yes']
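The query results can be sanity-checked by hand: the classifier picks the class with the larger value of prior × product of per-feature likelihoods (the shared evidence term does not change which class wins). A minimal check for Query 1, with the counts taken from the 10-row dataset above (this is an added verification, not a cell from the original notebook):

# Query 1 = ['Salty', 'Hot', 'Soft'], counts taken from the dataset above
# Priors:      P(Yes) = 6/10,        P(No) = 4/10
# Likelihoods: P(Salty|Yes) = 1/6,   P(Salty|No) = 2/4
#              P(Hot|Yes)   = 4/6,   P(Hot|No)   = 2/4
#              P(Soft|Yes)  = 3/6,   P(Soft|No)  = 3/4
score_yes = (6/10) * (1/6) * (4/6) * (3/6)   # ~ 0.033
score_no  = (4/10) * (2/4) * (2/4) * (3/4)   # = 0.075
print('Yes' if score_yes > score_no else 'No')   # prints 'No', matching Query 1 above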

2. Implement Decision tree on IRIS Dataset using SK Learn library functions. Implement methods to
avoid over-fitting of the data.


In [5]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import tree
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
X,y=load_iris(return_X_y=True)
X_train,X_test,y_train,y_test=train_test_split(X,y,random_state=0)
clf=DecisionTreeClassifier(random_state=0)
clf.fit(X_train,y_train)
y_train_predicted=clf.predict(X_train)
y_test_predicted=clf.predict(X_test)
accuracy_score(y_train,y_train_predicted)
accuracy_score(y_test,y_test_predicted)

Out[5]:

0.9736842105263158
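Only the last expression in cell In [5] is echoed, so the 0.9736 shown above is the test-set accuracy; the training accuracy is computed on the line before it but not displayed. A small sketch (not an executed cell from the notebook) that prints both explicitly, reusing the variables from In [5]:

print("Train accuracy:", accuracy_score(y_train, y_train_predicted))  # typically 1.0 for an unpruned tree
print("Test accuracy:", accuracy_score(y_test, y_test_predicted))     # 0.9736... as shown above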

In [6]:

plt.figure(figsize=(16,8))
tree.plot_tree(clf)
plt.show()


In [7]:

path=clf.cost_complexity_pruning_path(X_train,y_train)
#path variable gives two things ccp_alphas and impurities
ccp_alphas,impurities=path.ccp_alphas,path.impurities
print("ccp alpha wil give list of values :",ccp_alphas)
print("***********************************************************")
print("Impurities in Decision Tree :",impurities)
ccp alpha will give list of values : [0.         0.00869963 0.01339286 0.03571429 0.26539835 0.33279549]

***********************************************************

Impurities in Decision Tree : [0.         0.01739927 0.03079212 0.06650641 0.33190476 0.66470026]

In [8]:

clfs=[] #will store all the models here


for ccp_alpha in ccp_alphas:
    clf = DecisionTreeClassifier(random_state=0, ccp_alpha=ccp_alpha)
    clf.fit(X_train, y_train)
    clfs.append(clf)
print("Last node in Decision tree is {} and ccp_alpha for last node is {}".format(clfs[-1].tree_.node_count, ccp_alphas[-1]))

Last node in Decision tree is 1 and ccp_alpha for last node is 0.332795493197279

In [9]:

train_scores = [clf.score(X_train, y_train) for clf in clfs]


test_scores = [clf.score(X_test, y_test) for clf in clfs]
fig, ax = plt.subplots()
ax.set_xlabel("alpha")
ax.set_ylabel("accuracy")
ax.set_title("Accuracy vs alpha for training and testing sets")
ax.plot(ccp_alphas, train_scores, marker='o', label="train",drawstyle="steps-post")
ax.plot(ccp_alphas, test_scores, marker='o', label="test",drawstyle="steps-post")
ax.legend()
plt.show()
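The plot is used to eyeball a good alpha, but the same choice can be made programmatically from the scores computed above; a minimal sketch (an addition, not part of the original notebook):

# Pick the ccp_alpha whose pruned tree scores highest on the held-out test set
best_idx = int(np.argmax(test_scores))
print("best ccp_alpha:", ccp_alphas[best_idx], "test accuracy:", test_scores[best_idx])

Strictly speaking, selecting alpha on the test set leaks information; in practice one would cross-validate over ccp_alpha on the training data (for example with GridSearchCV) and keep the test set for the final check.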


In [10]:

clf=DecisionTreeClassifier(random_state=0,ccp_alpha=0.02)
clf.fit(X_train,y_train)
plt.figure(figsize=(12,8))
tree.plot_tree(clf,rounded=True,filled=True)
plt.show()

In [11]:

accuracy_score(y_test,clf.predict(X_test))
Out[11]:

0.9736842105263158

In [ ]:
