
DEPARTMENT OF B.TECH ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

AD3461 – MACHINE LEARNING LABORATORY

LAB MANUAL
Ex No: 1 IMPLEMENTATION OF CANDIDATE ELIMINATION ALGORITHM

AIM:

To implement Candidate Elimination Algorithm using python script.

ALGORITHM:

Step 1: Initialize the version space.

● Initialize the most general hypothesis (h_G) to the maximally general hypothesis
(all attributes set to '?').

● Initialize the most specific hypothesis (h_S) to the maximally specific hypothesis
(in practice, to the first positive training example).
Step 2: Iterate through the training examples.

● For each positive example, update h_S and h_G as follows:

● For each attribute of h_S that does not match the example, generalize that
attribute of h_S to '?'.

● Generalize to '?' any condition in h_G that is inconsistent with the
example.

● For each negative example, update h_G as follows:

● For each attribute that does not match the example, specialize h_G by
retaining the corresponding value of h_S.

● Leave h_S unchanged, since a consistent h_S already excludes the negative
example.
Step 3: Refine the version space.

● Remove from the general boundary any hypothesis that is more specific than
another general hypothesis, and drop rows that remain fully general ('?' in every
position) since they impose no constraint.
Step 4: Repeat Steps 2 and 3 for every training example.

● After all examples have been processed, the version space is consistent: every
hypothesis lying between h_S and h_G correctly classifies all the training
examples.

Step 5: Output the final specific and general boundary hypotheses.


PROGRAM:

import numpy as np
import pandas as pd

# load the training data; every column except the last is an attribute,
# the last column is the Yes/No target
data = pd.DataFrame(data=pd.read_csv('finds1.csv'))
concepts = np.array(data.iloc[:, 0:-1])
target = np.array(data.iloc[:, -1])

def learn(concepts, target):
    # start with the first instance as the most specific hypothesis
    specific_h = concepts[0].copy()
    print("initialization of specific_h and general_h")
    print(specific_h)
    # the general boundary starts as the maximally general hypothesis
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print(general_h)
    for i, h in enumerate(concepts):
        if target[i] == "Yes":
            # positive example: generalize specific_h where it disagrees
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        if target[i] == "No":
            # negative example: specialize general_h where it disagrees
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print("steps of Candidate Elimination Algorithm", i + 1)
        print("Specific_h ", i + 1, "\n")
        print(specific_h)
        print("general_h ", i + 1, "\n")
        print(general_h)
    # drop rows of general_h that stayed fully general
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")

OUTPUT:

initialization of specific_h and general_h


['Cloudy' 'Cold' 'High' 'Strong' 'Warm' 'Change']
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?',
'?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
steps of Candidate Elimination Algorithm 8
Specific_h 8
['?' '?' '?' 'Strong' '?' '?']

general_h 8

[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', 'Strong', '?', '?'], ['?',
'?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
Final Specific_h:

['?' '?' '?' 'Strong' '?' '?']

Final General_h:

[['?', '?', '?', 'Strong', '?', '?']]

RESULT:

Thus the Candidate Elimination algorithm has been implemented successfully.
Ex.No: 2 IMPLEMENTATION OF DECISION TREE BASED ID3 ALGORITHM

AIM:

To implement Decision Tree Based ID3 Algorithm using python script.

ALGORITHM:

Step 1: Start the program

Step 2: Load the dataset and organize it into a table, with rows representing instances and
columns representing features. The last column should contain the class labels.

Step 3: Define a function to calculate the entropy of the dataset. Entropy measures the
uncertainty in the dataset based on class distribution.

Step 4: For each feature, calculate the information gain. Information gain measures how
much a feature contributes to reducing the uncertainty in the dataset.
Step 5: Select the feature with the highest information gain as the best feature to split the
dataset.
Step 6: Divide the dataset into subsets based on the values of the best feature found in
Step 5.
Step 7: Repeat Steps 3 to 6 recursively on each subset until all instances in a subset share
the same class or no features remain.
Step 8: Build the decision tree by assigning the best feature as the splitting criterion at
each internal node and the majority class as the class label for each leaf node.
Step 9: Use the created decision tree to classify new instances by traversing the tree from
the root to the appropriate leaf node based on their feature values.
Step 10: Evaluate the Model

Step 11: Stop the program
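
As an illustration of Steps 3 and 4, the entropy and information gain of a split can be checked by hand. A minimal sketch with illustrative counts (9 'Yes' and 5 'No' labels, split into children with 6/2 and 3/3 labels; these numbers are not taken from the lab dataset):

import numpy as np

def entropy_of(counts):
    # H = -sum(p_i * log2(p_i)) over the class proportions
    p = np.array(counts) / np.sum(counts)
    return -np.sum(p * np.log2(p))

print(entropy_of([9, 5]))  # about 0.940 bits
# information gain = parent entropy - weighted entropy of the children
gain = entropy_of([9, 5]) - (8/14) * entropy_of([6, 2]) - (6/14) * entropy_of([3, 3])
print(gain)                # about 0.048 bits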

PROGRAM:

import pandas as pd
import numpy as np

dataset = pd.read_csv('playtennis.csv', names=['outlook', 'temperature', 'humidity', 'wind', 'class'])

def entropy(target_col):
    elements, counts = np.unique(target_col, return_counts=True)
    entropy = np.sum([(-counts[i] / np.sum(counts)) * np.log2(counts[i] / np.sum(counts))
                      for i in range(len(elements))])
    return entropy

def InfoGain(data, split_attribute_name, target_name="class"):
    total_entropy = entropy(data[target_name])
    vals, counts = np.unique(data[split_attribute_name], return_counts=True)
    Weighted_Entropy = np.sum([(counts[i] / np.sum(counts)) *
                               entropy(data.where(data[split_attribute_name] == vals[i]).dropna()[target_name])
                               for i in range(len(vals))])
    Information_Gain = total_entropy - Weighted_Entropy
    return Information_Gain

def ID3(data, originaldata, features, target_attribute_name="class", parent_node_class=None):
    # if all remaining instances share one class, return that class
    if len(np.unique(data[target_attribute_name])) <= 1:
        return np.unique(data[target_attribute_name])[0]
    # if the subset is empty, return the majority class of the original dataset
    elif len(data) == 0:
        return np.unique(originaldata[target_attribute_name])[
            np.argmax(np.unique(originaldata[target_attribute_name], return_counts=True)[1])]
    # if no features are left, return the parent node's majority class
    elif len(features) == 0:
        return parent_node_class
    else:
        parent_node_class = np.unique(data[target_attribute_name])[
            np.argmax(np.unique(data[target_attribute_name], return_counts=True)[1])]
        # information gain values for the remaining features
        item_values = [InfoGain(data, feature, target_attribute_name) for feature in features]
        best_feature_index = np.argmax(item_values)
        best_feature = features[best_feature_index]
        tree = {best_feature: {}}
        features = [i for i in features if i != best_feature]
        for value in np.unique(data[best_feature]):
            sub_data = data.where(data[best_feature] == value).dropna()
            subtree = ID3(sub_data, dataset, features, target_attribute_name, parent_node_class)
            tree[best_feature][value] = subtree
        return tree

tree = ID3(dataset, dataset, dataset.columns[:-1])
print('\nDisplay Tree\n', tree)
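
Note: playtennis.csv is read with the names argument, so the file is expected to contain only data rows with outlook, temperature, humidity, wind and the class label. The first rows of the classic Play Tennis data look roughly like this (shown for illustration; the lab file may differ):

Sunny,Hot,High,Weak,No
Sunny,Hot,High,Strong,No
Overcast,Hot,High,Weak,Yes
Rain,Mild,High,Weak,Yes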
OUTPUT:

Display Tree

{'outlook': {'Overcast': 'Yes', 'Rain': {'wind': {'Strong': 'No', 'Weak': 'Yes'}}, 'Sunny':

{'humidity': {'High': 'No', 'Normal': 'Yes'}}}}

RESULT:

Thus the Decision Tree based ID3 algorithm has been implemented successfully.
EX NO.3 IMPLEMENTATION OF ARTIFICIAL NEURAL NETWORK USING BACK
PROPAGATION ALGORITHM

AIM:

To implement Artificial Neural Network using back Propagation Algorithm using python
script.

ALGORITHM:

Step 1: Inputs X, arrive through the preconnected path.

Step 2: The input is modelled using weights W, which are usually initialized randomly.

Step 3: Calculate the output of each neuron from the input layer to the hidden layer to the
output layer

Step 4: Calculate the error in the outputs

Step 5: From the output layer, go back to the hidden layer to adjust the weights to reduce
the error.

Step 6: Repeat the process until the desired output is achieved.
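
Steps 4 and 5 correspond to the delta rule that the program below applies at the output and hidden layers. A minimal sketch of one gradient step for a single sigmoid neuron (toy values; they are not taken from the lab program):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# illustrative input, weight, target and learning rate
x, w, y_true, lr = 0.5, 0.8, 1.0, 0.1
y_pred = sigmoid(x * w)
error = y_true - y_pred                   # Step 4: error at the output
delta = error * y_pred * (1 - y_pred)     # scaled by the sigmoid derivative
w = w + lr * delta * x                    # Step 5: adjust the weight to reduce the error
print(w)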

PROGRAM:

import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X / np.amax(X, axis=0)  # normalise each feature by its column maximum
y = y / 100                 # scale the target marks to the range 0-1

# Sigmoid Function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of Sigmoid Function
def derivatives_sigmoid(x):
    return x * (1 - x)

# Variable initialization
epoch = 7000              # number of training iterations
lr = 0.1                  # learning rate
inputlayer_neurons = 2    # number of features in the data set
hiddenlayer_neurons = 3   # number of hidden layer neurons
output_neurons = 1        # number of neurons in the output layer

# weight and bias initialization (uniform random values of dimension x*y)
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    # Forward Propagation
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)

    # Backpropagation
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)  # how much the hidden layer weights contributed to the error
    d_hiddenlayer = EH * hiddengrad

    # weight and bias updates (dot product of next-layer error and current-layer output)
    wout += hlayer_act.T.dot(d_output) * lr
    bout += np.sum(d_output, axis=0, keepdims=True) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)

OUTPUT:

Input:

[[ 0.66666667 1. ]

[ 0.33333333 0.55555556]

[ 1. 0.66666667]]

Actual Output:
[[ 0.92]
[ 0.86]

[ 0.89]]
Predicted Output:

[[ 0.89559591]

[ 0.88142069]

[ 0.8928407 ]]

RESULT:

Thus the Back Propagation algorithm has been implemented successfully.
EX.NO 4: IMPLEMENTATION OF NAIVE BAYESIAN CLASSIFIER

AIM:
To implement Naïve Bayesian Classifier using python script.

ALGORITHM:

Step 1: Data Pre-processing step

Step 2: Fitting Naive Bayes to the Training set

Step 3: Predicting the test result

Step 4: Test accuracy of the result(Creation of Confusion matrix)

Step 5: Visualizing the test set result.

PROGRAM:

import pandas as pd
msg=pd.read_csv('naivetext1.csv',names=['message','label'])
print('The dimensions of the dataset',msg.shape)
msg['labelnum']=msg.label.map({'pos':1,'neg':0})
X=msg.message
y=msg.labelnum
print(X)
print(y)

from sklearn.model_selection import train_test_split


xtrain,xtest,ytrain,ytest=train_test_split(X,y)
print(xtest.shape)
print(xtrain.shape)
print(ytest.shape)
print(ytrain.shape)

from sklearn.feature_extraction.text import CountVectorizer


count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm=count_vect.transform(xtest)

from sklearn.naive_bayes import MultinomialNB


clf = MultinomialNB().fit(xtrain_dtm,ytrain)
predicted = clf.predict(xtest_dtm)
from sklearn import metrics
print('Accuracy metrics')
print('Accuracy of the classifier is', metrics.accuracy_score(ytest, predicted))
print('Confusion matrix')
print(metrics.confusion_matrix(ytest, predicted))
print('Recall and Precision')
print(metrics.recall_score(ytest, predicted))
print(metrics.precision_score(ytest, predicted))
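
Note: naivetext1.csv is expected to contain one sentence and its pos/neg label per line, with no header row (the names argument supplies the column names). Judging from the output listing below, its first lines look like this:

I love this sandwich,pos
This is an amazing place,pos
I feel very good about these beers,pos
This is my best work,pos
What an awesome view,pos
I do not like this restaurant,neg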
OUTPUT:

The dimensions of the dataset (18, 2)
0     I love this sandwich
1     This is an amazing place
2     I feel very good about these beers
3     This is my best work
4     What an awesome view
5     I do not like this restaurant
6     I am tired of this stuff
7     I can't deal with this
8     He is my sworn enemy
9     My boss is horrible
10    This is an awesome place
11    I do not like the taste of this juice
12    I love to dance
13    I am sick and tired of this place
14    What a great holiday
15    That is a bad locality to stay
16    We will have good fun tomorrow
17    I went to my enemy's house today
Name: message, dtype: object
0     1
1     1
2     1
3     1
4     1
5     0
6     0
7     0
8     0
9     0
10    1
11    0
12    1
13    0
14    1
15    0
16    1
17    0
Name: labelnum, dtype: int64
(5,)
(13,)
(5,)
(13,)
Accuracy metrics
Accuracy of the classifier is 0.8
Confusion matrix
[[3 1]
 [0 1]]
Recall and Precision
1.0
0.5

RESULT:

Thus the Naive Bayesian Classifier algorithm has been implemented successfully.
EX NO 5: IMPLEMENTATION OF NAIVE BAYESIAN CLASSIFIER MODEL TO
CLASSIFY A SET OF DOCUMENTS

AIM:
To implement the Naïve Bayesian Classifier Model to Classify the document set using
python.
ALGORITHM:

Step 1: Input the total Number of Documents from the user.


Step 2: Input the text and class of Each document and split it into
a List.
Step 3: Create a 2D array and append each document list into an
array

Step 4: Using a Set data structure, store all the keywords in a list.
Step 5: Input the text to be classified by the user.

PROGRAM:
import csv
import random
import math

def loadCsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        # converting strings into numbers for processing
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

def splitDataset(dataset, splitRatio):
    # 67% training size
    trainSize = int(len(dataset) * splitRatio)
    trainSet = []
    copy = list(dataset)
    while len(trainSet) < trainSize:
        # pick random indices from the dataset to build the training data
        index = random.randrange(len(copy))
        trainSet.append(copy.pop(index))
    return [trainSet, copy]

def separateByClass(dataset):
    # creates a dictionary keyed by class whose values are the instances belonging to each class
    separated = {}
    for i in range(len(dataset)):
        vector = dataset[i]
        if vector[-1] not in separated:
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

def mean(numbers):
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(len(numbers) - 1)
    return math.sqrt(variance)

def summarize(dataset):
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
    del summaries[-1]
    return summaries

def summarizeByClass(dataset):
    separated = separateByClass(dataset)
    summaries = {}
    for classValue, instances in separated.items():
        # summaries is a dict of (mean, std) tuples per attribute for each class value
        summaries[classValue] = summarize(instances)
    return summaries

def calculateProbability(x, mean, stdev):
    exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

def calculateClassProbabilities(summaries, inputVector):
    probabilities = {}
    for classValue, classSummaries in summaries.items():
        # class and attribute information as mean and sd
        probabilities[classValue] = 1
        for i in range(len(classSummaries)):
            mean, stdev = classSummaries[i]  # mean and sd of every attribute for each class separately
            x = inputVector[i]               # the test vector's i-th attribute
            probabilities[classValue] *= calculateProbability(x, mean, stdev)  # use the normal distribution
    return probabilities

def predict(summaries, inputVector):
    probabilities = calculateClassProbabilities(summaries, inputVector)
    bestLabel, bestProb = None, -1
    for classValue, probability in probabilities.items():
        # assign the class that has the highest probability
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classValue
    return bestLabel

def getPredictions(summaries, testSet):
    predictions = []
    for i in range(len(testSet)):
        result = predict(summaries, testSet[i])
        predictions.append(result)
    return predictions

def getAccuracy(testSet, predictions):
    correct = 0
    for i in range(len(testSet)):
        if testSet[i][-1] == predictions[i]:
            correct += 1
    return (correct / float(len(testSet))) * 100.0

def main():
    filename = '5data.csv'
    splitRatio = 0.67
    dataset = loadCsv(filename)
    trainingSet, testSet = splitDataset(dataset, splitRatio)
    print('Split {0} rows into train={1} and test={2} rows'.format(len(dataset), len(trainingSet), len(testSet)))
    # prepare model
    summaries = summarizeByClass(trainingSet)
    # test model
    predictions = getPredictions(summaries, testSet)
    accuracy = getAccuracy(testSet, predictions)
    print('Accuracy of the classifier is : {0}%'.format(accuracy))

main()
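
The calculateProbability function above implements the Gaussian (normal) likelihood that Naive Bayes uses for continuous attributes. A quick way to sanity-check it against SciPy, assuming scipy is installed (the numbers are illustrative only):

import math
from scipy.stats import norm

def calculateProbability(x, mean, stdev):
    exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

# both lines should print the same density value
print(calculateProbability(71.5, 73.0, 6.2))
print(norm.pdf(71.5, loc=73.0, scale=6.2))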

OUTPUT:
confusion matrix is as follows
[[17  0  0]
 [ 0 17  0]
 [ 0  0 11]]
Accuracy metrics
             precision    recall  f1-score   support
          0       1.00      1.00      1.00        17
          1       1.00      1.00      1.00        17
          2       1.00      1.00      1.00        11
avg / total       1.00      1.00      1.00        45
RESULT:
Thus the Naïve Bayesian Classifier model has been implemented successfully.
EX NO 6: CONSTRUCTING A BAYESIAN NETWORK TO DIAGNOSE AN
INFECTION USING WHO DATA SET.

AIM:

To implement a Bayesian Network to diagnose an infection with the WHO dataset using Python script.

PROGRAM:
import numpy as np
import csv
import pandas as pd
from pgmpy.models import BayesianModel
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# read Cleveland Heart Disease data
heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?', np.nan)

# display the data
print('Few examples from the dataset are given below')
print(heartDisease.head())

# Model Bayesian Network
model = BayesianModel([('age', 'trestbps'), ('age', 'fbs'),
                       ('sex', 'trestbps'), ('exang', 'trestbps'),
                       ('trestbps', 'heartdisease'), ('fbs', 'heartdisease'),
                       ('heartdisease', 'restecg'), ('heartdisease', 'thalach'),
                       ('heartdisease', 'chol')])

# Learning CPDs using Maximum Likelihood Estimators
print('\n Learning CPD using Maximum likelihood estimators')
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

# Inferencing with Bayesian Network
print('\n Inferencing with Bayesian Network:')
HeartDisease_infer = VariableElimination(model)

# computing the Probability of HeartDisease given Age
print('\n 1. Probability of HeartDisease given Age=30')
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={'age': 28})
print(q['heartdisease'])

# computing the Probability of HeartDisease given cholesterol
print('\n 2. Probability of HeartDisease given cholesterol=100')
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={'chol': 100})
print(q['heartdisease'])
EX NO: 7 IMPLEMENTATION OF EM ALGORITHM TO CLUSTER A SET OF DATA

AIM:

To implement EM algorithm to cluster a data set using python.

ALGORITHM:

Step 1: Load the data set and select the numeric features to be clustered.

Step 2: Choose the number of clusters (mixture components) and initialize the mean, covariance
and mixing weight of each component.

Step 3: E-step - for every data point, compute the responsibility (posterior probability) of each
component given the current parameters.

Step 4: M-step - re-estimate the mean, covariance and mixing weight of each component from the
responsibilities.

Step 5: Repeat Steps 3 and 4 until the log-likelihood converges, assign each point to the component
with the highest responsibility, and compare the result with K-Means clustering.
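
As a sketch of Steps 3 and 4, one EM iteration for a one-dimensional, two-component Gaussian mixture can be written directly with NumPy (toy data and starting values, assuming scipy is available; the lab program below instead relies on sklearn's GaussianMixture):

import numpy as np
from scipy.stats import norm

x = np.array([1.0, 1.2, 0.8, 5.1, 4.9, 5.3])   # toy 1-D data
mu, sigma, pi = np.array([0.0, 4.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])

# E-step: responsibility of each component for each point
dens = np.vstack([p * norm.pdf(x, m, s) for p, m, s in zip(pi, mu, sigma)])
resp = dens / dens.sum(axis=0)

# M-step: re-estimate weights, means and standard deviations
Nk = resp.sum(axis=1)
pi = Nk / len(x)
mu = (resp * x).sum(axis=1) / Nk
sigma = np.sqrt((resp * (x - mu[:, None]) ** 2).sum(axis=1) / Nk)
print(pi, mu, sigma)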

PROGRAM:

import numpy as np

from sklearn.cluster import KMeans

import matplotlib.pyplot as plt

from sklearn.mixture import GaussianMixture

import pandas as pd

X=pd.read_csv("kmeansdata.csv")

x1 = X['Distance_Feature'].values

x2 = X['Speeding_Feature'].values

X = np.array(list(zip(x1, x2))).reshape(len(x1), 2)

plt.plot()

plt.xlim([0, 100])

plt.ylim([0, 50])

plt.title('Dataset')

plt.scatter(x1, x2)

plt.show()

#code for EM
gmm = GaussianMixture(n_components=3)

gmm.fit(X)

em_predictions = gmm.predict(X)

print("\nEM predictions")

print(em_predictions)

print("mean:\n",gmm.means_)

print('\n')

print("Covariances\n",gmm.covariances_)

print(X)

plt.title('Expectation Maximization')

plt.scatter(X[:,0], X[:,1],c=em_predictions,s=50)

plt.show()

#code for Kmeans

import matplotlib.pyplot as plt1

kmeans = KMeans(n_clusters=3)

kmeans.fit(X)

print(kmeans.cluster_centers_)

print(kmeans.labels_)

plt.title('KMEANS')

plt1.scatter(X[:,0], X[:,1], c=kmeans.labels_, cmap='rainbow')

plt1.scatter(kmeans.cluster_centers_[:,0], kmeans.cluster_centers_[:,1], color='black')

plt1.show()
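
Note: kmeansdata.csv is assumed to contain at least the two numeric columns Distance_Feature and Speeding_Feature referenced above (the real file may have additional columns). The values below echo the first data points printed in the output:

Distance_Feature,Speeding_Feature
71.24,28
52.53,25
64.54,27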


OUTPUT:

EM predictions
[0 0 0 1 0 1 1 1 2 1 2 2 1 1 2 1 2 1 0 1 0 1 1]
mean:
[[57.70629058 25.73574491] [52.12044022 22.46250453] [46.4364858 39.43288647]]

Covariances
[[[83.51878796 14.926902] [14.926902 2.70846907]]
 [[29.95910352 15.83416554] [15.83416554 67.01175729]]
 [[79.34811849 29.55835938] [29.55835938 18.17157304]]]

[[71.24 28.] [52.53 25.] [64.54 27.] [55.69 22.] [54.58 25.] [41.91 10.] [58.64 20.] [52.02 8.]
 [31.25 34.] [44.31 19.] [49.35 40.] [58.07 45.] [44.22 22.] [55.73 19.] [46.63 43.] [52.97 32.]
 [46.25 35.] [51.55 27.] [57.05 26.] [58.45 30.] [43.42 23.] [55.68 37.] [55.15 18.]]

[[57.74090909 24.27272727] [48.6 38.] [45.176 16.4]]
[0 0 0 0 0 2 0 2 1 2 1 1 2 0 1 1 1 0 0 0 2 1 0]
RESULT:
Thus the EM Algorithm to cluster a data set has been implemented successfully
EX NO 8: IMPLEMENTATION OF K-NEAREST NEIGHBOUR ALGORITHM TO CLASSIFY IRIS DATASET

AIM:

To implement the K-Nearest Neighbour Algorithm to classify the Dataset using python

ALGORITHM:

Step 1: Start the Program

Step 2: Importing the Modules.

Step 3: Create the dataset; scikit-learn ships the Iris dataset, which is loaded here with load_iris.

Step 4: Visualize the dataset

Step 5: Splitting data into training and testing dataset.

Step 6: Build a KNN classifier object for the implementation.

Step 7: Predictions for the KNN Classifier, then in the test set, we forecast the target
values and compare them to the actual values.

Step 8: Compute the prediction accuracy for each chosen value of K (a comparison sketch follows these steps).

Step 9: Visualize Predictions

Step 10: Stop the Program.
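
As a sketch of Step 8, the accuracy of two different K values can be compared on the same split (the values K = 3 and K = 5 below are illustrative):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics

X, y = load_iris(return_X_y=True)
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.10, random_state=0)

for k in (3, 5):
    model = KNeighborsClassifier(n_neighbors=k).fit(Xtrain, ytrain)
    acc = metrics.accuracy_score(ytest, model.predict(Xtest))
    print('k =', k, 'accuracy =', acc)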

PROGRAM:
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

from sklearn import metrics


from sklearn.datasets import load_iris
iris=load_iris()
iris.keys()
df=pd.DataFrame(iris['data'])
X=df
y=iris['target']
print(X.head())
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.10)

classifier = KNeighborsClassifier(n_neighbors=3).fit(Xtrain, ytrain)

ypred = classifier.predict(Xtest)

i=0
print ("\n ------------------------------------------------------------------------ ")
print ('%-25s %-25s %-25s' % ('Original Label', 'Predicted Label', 'Correct/Wrong'))
print (" ------------------------------------------------------------------------ ")
for label in ytest:
print ('%-25s %-25s' % (label, ypred[i]), end="")
if (label == ypred[i]):
print (' %-25s' % ('Correct'))
else:
print (' %-25s' % ('Wrong'))
i=i+1
print (" ------------------------------------------------------------------------ ")
print("\nConfusion Matrix:\n",metrics.confusion_matrix(ytest, ypred))
print (" ------------------------------------------------------------------------ ")
print("\nClassification Report:\n",metrics.classification_report(ytest, ypred))
print (" ------------------------------------------------------------------------ ")
print('Accuracy of the classifier is %0.2f' % metrics.accuracy_score(ytest,ypred))
print (" ------------------------------------------------------------------------ ")

OUTPUT:
     0    1    2    3
0  5.1  3.5  1.4  0.2
1  4.9  3.0  1.4  0.2
2  4.7  3.2  1.3  0.2
3  4.6  3.1  1.5  0.2
4  5.0  3.6  1.4  0.2

Original Label            Predicted Label           Correct/Wrong
0 2 Correct
1 1 Correct
2 2 Correct
3 0 Correct
0 0 Correct
1 1 Correct
2 2 Correct
2 2 Correct
0 0 Correct
0 0 Correct
0 0 Correct
1 1 Correct
2 2 Correct
1 1 Correct
1 1 Correct
Confusion Matrix:

[[5 0 0]
[0 5 0]
[0 0 5]]
Classification Report:
               precision    recall  f1-score   support

           0        1.00      1.00      1.00         5
           1        1.00      1.00      1.00         5
           2        1.00      1.00      1.00         5

    accuracy                            1.00        15
   macro avg        1.00      1.00      1.00        15
weighted avg        1.00      1.00      1.00        15

Accuracy of the classifier is 1.00

RESULT:

Thus the K-Nearest Neighbour Algorithm to classify the data set using Python has been
implemented successfully.

EX NO 9: IMPLEMENTATION OF NON-PARAMETRIC LOCALLY WEIGHTED REGRESSION ALGORITHM

AIM:
To implement the non-parametric locally weighted regression algorithm using python.

ALGORITHM:

Step 1: Read the data set and extract the total_bill and tip columns into arrays.

Step 2: Convert the arrays into matrices and add a column of ones to form the design matrix X.

Step 3: Choose the bandwidth parameter k of the Gaussian kernel.

Step 4: For every query point, build a diagonal weight matrix whose entries decay with the distance
from the query point: w = exp(-(x - xi)^2 / (2 k^2)).

Step 5: Solve the locally weighted least-squares problem for that query point:
beta = (X^T W X)^-1 X^T W y.

Step 6: Predict the response at the query point as the product of the query point and the local
coefficients beta.

Step 7: Repeat Steps 4 to 6 for every data point to obtain the fitted values.

Step 8: Sort the points by the bill value so the fitted curve can be drawn smoothly.

Step 9: Plot the original data together with the locally weighted regression curve.

PROGRAM:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

def kernel(point, xmat, k):
    m, n = np.shape(xmat)
    weights = np.mat(np.eye(m))
    for j in range(m):
        diff = point - xmat[j]
        # Gaussian kernel: weight decays with distance from the query point
        weights[j, j] = np.exp(diff * diff.T / (-2.0 * k ** 2))
    return weights

def localWeight(point, xmat, ymat, k):
    wei = kernel(point, xmat, k)
    # locally weighted least squares: beta = (X^T W X)^-1 X^T W y
    W = (xmat.T * (wei * xmat)).I * (xmat.T * (wei * ymat.T))
    return W

def localWeightRegression(xmat, ymat, k):
    m, n = np.shape(xmat)
    ypred = np.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i] * localWeight(xmat[i], xmat, ymat, k)
    return ypred

# load data points
data = pd.read_csv("/Users/HP/Downloads/10-dataset.csv")
bill = np.array(data.total_bill)
tip = np.array(data.tip)

# prepare the design matrix: a column of ones followed by the bill values
mbill = np.mat(bill)
mtip = np.mat(tip)
m = np.shape(mbill)[1]
one = np.mat(np.ones(m))
X = np.hstack((one.T, mbill.T))

# set the bandwidth k here
ypred = localWeightRegression(X, mtip, 0.5)
SortIndex = X[:, 1].argsort(0)
xsort = X[SortIndex][:, 0]

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(bill, tip, color='green')
ax.plot(xsort[:, 1], ypred[SortIndex], color='red', linewidth=5)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()
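
Note: the script expects a CSV with at least the columns total_bill and tip (the path above is the author's local path; adjust it to wherever the file is stored). The first lines of the commonly used tips data look roughly like this (illustrative):

total_bill,tip
16.99,1.01
10.34,1.66
21.01,3.50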

OUTPUT:
RESULT:

Thus the non-parametric locally weighted regression algorithm has been implemented successfully.
