DEPARTMENT OF B.TECH ARTIFICIAL INTELLIGENCE AND DATA SCIENCE
AD3461 – MACHINE LEARNING LABORATORY
LAB MANUAL
Ex No: 1 IMPLEMENTATION OF CANDIDATE ELIMINATION ALGORITHM
AIM:
To implement the Candidate Elimination algorithm using a Python script.
ALGORITHM:
Step 1: Initialize the version space.
● Initialize the most general hypothesis (h_G) to the maximally general hypothesis
(all attributes set to '?').
● Initialize the most specific hypothesis (h_S) to the maximally specific hypothesis
(all attributes set to the null value 'ϕ', or taken from the first positive example).
Step 2: Iterate through the training examples.
● For each positive example, update h_G and h_S as follows:
● Remove from h_G any hypothesis that is inconsistent with the example.
● For each attribute of h_S that does not match the example, generalize
h_S by replacing that value with '?'.
● For each negative example, update h_G and h_S as follows:
● Remove from h_S any hypothesis that is inconsistent with the example.
● Specialize h_G minimally so that it excludes the example, using the
attribute values still retained in h_S.
Step 3: Refine the version space.
● Remove from the general boundary any hypothesis that is less general than another
hypothesis in it, and from the specific boundary any hypothesis that is more general
than another hypothesis in it.
Step 4: Repeat Steps 2 and 3 until all training examples have been processed.
● Keep iterating through the training examples and refining the version space; at
convergence the specific and general boundaries together describe every hypothesis
consistent with the training data, and with sufficient data they meet in a single
hypothesis.
Step 5: Output the final specific and general hypotheses (a small worked update example follows these steps).
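As an illustration of the update rules, the following minimal sketch traces one positive and one negative update on a hypothetical 3-attribute concept (not the finds1.csv data used in the program below):

# Minimal sketch of the boundary updates on a hypothetical 3-attribute example.
S = ['Sunny', 'Warm', 'Strong']     # specific boundary, seeded from the first positive example
G = [['?', '?', '?']]               # general boundary

# Positive example: generalize S wherever it disagrees with the example.
positive = ['Sunny', 'Warm', 'Weak']
S = [s if s == v else '?' for s, v in zip(S, positive)]
print(S)        # ['Sunny', 'Warm', '?']

# Negative example: specialize G using the attribute values still kept in S.
negative = ['Rainy', 'Cold', 'Weak']
G = [['?'] * 3 for _ in range(3)]
for x, (s, v) in enumerate(zip(S, negative)):
    if s != '?' and s != v:
        G[x][x] = s                 # this attribute alone is enough to exclude the example
G = [g for g in G if g != ['?', '?', '?']]
print(G)        # [['Sunny', '?', '?'], ['?', 'Warm', '?']]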
PROGRAM:
import numpy as np
import pandas as pd

data = pd.read_csv('finds1.csv')
concepts = np.array(data.iloc[:, 0:-1])
target = np.array(data.iloc[:, -1])

def learn(concepts, target):
    specific_h = concepts[0].copy()
    print("Initialization of specific_h and general_h")
    print(specific_h)
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print(general_h)
    for i, h in enumerate(concepts):
        if target[i] == "Yes":
            # positive example: generalize specific_h wherever it disagrees with the example
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        if target[i] == "No":
            # negative example: specialize general_h using the values retained in specific_h
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print("Steps of Candidate Elimination Algorithm", i + 1)
        print("Specific_h", i + 1, "\n")
        print(specific_h)
        print("general_h", i + 1, "\n")
        print(general_h)
    # discard rows of general_h that remained fully general
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")
OUTPUT:
initialization of specific_h and general_h
['Cloudy' 'Cold' 'High' 'Strong' 'Warm' 'Change']
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?',
'?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
steps of Candidate Elimination Algorithm 8
Specific_h 8
['?' '?' '?' 'Strong' '?' '?']
general_h 8
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', 'Strong', '?', '?'], ['?',
'?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
Final Specific_h:
['?' '?' '?' 'Strong' '?' '?']
Final General_h:
[['?', '?', '?', 'Strong', '?', '?']]
RESULT:
Thus the Candidate Elimination algorithm has been implemented successfully.
Ex.No: 2 IMPLEMENTATION OF DECISION TREE BASED ID3 ALGORITHM
AIM:
To implement the decision tree based ID3 algorithm using a Python script.
ALGORITHM:
Step 1: Start the program
Step 2: Load the dataset and organize it into a table, with rows representing instances and
columns representing features. The last column should contain the class labels.
Step 3: Define a function to calculate the entropy of the dataset. Entropy measures the
uncertainty in the dataset based on the class distribution.
Step 4: For each feature, calculate the information gain. Information gain measures how
much a feature contributes to reducing the uncertainty in the dataset (a small entropy
check follows these steps).
Step 5: Select the feature with the highest information gain as the best feature to split the
dataset.
Step 6: Divide the dataset into subsets based on the values of the best feature found in
Step 5.
Step 7: Repeat Steps 3–6 recursively on each subset until every instance in a subset shares the same class, no features remain, or the subset is empty.
Step 8: Build the decision tree by assigning the best feature as the splitting criterion at
each internal node and the majority class as the class label for each leaf node.
Step 9: Use the created decision tree to classify new instances by traversing the tree from
the root to the appropriate leaf node based on their feature values.
Step 10: Evaluate the Model
Step 11: Stop the program
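The entropy from Step 3 can be checked by hand. A minimal sketch, assuming the classic 9 "Yes" / 5 "No" play-tennis label split:

import numpy as np

def entropy(target_col):
    # entropy H = -sum(p * log2(p)) over the class proportions
    _, counts = np.unique(target_col, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

labels = ['Yes'] * 9 + ['No'] * 5
print(round(entropy(labels), 3))   # ~0.940 bits of uncertainty before any split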
PROGRAM:
import pandas as pd
import numpy as np

dataset = pd.read_csv('playtennis.csv', names=['outlook', 'temperature', 'humidity', 'wind', 'class'])

def entropy(target_col):
    # entropy = -sum(p * log2(p)) over the class proportions
    elements, counts = np.unique(target_col, return_counts=True)
    entropy = np.sum([(-counts[i] / np.sum(counts)) * np.log2(counts[i] / np.sum(counts))
                      for i in range(len(elements))])
    return entropy

def InfoGain(data, split_attribute_name, target_name="class"):
    total_entropy = entropy(data[target_name])
    vals, counts = np.unique(data[split_attribute_name], return_counts=True)
    # weighted average entropy of the subsets produced by the split
    Weighted_Entropy = np.sum([(counts[i] / np.sum(counts)) *
                               entropy(data.where(data[split_attribute_name] == vals[i]).dropna()[target_name])
                               for i in range(len(vals))])
    Information_Gain = total_entropy - Weighted_Entropy
    return Information_Gain

def ID3(data, originaldata, features, target_attribute_name="class", parent_node_class=None):
    # all examples have the same class: return that class
    if len(np.unique(data[target_attribute_name])) <= 1:
        return np.unique(data[target_attribute_name])[0]
    # empty subset: return the majority class of the original dataset
    elif len(data) == 0:
        return np.unique(originaldata[target_attribute_name])[
            np.argmax(np.unique(originaldata[target_attribute_name], return_counts=True)[1])]
    # no features left: return the parent node's majority class
    elif len(features) == 0:
        return parent_node_class
    else:
        # majority class of the current subset becomes the parent class for the recursion
        parent_node_class = np.unique(data[target_attribute_name])[
            np.argmax(np.unique(data[target_attribute_name], return_counts=True)[1])]
        # information gain of every remaining feature
        item_values = [InfoGain(data, feature, target_attribute_name) for feature in features]
        best_feature_index = np.argmax(item_values)
        best_feature = features[best_feature_index]
        tree = {best_feature: {}}
        features = [i for i in features if i != best_feature]
        for value in np.unique(data[best_feature]):
            sub_data = data.where(data[best_feature] == value).dropna()
            subtree = ID3(sub_data, dataset, features, target_attribute_name, parent_node_class)
            tree[best_feature][value] = subtree
        return tree

tree = ID3(dataset, dataset, dataset.columns[:-1])
print('\nDisplay Tree\n', tree)
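The tree returned above is a nested dictionary. A small hedged helper like the one below (the attribute and value names are assumed to match the playtennis.csv header) can classify a new instance by walking from the root to a leaf:

def classify(instance, tree):
    # walk the nested dict until a leaf (a plain class label) is reached
    if not isinstance(tree, dict):
        return tree
    attribute = next(iter(tree))                         # attribute tested at this node
    subtree = tree[attribute].get(instance.get(attribute))
    if subtree is None:
        return None                                      # attribute value not seen during training
    return classify(instance, subtree)

sample = {'outlook': 'Sunny', 'temperature': 'Hot', 'humidity': 'Normal', 'wind': 'Weak'}
print(classify(sample, tree))                            # expected: 'Yes'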
OUTPUT:
Display Tree
{'outlook': {'Overcast': 'Yes', 'Rain': {'wind': {'Strong': 'No', 'Weak': 'Yes'}}, 'Sunny':
{'humidity': {'High': 'No', 'Normal': 'Yes'}}}}
RESULT:
Thus the decision tree based ID3 algorithm has been implemented successfully.
EX NO.3 IMPLEMENTATION OF ARTIFICIAL NEURAL NETWORK USING BACK
PROPAGATION ALGORITHM
AIM:
To implement an Artificial Neural Network using the Backpropagation algorithm with a
Python script.
ALGORITHM:
Step 1: Inputs X arrive through the preconnected paths.
Step 2: The input is modeled using true weights W. Weights are usually chosen randomly.
Step 3: Calculate the output of each neuron from the input layer to the hidden layer to the
output layer
Step 4: Calculate the error in the outputs
Step 5: From the output layer, go back to the hidden layer to adjust the weights to reduce
the error.
Step 6: Repeat the process until the desired output is achieved (a one-neuron sketch of the error and gradient terms follows these steps).
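Steps 4 and 5 can be illustrated for a single sigmoid output neuron. A minimal sketch with made-up numbers (not the network trained in the program below):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

target, net_input = 0.9, 0.4
out = sigmoid(net_input)            # Step 3: forward pass through the neuron
error = target - out                # Step 4: error at the output
delta = error * out * (1 - out)     # Step 5: gradient term, since sigmoid'(net) = out * (1 - out)
print(out, delta)                   # the weight update is proportional to delta times the input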
PROGRAM:
import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X / np.amax(X, axis=0)  # normalise X column-wise by its maximum
y = y / 100                 # scale the outputs into the range of the sigmoid

# Sigmoid Function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of Sigmoid Function (x is the sigmoid output)
def derivatives_sigmoid(x):
    return x * (1 - x)

# Variable initialization
epoch = 7000              # number of training iterations
lr = 0.1                  # learning rate
inputlayer_neurons = 2    # number of features in the data set
hiddenlayer_neurons = 3   # number of hidden layer neurons
output_neurons = 1        # number of neurons at the output layer

# Weight and bias initialization: uniform random values of dimension rows x cols
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    # Forward Propagation
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)

    # Backpropagation
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)   # how much the hidden layer weights contributed to the error
    d_hiddenlayer = EH * hiddengrad

    # weight and bias updates: dot product of next-layer error and current-layer output
    wout += hlayer_act.T.dot(d_output) * lr
    bout += np.sum(d_output, axis=0, keepdims=True) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)
OUTPUT:
Input:
[[ 0.66666667 1. ]
[ 0.33333333 0.55555556]
[ 1. 0.66666667]]
Actual Output:
[[ 0.92]
[ 0.86]
[ 0.89]]
Predicted Output:
[[ 0.89559591]
[ 0.88142069]
[ 0.8928407 ]]
RESULT:
Thus the implementation of back propagation algorithm has been done successfully.
EX.NO 4: IMPLEMENTATION OF NAIVE BAYESIAN CLASSIFIER
AIM:
To implement the Naïve Bayesian Classifier using a Python script.
ALGORITHM:
Step 1: Data Pre-processing step
Step 2: Fitting Naive Bayes to the training set (a small Bayes-rule sketch follows these steps).
Step 3: Predicting the test result.
Step 4: Testing the accuracy of the result (creation of the confusion matrix).
Step 5: Visualizing the test set result.
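The fitting in Step 2 amounts to applying Bayes' rule per class. A hedged arithmetic sketch with made-up priors and likelihoods (not values from naivetext1.csv):

# P(pos | word) = P(word | pos) * P(pos) / evidence, with illustrative numbers only
p_pos, p_neg = 0.5, 0.5                 # assumed class priors
p_word_pos, p_word_neg = 0.30, 0.05     # assumed likelihoods of a word under each class
num_pos = p_word_pos * p_pos
num_neg = p_word_neg * p_neg
print(num_pos / (num_pos + num_neg))    # posterior of 'pos' given the word, ~0.857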
PROGRAM:
import pandas as pd
msg=pd.read_csv('naivetext1.csv',names=['message','label'])
print('The dimensions of the dataset',msg.shape)
msg['labelnum']=msg.label.map({'pos':1,'neg':0})
X=msg.message
y=msg.labelnum
print(X)
print(y)
from sklearn.model_selection import train_test_split
xtrain,xtest,ytrain,ytest=train_test_split(X,y)
print(xtest.shape)
print(xtrain.shape)
print(ytest.shape)
print(ytrain.shape)
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm=count_vect.transform(xtest)
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(xtrain_dtm,ytrain)
predicted = clf.predict(xtest_dtm)
from sklearn import metrics
print('Accuracy metrics')
print('Accuracy of the classifier is',metrics.accuracy_score(ytest,predicted))
print('Confusion matrix')
print(metrics.confusion_matrix(ytest,predicted))
print('Recall and Precision')
print(metrics.recall_score(ytest,predicted))
print(metrics.precision_score(ytest,predicted))
OUTPUT:
The dimensions of the dataset (18, 2)
0     I love this sandwich
1     This is an amazing place
2     I feel very good about these beers
3     This is my best work
4     What an awesome view
5     I do not like this restaurant
6     I am tired of this stuff
7     I can't deal with this
8     He is my sworn enemy
9     My boss is horrible
10    This is an awesome place
11    I do not like the taste of this juice
12    I love to dance
13    I am sick and tired of this place
14    What a great holiday
15    That is a bad locality to stay
16    We will have good fun tomorrow
17    I went to my enemy's house today
Name: message, dtype: object
0     1
1     1
2     1
3     1
4     1
5     0
6     0
7     0
8     0
9     0
10    1
11    0
12    1
13    0
14    1
15    0
16    1
17    0
Name: labelnum, dtype: int64
(5,)
(13,)
(5,)
(13,)
Accuracy metrics
Accuracy of the classifier is 0.8
Confusion matrix
[[3 1]
 [0 1]]
Recall and Precision
1.0
0.5
RESULT:
Thus the implementation of Naive Bayesian Classifier algorithm has been done
successfully.
EX NO 5: IMPLEMENTATION OF NAIVE BAYESIAN CLASSIFIER MODEL TO
CLASSIFY A SET OF DOCUMENTS
AIM:
To implement the Naïve Bayesian Classifier model to classify a set of documents using
Python.
ALGORITHM:
Step 1: Input the total number of documents from the user.
Step 2: Input the text and class of each document and split it into a list.
Step 3: Create a 2D array and append each document list into the array.
Step 4: Using a set data structure, store all the keywords in a list.
Step 5: Input the text to be classified by the user.
PROGRAM:
import csv
import random
import math

def loadCsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        # converting strings into numbers for processing
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

def splitDataset(dataset, splitRatio):
    # 67% training size
    trainSize = int(len(dataset) * splitRatio)
    trainSet = []
    copy = list(dataset)
    while len(trainSet) < trainSize:
        # generate indices for the dataset list randomly to pick elements for the training data
        index = random.randrange(len(copy))
        trainSet.append(copy.pop(index))
    return [trainSet, copy]

def separateByClass(dataset):
    # creates a dictionary of classes 1 and 0 where the values are the instances belonging to each class
    separated = {}
    for i in range(len(dataset)):
        vector = dataset[i]
        if vector[-1] not in separated:
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

def mean(numbers):
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(len(numbers) - 1)
    return math.sqrt(variance)

def summarize(dataset):
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
    del summaries[-1]
    return summaries

def summarizeByClass(dataset):
    separated = separateByClass(dataset)
    summaries = {}
    for classValue, instances in separated.items():
        # summaries is a dict of tuples (mean, std) for each class value
        summaries[classValue] = summarize(instances)
    return summaries

def calculateProbability(x, mean, stdev):
    exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

def calculateClassProbabilities(summaries, inputVector):
    probabilities = {}
    for classValue, classSummaries in summaries.items():  # class and attribute information as mean and sd
        probabilities[classValue] = 1
        for i in range(len(classSummaries)):
            mean, stdev = classSummaries[i]  # mean and sd of every attribute for class 0 and 1 separately
            x = inputVector[i]               # test vector's i-th attribute
            probabilities[classValue] *= calculateProbability(x, mean, stdev)  # use the normal distribution
    return probabilities

def predict(summaries, inputVector):
    probabilities = calculateClassProbabilities(summaries, inputVector)
    bestLabel, bestProb = None, -1
    for classValue, probability in probabilities.items():  # assigns the class with the highest probability
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classValue
    return bestLabel

def getPredictions(summaries, testSet):
    predictions = []
    for i in range(len(testSet)):
        result = predict(summaries, testSet[i])
        predictions.append(result)
    return predictions

def getAccuracy(testSet, predictions):
    correct = 0
    for i in range(len(testSet)):
        if testSet[i][-1] == predictions[i]:
            correct += 1
    return (correct / float(len(testSet))) * 100.0

def main():
    filename = '5data.csv'
    splitRatio = 0.67
    dataset = loadCsv(filename)
    trainingSet, testSet = splitDataset(dataset, splitRatio)
    print('Split {0} rows into train={1} and test={2} rows'.format(len(dataset), len(trainingSet), len(testSet)))
    # prepare model
    summaries = summarizeByClass(trainingSet)
    # test model
    predictions = getPredictions(summaries, testSet)
    accuracy = getAccuracy(testSet, predictions)
    print('Accuracy of the classifier is : {0}%'.format(accuracy))

main()
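The Gaussian likelihood used in calculateProbability() can be sanity-checked in isolation. A minimal sketch with made-up mean and standard deviation:

import math

# P(x | class) = (1 / (sqrt(2*pi) * stdev)) * exp(-(x - mean)^2 / (2 * stdev^2))
def gaussian(x, mean, stdev):
    exponent = math.exp(-((x - mean) ** 2) / (2 * stdev ** 2))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

print(gaussian(71.5, 73.0, 6.2))   # likelihood of observing 71.5 under N(73.0, 6.2^2)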
OUTPUT:
Confusion matrix is as follows
[[17  0  0]
 [ 0 17  0]
 [ 0  0 11]]
Accuracy metrics
             precision    recall  f1-score   support
          0       1.00      1.00      1.00        17
          1       1.00      1.00      1.00        17
          2       1.00      1.00      1.00        11
avg / total       1.00      1.00      1.00        45
RESULT:
Thus the implementation of the Naïve Bayesian Classifier model to classify a set of documents has been done successfully.
EX NO 6: CONSTRUCTING A BAYESIAN NETWORK TO DIAGNOSE AN
INFECTION USING WHO DATA SET.
AIM:
To implement a Bayesian Network to diagnose an infection with WHO dataset using
Python script.
PROGRAM:
import numpy as np
import csv
import pandas as pd
from pgmpy.models import BayesianModel
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# read Cleveland Heart Disease data
heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?', np.nan)

# display the data
print('Few examples from the dataset are given below')
print(heartDisease.head())

# Model Bayesian Network
model = BayesianModel([('age', 'trestbps'), ('age', 'fbs'),
                       ('sex', 'trestbps'), ('exang', 'trestbps'),
                       ('trestbps', 'heartdisease'), ('fbs', 'heartdisease'),
                       ('heartdisease', 'restecg'), ('heartdisease', 'thalach'),
                       ('heartdisease', 'chol')])

# Learning CPDs using Maximum Likelihood Estimators
print('\n Learning CPD using Maximum likelihood estimators')
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

# Inferencing with Bayesian Network
print('\n Inferencing with Bayesian Network:')
HeartDisease_infer = VariableElimination(model)

# computing the probability of HeartDisease given age
print('\n 1. Probability of HeartDisease given Age=28')
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={'age': 28})
print(q['heartdisease'])

# computing the probability of HeartDisease given cholesterol
print('\n 2. Probability of HeartDisease given cholesterol=100')
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={'chol': 100})
print(q['heartdisease'])
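If needed, the structure and the conditional probability tables learned by model.fit() can be inspected before inference. A short hedged addition using standard pgmpy accessors (variable names assumed to follow the program above):

# Inspect what the network learned (optional, illustrative only).
print(model.nodes())            # variables in the network
for cpd in model.get_cpds():    # one conditional probability table per variable
    print(cpd)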
EX NO: 7 IMPLEMENTATION OF EM ALGORITHM TO CLUSTER A SET OF DATA
AIM:
To implement the EM algorithm to cluster a data set using Python.
ALGORITHM:
Step 1: Load the data set and extract the features (Distance_Feature and Speeding_Feature) to be clustered.
Step 2: Choose the number of clusters (mixture components) and plot the raw data.
Step 3: Initialize the means, covariances and mixing weights of the Gaussian components.
Step 4: E-step: for every data point, compute the responsibility (posterior probability) of each component (a small sketch of this follows these steps).
Step 5: M-step: re-estimate the means, covariances and mixing weights from the weighted points; repeat the E and M steps until convergence, assign each point to its most probable component and compare the clustering with K-Means.
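Step 4 is what GaussianMixture.predict_proba() exposes. A minimal sketch on a few made-up points (the real program below uses kmeansdata.csv):

import numpy as np
from sklearn.mixture import GaussianMixture

points = np.array([[71.2, 28.0], [52.5, 25.0], [41.9, 10.0], [58.1, 45.0]])  # illustrative rows only
gmm = GaussianMixture(n_components=2, random_state=0).fit(points)
print(gmm.predict_proba(points))   # E-step responsibilities: one row per point, one column per component
print(gmm.predict(points))         # hard labels, the argmax of each responsibility row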
PROGRAM:
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture
import pandas as pd
X=pd.read_csv("kmeansdata.csv")
x1 = X['Distance_Feature'].values
x2 = X['Speeding_Feature'].values
X = np.array(list(zip(x1, x2))).reshape(len(x1), 2)
plt.plot()
plt.xlim([0, 100])
plt.ylim([0, 50])
plt.title('Dataset')
plt.scatter(x1, x2)
plt.show()
#code for EM
gmm = GaussianMixture(n_components=3)
gmm.fit(X)
em_predictions = gmm.predict(X)
print("\nEM predictions")
print(em_predictions)
print("mean:\n",gmm.means_)
print('\n')
print("Covariances\n",gmm.covariances_)
print(X)
plt.title('Expectation Maximization')
plt.scatter(X[:,0], X[:,1],c=em_predictions,s=50)
plt.show()
#code for Kmeans
import matplotlib.pyplot as plt1
kmeans = KMeans(n_clusters=3)
kmeans.fit(X)
print(kmeans.cluster_centers_)
print(kmeans.labels_)
plt.title('KMEANS')
plt1.scatter(X[:,0], X[:,1], c=kmeans.labels_, cmap='rainbow')
plt1.scatter(kmeans.cluster_centers_[:,0], kmeans.cluster_centers_[:,1], color='black')
plt1.show()
OUTPUT:
EM predictions
[0 0 0 1 0 1 1 1 2 1 2 2 1 1 2 1 2 1 0 1 0 1 1]
mean:
[[57.70629058 25.73574491]
 [52.12044022 22.46250453]
 [46.4364858  39.43288647]]
Covariances
[[[83.51878796 14.926902  ]  [14.926902    2.70846907]]
 [[29.95910352 15.83416554]  [15.83416554 67.01175729]]
 [[79.34811849 29.55835938]  [29.55835938 18.17157304]]]
[[71.24 28.  ] [52.53 25.  ] [64.54 27.  ] [55.69 22.  ] [54.58 25.  ] [41.91 10.  ]
 [58.64 20.  ] [52.02  8.  ] [31.25 34.  ] [44.31 19.  ] [49.35 40.  ] [58.07 45.  ]
 [44.22 22.  ] [55.73 19.  ] [46.63 43.  ] [52.97 32.  ] [46.25 35.  ] [51.55 27.  ]
 [57.05 26.  ] [58.45 30.  ] [43.42 23.  ] [55.68 37.  ] [55.15 18.  ]]
[[57.74090909 24.27272727]
 [48.6        38.        ]
 [45.176      16.4       ]]
[0 0 0 0 0 2 0 2 1 2 1 1 2 0 1 1 1 0 0 0 2 1 0]
RESULT:
Thus the EM Algorithm to cluster a data set has been implemented successfully
EX NO 8: IMPLEMENTATION OF K-NEAREST NEIGHBOUR ALGORITHM TO
CLASSIFY IRIS DATASET
AIM:
To implement the K-Nearest Neighbour algorithm to classify the Iris dataset using Python.
ALGORITHM:
Step 1: Start the Program
Step 2: Importing the Modules.
Step 3: Load the Iris dataset; scikit-learn provides it through load_iris().
Step 4: Inspect the dataset.
Step 5: Split the data into training and testing sets.
Step 6: Build a KNN classifier object (here with k = 3) for the implementation.
Step 7: Make predictions with the KNN classifier on the test set and compare the predicted
target values to the actual values (a small distance-and-vote sketch follows these steps).
Step 8: Compute the accuracy of the predictions.
Step 9: Report the confusion matrix and classification report.
Step 10: Stop the Program.
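The vote behind Steps 6 and 7 can be written out directly. A minimal sketch with made-up points (the program below lets scikit-learn do this on the Iris data):

import numpy as np

train = np.array([[5.1, 3.5], [4.9, 3.0], [6.7, 3.1]])   # illustrative training points
labels = np.array([0, 0, 2])
test = np.array([5.0, 3.4])

distances = np.linalg.norm(train - test, axis=1)   # Euclidean distance to every training point
k = 2
nearest = np.argsort(distances)[:k]                # indices of the k closest points
print(np.bincount(labels[nearest]).argmax())       # majority vote -> class 0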
PROGRAM:
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.datasets import load_iris
iris=load_iris()
iris.keys()
df=pd.DataFrame(iris['data'])
X=df
y=iris['target']
print(X.head())
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.10)
classifier = KNeighborsClassifier(n_neighbors=3).fit(Xtrain, ytrain)
ypred = classifier.predict(Xtest)
i=0
print ("\n ------------------------------------------------------------------------ ")
print ('%-25s %-25s %-25s' % ('Original Label', 'Predicted Label', 'Correct/Wrong'))
print (" ------------------------------------------------------------------------ ")
for label in ytest:
    print ('%-25s %-25s' % (label, ypred[i]), end="")
    if (label == ypred[i]):
        print (' %-25s' % ('Correct'))
    else:
        print (' %-25s' % ('Wrong'))
    i = i + 1
print (" ------------------------------------------------------------------------ ")
print("\nConfusion Matrix:\n",metrics.confusion_matrix(ytest, ypred))
print (" ------------------------------------------------------------------------ ")
print("\nClassification Report:\n",metrics.classification_report(ytest, ypred))
print (" ------------------------------------------------------------------------ ")
print('Accuracy of the classifier is %0.2f' % metrics.accuracy_score(ytest,ypred))
print (" ------------------------------------------------------------------------ ")
OUTPUT:
     0    1    2    3
0  5.1  3.5  1.4  0.2
1  4.9  3.0  1.4  0.2
2  4.7  3.2  1.3  0.2
3  4.6  3.1  1.5  0.2
4  5.0  3.6  1.4  0.2
Original Label            Predicted Label           Correct/Wrong
0 2 Correct
1 1 Correct
2 2 Correct
3 0 Correct
0 0 Correct
1 1 Correct
2 2 Correct
2 2 Correct
0 0 Correct
0 0 Correct
0 0 Correct
1 1 Correct
2 2 Correct
1 1 Correct
1 1 Correct
Confusion Matrix:
[[5 0 0]
[0 5 0]
[0 0 5]]
Classification Report:
              precision    recall  f1-score   support
           0       1.00      1.00      1.00         5
           1       1.00      1.00      1.00         5
           2       1.00      1.00      1.00         5
    accuracy                           1.00        15
   macro avg       1.00      1.00      1.00        15
weighted avg       1.00      1.00      1.00        15
Accuracy of the classifier is 1.00
RESULT:
Thus the K-Nearest Neighbour Algorithm to classify the data set using Python has been
implemented successfully.
EX NO 9: IMPLEMENTATION OF NON-PARAMETRIC LOCALLY WEIGHTED
REGRESSION ALGORITHM
AIM:
To implement the non-parametric locally weighted regression algorithm using Python.
ALGORITHM:
Step 1: Load the data set and extract the total_bill and tip columns.
Step 2: Convert both columns to matrices and append a column of ones to form the design matrix X.
Step 3: Choose a bandwidth parameter k for the Gaussian kernel.
Step 4: For a query point, build a diagonal weight matrix whose entries are w_j = exp(-(x - x_j)(x - x_j)^T / (2k^2)), so that nearby points receive larger weights (a small kernel sketch follows these steps).
Step 5: Solve the locally weighted normal equation W = (X^T w X)^-1 (X^T w y) for that query point.
Step 6: Predict the response at the query point as x · W.
Step 7: Repeat Steps 4–6 for every point in the data set.
Step 8: Sort the points by total_bill for plotting.
Step 9: Plot the scatter of the data together with the fitted regression curve.
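The kernel weight from Step 4 can be checked on its own. A minimal sketch with made-up distances and bandwidth:

import numpy as np

# w_i = exp(-(x - x_i)^2 / (2 * k^2)): nearby points get weights near 1, distant points near 0
def weight(x, xi, k):
    return np.exp(-((x - xi) ** 2) / (2 * k ** 2))

print(weight(10.0, 10.5, 0.5))   # nearby point  -> ~0.61
print(weight(10.0, 15.0, 0.5))   # distant point -> ~0.0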
PROGRAM:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
def kernel(point, xmat, k):
    m, n = np.shape(xmat)
    weights = np.mat(np.eye(m))
    for j in range(m):
        diff = point - X[j]
        weights[j, j] = np.exp(diff * diff.T / (-2.0 * k**2))
    return weights

def localWeight(point, xmat, ymat, k):
    wei = kernel(point, xmat, k)
    W = (X.T * (wei * X)).I * (X.T * (wei * ymat.T))
    return W

def localWeightRegression(xmat, ymat, k):
    m, n = np.shape(xmat)
    ypred = np.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i] * localWeight(xmat[i], xmat, ymat, k)
    return ypred
# load data points
data = pd.read_csv("/Users/HP/Downloads/10-dataset.csv")
bill = np.array(data.total_bill)
tip = np.array(data.tip)
#preparing and add 1 in bill
mbill = np.mat(bill)
mtip = np.mat(tip)
m= np.shape(mbill)[1]
one = np.mat(np.ones(m))
X = np.hstack((one.T,mbill.T))
#set k here
ypred = localWeightRegression(X,mtip,0.5)
SortIndex = X[:,1].argsort(0)
xsort = X[SortIndex][:,0]
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.scatter(bill,tip, color='green')
ax.plot(xsort[:,1],ypred[SortIndex], color = 'red', linewidth=5)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()
OUTPUT:
RESULT:
Thus the non-parametric locally weighted regression algorithm has been implemented
successfully.