Program 1
ALGORITHM:
Step 1: Load the data set.
Step 2: Initialize the General Hypothesis and the Specific Hypothesis.
Step 3: For each training example:
Step 4: If the example is positive, then for each attribute:
            if attribute_value == hypothesis_value:
                do nothing
            else:
                replace the attribute value in the specific hypothesis with '?' (generalizing it)
Step 5: If the example is negative, make the general hypothesis more specific.
PROGRAM CODE:
import numpy as np
import pandas as pd

# Load the training data
data = pd.DataFrame(data=pd.read_csv('finds1.csv'))
concepts = np.array(data.iloc[:, 0:-1])
target = np.array(data.iloc[:, -1])

def learn(concepts, target):
    # Initialize the specific hypothesis with the first training instance
    specific_h = concepts[0].copy()
    print("initialization of specific_h and general_h")
    print(specific_h)
    # Initialize the general hypothesis with all '?'
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print(general_h)
    for i, h in enumerate(concepts):
        if target[i] == "Yes":
            # Positive example: generalize the specific hypothesis
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        if target[i] == "No":
            # Negative example: specialize the general hypothesis
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print("steps of Candidate Elimination Algorithm", i + 1)
        print("Specific_h ", i + 1, "\n")
        print(specific_h)
        print("general_h ", i + 1, "\n")
        print(general_h)
    # Drop fully general rows from general_h
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")
OUTPUT:
initialization of specific_h and general_h
['Cloudy' 'Cold' 'High' 'Strong' 'Warm' 'Change']
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
steps of Candidate Elimination Algorithm 8
Specific_h 8
['?' '?' '?' 'Strong' '?' '?']
general_h 8
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', 'Strong', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
Final Specific_h:
['?' '?' '?' 'Strong' '?' '?']
Final General_h:
[['?', '?', '?', 'Strong', '?', '?']]
Program 2
AIM: Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set
for building the decision tree and apply this knowledge to classify a new sample.
ALGORITHM:
Otherwise Begin
    A ← the attribute from Attributes that best* classifies Examples
    The decision attribute for Root ← A
    For each possible value, vi, of A,
        Add a new tree branch below Root, corresponding to the test A = vi
        Let Examples_vi be the subset of Examples that have value vi for A
        If Examples_vi is empty
            Then below this new branch add a leaf node with label = most common value of Target_attribute in Examples
        Else below this new branch add the subtree ID3(Examples_vi, Target_attribute, Attributes - {A})
    End
Return Root
(* the best attribute is the one with the highest information gain)
FLOWCHART:
TRAINING DATA SET:
TEST DATASET:
import pandas as pd
import math
import numpy as np

data = pd.read_csv("Dataset/4-dataset.csv")
features = [feat for feat in data]
features.remove("answer")

class Node:
    def __init__(self):
        self.children = []
        self.value = ""
        self.isLeaf = False
        self.pred = ""

def entropy(examples):
    # Entropy of the class column ("answer") of the given examples
    pos = 0.0
    neg = 0.0
    for _, row in examples.iterrows():
        if row["answer"] == "yes":
            pos += 1
        else:
            neg += 1
    if pos == 0.0 or neg == 0.0:
        return 0.0
    else:
        p = pos / (pos + neg)
        n = neg / (pos + neg)
        return -(p * math.log(p, 2) + n * math.log(n, 2))
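The ID3 routine below calls an info_gain helper that does not appear in this listing; the name and signature come from that call, and the body here is a sketch written to be consistent with the entropy function above, not the original program:

def info_gain(examples, attr):
    # Information gain of splitting 'examples' on attribute 'attr'
    uniq = np.unique(examples[attr])
    gain = entropy(examples)
    for u in uniq:
        subdata = examples[examples[attr] == u]
        gain -= (float(len(subdata)) / float(len(examples))) * entropy(subdata)
    return gain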
def ID3(examples, attrs):
    # Build the decision tree recursively, choosing the attribute with
    # the highest information gain at each node
    root = Node()
    max_gain = 0
    max_feat = ""
    for feature in attrs:
        # print("\n", examples)
        gain = info_gain(examples, feature)
        if gain > max_gain:
            max_gain = gain
            max_feat = feature
    root.value = max_feat
    # print("\nMax feature attr", max_feat)
    uniq = np.unique(examples[max_feat])
    # print("\n", uniq)
    for u in uniq:
        # print("\n", u)
        subdata = examples[examples[max_feat] == u]
        # print("\n", subdata)
        if entropy(subdata) == 0.0:
            # Pure subset: create a leaf node
            newNode = Node()
            newNode.isLeaf = True
            newNode.value = u
            newNode.pred = np.unique(subdata["answer"])
            root.children.append(newNode)
        else:
            # Impure subset: recurse on the remaining attributes
            dummyNode = Node()
            dummyNode.value = u
            new_attrs = attrs.copy()
            new_attrs.remove(max_feat)
            child = ID3(subdata, new_attrs)
            dummyNode.children.append(child)
            root.children.append(dummyNode)
    return root
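The output below shows the learned tree printed one node per level of indentation, but the listing does not include the print routine or the driver call. A hedged sketch of what they might look like (the printTree name and indentation format are assumptions, not part of the original program):

def printTree(root, depth):
    # Print the node value, then (for leaves) its prediction, indenting one tab per level
    print("\t" * depth + str(root.value))
    if root.isLeaf:
        print("\t" * (depth + 1) + str(root.pred))
    for child in root.children:
        printTree(child, depth + 1)

root = ID3(data, features)
printTree(root, 0)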
OUTPUT:
Outlook
	Rain
		Wind
			Strong
				No
			Weak
				Yes
	Overcast
		Yes
	Sunny
		Humidity
			Normal
				Yes
			High
				No
Program 3
AIM: Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.
ALGORITHM:
1. Create a feed-forward network with n_in inputs, n_hidden hidden units, and n_out output units.
2. Input the instance x to the network and compute the output o_u of every unit u in the network.
3. Propagate the errors backward through the network:
4. For each output unit k, calculate its error term δk.
5. For each hidden unit h, calculate its error term δh.
6. Update each network weight w_ji.
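For reference, the standard update rules that steps 4 to 6 refer to, assuming sigmoid units, with t_k the target and o_k the actual output of output unit k, o_h the output of hidden unit h, x_ji the input flowing from unit i into unit j, and η the learning rate:
δk = o_k (1 - o_k) (t_k - o_k)
δh = o_h (1 - o_h) Σ_k (w_kh δk)    (the sum runs over the output units k fed by unit h)
w_ji ← w_ji + η δj x_ji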
FLOWCHART:
TRAINING DATA SET:
Example | Input 1 | Input 2 | Expected Output
1 | 2 | 9 | 92
2 | 1 | 5 | 86
3 | 3 | 6 | 89
NORMALIZE THE INPUT:
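A brief sketch of the normalization that produces the values shown in the output below: each input column is divided by its column maximum and the expected outputs are divided by 100 (the variable names X and y are assumptions chosen to match the rest of the program):

import numpy as np
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X / np.amax(X, axis=0)   # scale each input feature into the 0..1 range
y = y / 100                  # scale the expected outputs into the 0..1 range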
PROGRAM CODE:
import numpy as np

# Sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Variable initialization
epoch = 5   # Setting training iterations
lr = 0.1    # Setting learning rate

# Weight and bias initialization
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

# Backpropagation
EO = y - output
outgrad = derivatives_sigmoid(output)
d_output = EO * outgrad
EH = d_output.dot(wout.T)
hiddengrad = derivatives_sigmoid(hlayer_act)  # how much hidden layer weights contributed to error
d_hiddenlayer = EH * hiddengrad
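The listing above leaves out the training data, the layer sizes, the derivatives_sigmoid helper, the forward pass, the weight updates, and the epoch loop. A hedged, self-contained sketch that fills those gaps (the layer sizes and variable names are assumptions chosen to be consistent with the output shown below):

import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X / np.amax(X, axis=0)   # normalize each input column by its maximum
y = y / 100                  # scale the expected outputs to 0..1

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def derivatives_sigmoid(x):
    # derivative of the sigmoid expressed in terms of its output
    return x * (1 - x)

epoch = 5
lr = 0.1
inputlayer_neurons = 2
hiddenlayer_neurons = 3
output_neurons = 1

wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    # Forward propagation
    hinp = np.dot(X, wh) + bh
    hlayer_act = sigmoid(hinp)
    outinp = np.dot(hlayer_act, wout) + bout
    output = sigmoid(outinp)

    # Backpropagation
    EO = y - output
    d_output = EO * derivatives_sigmoid(output)
    EH = d_output.dot(wout.T)
    d_hiddenlayer = EH * derivatives_sigmoid(hlayer_act)

    # Weight and bias updates
    wout += hlayer_act.T.dot(d_output) * lr
    bout += np.sum(d_output, axis=0, keepdims=True) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

    print("-----------Epoch-", i + 1, "Starts----------")
    print("Input:\n", X)
    print("Actual Output:\n", y)
    print("Predicted Output:\n", output)
    print("-----------Epoch-", i + 1, "Ends----------\n")

print("Input:\n", X)
print("Actual Output:\n", y)
print("Predicted Output:\n", output)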
OUTPUT:
-----------Epoch- 1 Starts----------
Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.81951208]
[0.8007242 ]
[0.82485744]]
-----------Epoch- 1 Ends----------
-----------Epoch- 2 Starts----------
Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.82033938]
[0.80153634]
[0.82568134]]
-----------Epoch- 2 Ends----------
-----------Epoch- 3 Starts----------
Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.82115226]
[0.80233463]
[0.82649072]]
-----------Epoch- 3 Ends----------
-----------Epoch- 4 Starts----------
Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.82195108]
[0.80311943]
[0.82728598]]
-----------Epoch- 4 Ends----------
-----------Epoch- 5 Starts----------
Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.8227362 ]
[0.80389106]
[0.82806747]]
-----------Epoch- 5 Ends----------
Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.8227362 ]
[0.80389106]
[0.82806747]]
Program 4
AIM: Write a program to implement the naïve Bayesian classifier for a sample training data set stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.
ALGORITHM:
STEP 1: Load the training data set from the CSV file into a list of dictionaries, where each dictionary represents a single instance (row) in the data set, the keys represent the attribute names (columns), and the values represent the corresponding attribute values for that instance.
STEP 2: Determine the class variable for each instance in the training data set and add it as a new key-value pair to the corresponding dictionary.
STEP 3: Create a dictionary to store the prior probabilities for each class variable in the training data set. The key-value pairs should be of the form {class_variable: prior_probability}.
STEP 4: For each attribute in the training data set, create a dictionary to store the conditional probabilities for each attribute value given each class variable. The key-value pairs should be of the form {(attribute, attribute_value, class_variable): conditional_probability}.
STEP 5: Compute the prior probabilities for each class variable by counting the number of instances in the training data set that belong to each class variable and dividing by the total number of instances.
STEP 6: For each attribute in the training data set, compute the conditional probabilities for each attribute value given each class variable by counting the number of instances in the training data set that have that attribute value and belong to that class variable, and dividing by the number of instances that belong to that class variable.
STEP 7: Load the test data sets from CSV files into lists of dictionaries, following the same format as the training data set.
STEP 8: For each instance in each test data set, compute the posterior probability for each class variable given the attribute values in that instance, using the naive Bayesian formula: P(class_variable | attribute_values) = P(class_variable) * product(P(attribute_value | class_variable) for attribute_value in attribute_values).
STEP 9: Determine the predicted class variable for each instance in each test data set as the class variable with the highest posterior probability.
STEP 10: Compare the predicted class variables to the actual class variables in each test data set to compute the accuracy of the classifier.
STEP 11: Output the accuracy for each test data set.
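As a small illustration of the posterior computation in STEP 8, here is a sketch for categorical attributes; the dictionary names priors and conditionals follow the shapes described in STEPS 3 and 4 and are assumptions for this sketch only (the program below instead models the numeric attributes with Gaussian densities):

def posterior(instance, class_variable, priors, conditionals):
    # instance: {attribute: attribute_value} for one test row (class key excluded)
    # P(class) multiplied by P(attribute_value | class) for every attribute
    p = priors[class_variable]
    for attribute, attribute_value in instance.items():
        p *= conditionals.get((attribute, attribute_value, class_variable), 0.0)
    return p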
FLOWCHART:
TRAINING DATA SET:
Examples | Pregnancies | Glucose | Blood Pressure | Skin Thickness | Insulin | BMI | Diabetic Pedigree Function | Age | Outcome
PROGRAM CODE:
import csv
import math

def loadcsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        # converting strings into numbers for processing
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

def separatebyclass(dataset):
    # creates a dictionary of classes 1 and 0 where the values are
    # the instances belonging to each class
    separated = {}
    for i in range(len(dataset)):
        vector = dataset[i]
        if (vector[-1] not in separated):
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

def mean(numbers):
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(len(numbers) - 1)
    return math.sqrt(variance)

def summarizebyclass(dataset):
    separated = separatebyclass(dataset)
    # print(separated)
    summaries = {}
    for classvalue, instances in separated.items():
        # summaries is a dict of (mean, std) tuples for each class value
        summaries[classvalue] = summarize(instances)  # summarize computes the mean and std per attribute
    return summaries

def main():
    filename = 'naivedata.csv'
    splitratio = 0.67
    dataset = loadcsv(filename)

main()
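The listing stops short: summarize, the train/test split, the Gaussian probability calculation, prediction, and accuracy are all implied by the AIM but not shown. A hedged sketch of those missing pieces, written to fit the functions above (every name here other than summarizebyclass, mean, and stdev is an assumption):

import random

def splitdataset(dataset, splitratio):
    # randomly move splitratio of the instances into the training set
    trainsize = int(len(dataset) * splitratio)
    trainset = []
    copy = list(dataset)
    while len(trainset) < trainsize:
        index = random.randrange(len(copy))
        trainset.append(copy.pop(index))
    return [trainset, copy]

def summarize(instances):
    # (mean, stdev) for every attribute column, excluding the class column
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*instances)]
    del summaries[-1]
    return summaries

def calculateprobability(x, mean_, stdev_):
    # Gaussian probability density of x under N(mean_, stdev_^2)
    exponent = math.exp(-(math.pow(x - mean_, 2) / (2 * math.pow(stdev_, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev_)) * exponent

def calculateclassprobabilities(summaries, inputvector):
    probabilities = {}
    for classvalue, classsummaries in summaries.items():
        probabilities[classvalue] = 1
        for i in range(len(classsummaries)):
            m, s = classsummaries[i]
            probabilities[classvalue] *= calculateprobability(inputvector[i], m, s)
    return probabilities

def predict(summaries, inputvector):
    # class with the highest likelihood for this input vector
    probabilities = calculateclassprobabilities(summaries, inputvector)
    bestlabel, bestprob = None, -1
    for classvalue, probability in probabilities.items():
        if bestlabel is None or probability > bestprob:
            bestprob = probability
            bestlabel = classvalue
    return bestlabel

def getpredictions(summaries, testset):
    return [predict(summaries, testset[i]) for i in range(len(testset))]

def getaccuracy(testset, predictions):
    correct = sum(1 for i in range(len(testset)) if testset[i][-1] == predictions[i])
    return (correct / float(len(testset))) * 100.0

With these in place, main() could continue after loading the data: split it with splitdataset, summarize the training split with summarizebyclass, call getpredictions on the test split, and print getaccuracy.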
OUTPUT:
Program 5
AIM: Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and recall for your data set.
ALGORITHM:
LEARN_NAIVE_BAYES_TEXT (Examples, V)
Examples is a set of text documents along with their target values. V is the set of all possible target values. This function learns the probability terms P(w_k | v_j), describing the probability that a randomly drawn word from a document in class v_j will be the English word w_k. It also learns the class prior probabilities P(v_j).
1. Collect all words, punctuation, and other tokens that occur in Examples:
   Vocabulary ← the set of all distinct words and other tokens occurring in any text document from Examples
FLOWCHART:
TRAINING DATA SET:
PROGRAM CODE:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

msg = pd.read_csv('naivetext.csv', names=['message', 'label'])
print('The dimensions of the dataset', msg.shape)
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
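The listing ends before the steps its imports and output imply: splitting the data, vectorizing the text, training MultinomialNB, and computing the metrics. A hedged sketch of that remainder (the variable names are assumptions; get_feature_names_out is the newer scikit-learn spelling of get_feature_names):

xtrain, xtest, ytrain, ytest = train_test_split(X, y)
print('The total number of Training Data:', ytrain.shape)
print('The total number of Test Data:', ytest.shape)

# Build the bag-of-words document-term matrices
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)
print('The words or Tokens in the text documents')
print(count_vect.get_feature_names_out())

# Train the multinomial naive Bayes classifier and predict the test labels
clf = MultinomialNB().fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)

print('Accuracy of the classifier is', metrics.accuracy_score(ytest, predicted))
print('Confusion matrix')
print(metrics.confusion_matrix(ytest, predicted))
print('The value of Precision', metrics.precision_score(ytest, predicted))
print('The value of Recall', metrics.recall_score(ytest, predicted))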
OUTPUT:
1 1
2 1
3 1
4 1
5 0
6 0
7 0
8 0
9 0
10 1
11 0
12 1
13 0
14 1
15 0
16 1
17 0
The total number of Training Data: (13,)
The total number of Test Data: (5,)
‘fun’, ‘good’, ‘great’, ‘have’, ‘he’, ‘holiday’, ‘house’, ‘is’, ‘like’, ‘love’, ‘my’, ‘not’, ‘of’, ‘place’,
‘restaurant’, ‘sandwich’, ‘sick’, ‘sworn’, ‘these’, ‘this’, ‘tired’, ‘to’, ‘today’, ‘tomorrow’, ‘very’, ‘view’, ‘we’,
‘went’, ‘what’, ‘will’, ‘with’, ‘work’]
Confusion matrix
[[2 1]
[0 2]]
Program 6
AIM: Write a program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set. You can use Java/Python ML library classes/API.
ALGORITHM:
Preprocess data: Handle missing values, encode categorical variables, and scale numerical
features as needed to prepare for analysis.
Define nodes: Identify random variables such as age, sex, symptoms, tests, and diagnosis as nodes
in the Bayesian network.
Determine edges: Establish conditional dependencies between nodes, reflecting causal
relationships and influences among variables.
Learn CPDs: Utilize methods like Maximum Likelihood Estimation (MLE) or Bayesian estimation to
learn the Conditional Probability Distributions (CPDs) for each node from the data.
Build DAG: Construct a Directed Acyclic Graph (DAG) representing the Bayesian network
structure, incorporating the learned CPDs.
Perform inference: Use techniques like Variable Elimination to perform inference on new
patient data, calculating posterior probabilities.
Diagnose: Determine the most probable states given the observed evidence to diagnose the
patient's condition.
PROGRAM CODE:
import numpy as np
import pandas as pd
import csv
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.models import BayesianModel
from pgmpy.inference import VariableElimination

heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?', np.nan)

print('Sample instances from the dataset are given below')
print(heartDisease.head())

print('\n Attributes and datatypes')
print(heartDisease.dtypes)

model = BayesianModel([('age', 'heartdisease'), ('sex', 'heartdisease'), ('exang', 'heartdisease'),
                       ('cp', 'heartdisease'), ('heartdisease', 'restecg'), ('heartdisease', 'chol')])

print('\nLearning CPD using Maximum likelihood estimators')
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)
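The AIM asks for a diagnosis step, and VariableElimination is imported but never used in the listing. A hedged sketch of the inference that would typically follow (the evidence variables and their values are illustrative assumptions):

print('\nInferencing with Bayesian Network:')
HeartDisease_infer = VariableElimination(model)

# Posterior over 'heartdisease' given observed evidence
print('\n1. Probability of HeartDisease given evidence = restecg')
q1 = HeartDisease_infer.query(variables=['heartdisease'], evidence={'restecg': 1})
print(q1)

print('\n2. Probability of HeartDisease given evidence = cp')
q2 = HeartDisease_infer.query(variables=['heartdisease'], evidence={'cp': 2})
print(q2)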
OUTPUT:
Program 7
AIM: Apply the EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Java/Python ML library classes/API in the program.
ALGORITHM:
k-Means:
1. Assign each instance to its nearest cluster centroid.
2. Re-compute the centroids using the current cluster memberships, and repeat the assignment step until the stopping criterion is met.
EM:
1. (Estimation, E-step): Calculate the expected value E[z_ij] of each hidden variable z_ij, assuming that the current hypothesis h = ⟨μ1, ..., μk⟩ holds.
2. (Maximization, M-step): Calculate a new maximum likelihood hypothesis h' = ⟨μ1', ..., μk'⟩, assuming the value taken on by each hidden variable z_ij is its expected value E[z_ij] calculated in the E-step. Then replace the hypothesis h by h' and iterate.
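To make the E-step and M-step concrete, here is a tiny hedged sketch of one EM iteration for a one-dimensional mixture of two Gaussians with a fixed, equal variance (the data and variable names are illustrative assumptions; the program below instead uses scikit-learn's GaussianMixture):

import numpy as np

x = np.array([1.0, 1.2, 0.8, 5.0, 5.3, 4.9])   # toy 1-D data
mu = np.array([0.0, 4.0])                      # current hypothesis h = <mu1, mu2>
sigma2 = 1.0                                   # fixed, equal variance for both components

# E-step: expected membership E[z_ij] of point i in component j
dens = np.exp(-((x[:, None] - mu[None, :]) ** 2) / (2 * sigma2))
E_z = dens / dens.sum(axis=1, keepdims=True)

# M-step: new maximum-likelihood means mu', weighted by the memberships
mu_new = (E_z * x[:, None]).sum(axis=0) / E_z.sum(axis=0)
print(mu_new)   # the means move toward the two clusters around 1 and 5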
FLOWCHART:
TRAINING DATA SET:
PROGRAM CODE:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn import metrics

X = dataset.iloc[:, :-1]

plt.figure(figsize=(14, 7))
colormap = np.array(['red', 'lime', 'black'])

# REAL PLOT
plt.subplot(1, 3, 1)
plt.title('Real')
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y])

# K-MEANS PLOT
model = KMeans(n_clusters=3, random_state=0).fit(X)
plt.subplot(1, 3, 2)
plt.title('KMeans')
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[model.labels_])
print('The accuracy score of K-Mean: ', metrics.accuracy_score(y, model.labels_))
print('The Confusion matrix of K-Mean:\n', metrics.confusion_matrix(y, model.labels_))

# GMM PLOT
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
y_cluster_gmm = gmm.predict(X)
plt.subplot(1, 3, 3)
plt.title('GMM Classification')
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y_cluster_gmm])
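The listing uses dataset and y without defining them, so a loading step must precede it. A hedged sketch of that preamble (the file name and column names are assumptions chosen to match the Petal_Length and Petal_Width references above; y must be an integer class column for the colormap indexing and the accuracy score to work):

import pandas as pd
# Iris measurements with a numeric Targets column (0, 1, 2) as the last column
dataset = pd.read_csv('iris.csv',
                      names=['Sepal_Length', 'Sepal_Width',
                             'Petal_Length', 'Petal_Width', 'Targets'])
y = dataset.Targets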
OUTPUT: