Exp No: 1 Candidate-Elimination Learning Algorithm
AIM
To implement and demonstrate the candidate-elimination algorithm, which outputs a description of the set of all hypotheses consistent with the training examples, for a given set of training data stored in a .csv file.
Candidate-Elimination Learning Algorithm
The CANDIDATE-ELIMINATION algorithm computes the version space containing all hypotheses from H that are consistent with an observed sequence of training examples.
Initialize G to the set of maximally general hypotheses in H.
Initialize S to the set of maximally specific hypotheses in H.
For each training example d, do:
  If d is a positive example:
    Remove from G any hypothesis inconsistent with d.
    For each hypothesis s in S that is not consistent with d:
      Remove s from S.
      Add to S all minimal generalizations h of s such that h is consistent with d, and some member of G is more general than h.
      Remove from S any hypothesis that is more general than another hypothesis in S.
  If d is a negative example:
    Remove from S any hypothesis inconsistent with d.
    For each hypothesis g in G that is not consistent with d:
      Remove g from G.
      Add to G all minimal specializations h of g such that h is consistent with d, and some member of S is more specific than h.
      Remove from G any hypothesis that is less general than another hypothesis in G.
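The key test throughout this pseudocode is whether a hypothesis is consistent with an example d. As a minimal illustrative sketch (the helpers covers and consistent below are assumptions for illustration, not part of the prescribed program), a conjunctive hypothesis covers an instance when every non-'?' attribute matches exactly, and it is consistent with an example when its prediction agrees with the example's label:

def covers(hypothesis, instance):
    # '?' accepts any attribute value; a concrete value must match exactly.
    return all(h == '?' or h == v for h, v in zip(hypothesis, instance))

def consistent(hypothesis, instance, label):
    # The hypothesis must predict positive exactly for the positive examples.
    return covers(hypothesis, instance) == (label == "Yes")

print(consistent(('Sunny', 'Warm', '?', 'Strong'),
                 ('Sunny', 'Warm', 'High', 'Strong'), "Yes"))  # True
print(consistent(('?', '?', '?', '?'),
                 ('Rainy', 'Cold', 'High', 'Strong'), "No"))   # False, so G must be specialized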
To illustrate this algorithm, assume the learner is given the sequence of training examples
from the EnjoySport task.
Sky    AirTemp  Humidity  Wind    EnjoySport
Sunny  Warm     Normal    Strong  Yes
Sunny  Warm     High      Strong  Yes
Rainy  Cold     High      Strong  No
Sunny  Warm     High      Strong  Yes
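For the program given at the end of this experiment, this training data is assumed to be stored in the file a.csv; one plausible layout (the header names and their casing are an assumption, the values come from the table above) is:

sky,airtemp,humidity,wind,enjoysport
Sunny,Warm,Normal,Strong,Yes
Sunny,Warm,High,Strong,Yes
Rainy,Cold,High,Strong,No
Sunny,Warm,High,Strong,Yes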
The CANDIDATE-ELIMINATION algorithm begins by initializing the version space to the set of all hypotheses in H.
When the first training example is presented, the CANDIDATE-ELIMINATION algorithm checks the S boundary and finds that it is overly specific: it fails to cover the positive example.
The S boundary is therefore revised by moving it to the least more general hypothesis that covers this new example.
No update of the G boundary is needed in response to this training example, because G0 already covers it correctly.
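This revision of S can be sketched as follows; min_generalization is an illustrative helper (an assumption, not part of the prescribed program), with '0' standing for the empty constraint ∅:

def min_generalization(s, x):
    # Least generalization of hypothesis s that covers positive instance x:
    # an empty constraint ('0', i.e. the symbol ∅) is filled with the instance
    # value, and an attribute that disagrees with x is relaxed to '?'.
    new = []
    for sv, xv in zip(s, x):
        if sv == '0':
            new.append(xv)
        elif sv == xv or sv == '?':
            new.append(sv)
        else:
            new.append('?')
    return tuple(new)

print(min_generalization(('0', '0', '0', '0'),
                         ('Sunny', 'Warm', 'Normal', 'Strong')))
# ('Sunny', 'Warm', 'Normal', 'Strong')  -> S1 in Step 1 below
print(min_generalization(('Sunny', 'Warm', 'Normal', 'Strong'),
                         ('Sunny', 'Warm', 'High', 'Strong')))
# ('Sunny', 'Warm', '?', 'Strong')       -> S2 in Step 2 below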
Step 1
For the training example d = (Sunny, Warm, Normal, Strong), +:
S0: <∅, ∅, ∅, ∅>
S1: <Sunny, Warm, Normal, Strong>
G0 = G1: <?, ?, ?, ?>
Step 2
When the second training example, d = (Sunny, Warm, High, Strong), +, is observed, S is generalized to S2 while G remains unchanged, i.e., G0 = G1 = G2.
S1: <Sunny, Warm, Normal, Strong>
S2: <Sunny, Warm, ?, Strong>
G2: <?, ?, ?, ?>
Step 3
For the training example d = (Rainy, Cold, High, Strong), -:
The negative example leaves the S boundary unchanged and forces G to be specialized just enough to exclude it (see the sketch after this step).
S2 = S3: <Sunny, Warm, ?, Strong>
G3: <Sunny, ?, ?, ?>, <?, Warm, ?, ?>
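The specializations added to G in this step can be generated as sketched below; min_specializations is an illustrative helper, and the attribute domains used are an assumption borrowed from the EnjoySport task rather than read off the four rows above:

def min_specializations(g, domains, x):
    # All minimal specializations of g that exclude negative instance x:
    # each '?' attribute is replaced, in turn, by every domain value other
    # than the one occurring in x.
    results = []
    for i, gv in enumerate(g):
        if gv == '?':
            for value in domains[i]:
                if value != x[i]:
                    results.append(g[:i] + (value,) + g[i + 1:])
    return results

domains = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'),
           ('Normal', 'High'), ('Strong', 'Weak')]
for h in min_specializations(('?', '?', '?', '?'), domains,
                             ('Rainy', 'Cold', 'High', 'Strong')):
    print(h)
# Of the printed candidates, only <Sunny, ?, ?, ?> and <?, Warm, ?, ?> are
# more general than the S boundary <Sunny, Warm, ?, Strong>, so only these
# two are kept in G3.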
Step 4
For the training example d = (Sunny, Warm, High, Strong), +:
Both S3 and every member of G3 already cover this positive example, so neither boundary changes.
S3 = S4: <Sunny, Warm, ?, Strong>
G4: <Sunny, ?, ?, ?>, <?, Warm, ?, ?>
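The boundaries S4 and G4 compactly represent the entire version space. As an illustrative sketch (more_general_or_equal and the enumeration below are assumptions, not part of the prescribed program), every member of the version space can be obtained from S4 by relaxing some of its concrete attributes to '?', keeping only hypotheses still covered by some member of G4:

from itertools import combinations

S = ('Sunny', 'Warm', '?', 'Strong')
G = [('Sunny', '?', '?', '?'), ('?', 'Warm', '?', '?')]

def more_general_or_equal(h1, h2):
    # h1 is at least as general as h2 when each of its constraints is '?'
    # or agrees with the corresponding constraint of h2.
    return all(a == '?' or a == b for a, b in zip(h1, h2))

concrete = [i for i, v in enumerate(S) if v != '?']
version_space = []
for r in range(len(concrete) + 1):
    for subset in combinations(concrete, r):
        h = tuple('?' if i in subset else v for i, v in enumerate(S))
        if any(more_general_or_equal(g, h) for g in G):
            version_space.append(h)

for h in version_space:
    print(h)
# Prints six hypotheses, ranging from <Sunny, Warm, ?, Strong> at the
# specific end up to <Sunny, ?, ?, ?> and <?, Warm, ?, ?> at the general end.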
PROGRAM
import numpy as np
import pandas as pd

# Load the training data: every column except the last holds an attribute,
# and the last column holds the target concept (EnjoySport).
data = pd.read_csv('a.csv')
concepts = np.array(data.iloc[:, 0:-1])
target = np.array(data.iloc[:, -1])

def learn(concepts, target):
    # S starts as the first training instance (maximally specific hypothesis);
    # G starts as the set of maximally general hypotheses.
    specific_h = concepts[0].copy()
    print("Initialization of specific_h\n", specific_h)
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print("Initialization of general_h\n", general_h)
    for i, h in enumerate(concepts):
        if str(target[i]).lower() == "yes":
            print("Instance is positive")
            # Generalize S on every attribute that disagrees with the example
            # and drop the corresponding constraint from G.
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        elif str(target[i]).lower() == "no":
            print("Instance is negative")
            # Specialize G using the attributes on which the example differs
            # from S; S itself is left unchanged.
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print("Step {}".format(i + 1))
        print(specific_h)
        print(general_h)
    # Discard the all-'?' rows of G, which place no constraint on any attribute.
    general_h = [h for h in general_h if h != ['?'] * len(specific_h)]
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")
Result:
Thus, the candidate-elimination algorithm was implemented to output a description of the set of all hypotheses consistent with the training examples given in a.csv.