Concept Learning
- Learning from examples
- General-to-specific ordering of hypotheses
- Version spaces and the candidate elimination algorithm
- Inductive bias
What is Concept Learning?

- Infer the general definition of some concept, given examples labeled as members or nonmembers of the concept.
- Example: learn the category "car" or "bird".
- A concept is often formulated as a boolean-valued function.
- Concept learning can be formulated as a problem of searching a hypothesis space.
Training Examples for Concept EnjoySport

Concept: days on which my friend Tom enjoys his favourite water sports
Task: predict the value of EnjoySport for an arbitrary day based on the values of the other attributes

Example  Sky    Temp  Humid   Wind    Water  Forecast  EnjoySport
1        Sunny  Warm  Normal  Strong  Warm   Same      Yes
2        Sunny  Warm  High    Strong  Warm   Same      Yes
3        Rainy  Cold  High    Strong  Warm   Change    No
4        Sunny  Warm  High    Strong  Cool   Change    Yes
Representing Hypotheses

- A hypothesis h is described as a conjunction of constraints on the attributes.
- Each constraint can be:
  - a specific value, e.g. Water=Warm
  - a "don't care" value, e.g. Water=?
  - no value allowed (the null constraint), e.g. Water=Ø
- Example hypothesis h:

  Sky     Temp  Humid  Wind    Water  Forecast
  <Sunny  ?     ?      Strong  ?      Same>
Prototypical Concept Learning Task

Given:
- Instance space X: possible days described by the attributes Sky, Temp, Humidity, Wind, Water, Forecast
- Target function c: EnjoySport: X -> {0,1}
- Hypothesis space H: conjunctions of attribute constraints, e.g. <Sunny ? ? Strong ? Same>
- Training examples D: positive and negative examples of the target function: <x1,c(x1)>, ..., <xn,c(xn)>

Determine:
- A hypothesis h in H such that h(x) = c(x) for all x in D.
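A minimal sketch of how this task could be encoded in Python (the attribute names and training data come from the table above; the tuple encoding, the '?' / None convention, and the helper name `satisfies` are my own choices, not part of the original slides):

```python
# Hypothetical encoding of the EnjoySport learning task.
# An instance is a tuple of attribute values; a hypothesis is a tuple whose
# entries are a specific value, '?' (don't care), or None (no value allowed).

ATTRIBUTES = ("Sky", "Temp", "Humid", "Wind", "Water", "Forecast")

# Training set D: (instance, c(instance)) pairs from the table above.
TRAINING_DATA = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]

def satisfies(hypothesis, instance):
    """h(x) = 1 iff every constraint of h is met by the corresponding attribute of x."""
    return all(c == "?" or c == v for c, v in zip(hypothesis, instance))
```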
Inductive Learning Hypothesis

- Any hypothesis found to approximate the target function well over the training examples will also approximate the target function well over unobserved examples.
- Therefore: find the hypothesis that best fits the training data.
Number of Instances, Concepts, Hypotheses

- Sky: Sunny, Cloudy, Rainy
- AirTemp: Warm, Cold
- Humidity: Normal, High
- Wind: Strong, Weak
- Water: Warm, Cold
- Forecast: Same, Change

- #distinct instances: 3*2*2*2*2*2 = 96
- #distinct concepts: 2^96
- #syntactically distinct hypotheses: 5*4*4*4*4*4 = 5120
- #semantically distinct hypotheses: 1 + 4*3*3*3*3*3 = 973

Organize the search to take advantage of the structure of the hypothesis space and improve running time.
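These counts can be verified with a few lines of Python (a quick sanity check; the variable names are mine):

```python
import math

domain_sizes = [3, 2, 2, 2, 2, 2]            # Sky has 3 values, the other five have 2

instances = math.prod(domain_sizes)                       # 3*2*2*2*2*2 = 96
concepts  = 2 ** instances                                # every subset of X is a concept: 2^96
syntactic = math.prod(d + 2 for d in domain_sizes)        # d values, '?', or null -> 5120
semantic  = 1 + math.prod(d + 1 for d in domain_sizes)    # all null hypotheses collapse to one -> 973

print(instances, syntactic, semantic)   # 96 5120 973
```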
General-to-Specific Ordering

Consider two hypotheses:
- h1 = <Sunny,?,?,Strong,?,?>
- h2 = <Sunny,?,?,?,?,?>

Compare the sets of instances covered by h1 and h2: h2 imposes fewer constraints than h1 and therefore classifies more instances x as positive (h(x)=1). h2 is a more general concept.

Definition: Let hj and hk be boolean-valued functions defined over X. Then hj is more general than or equal to hk (written hj ≥g hk) if and only if

    ∀x ∈ X : [ (hk(x) = 1) → (hj(x) = 1) ]

The ≥g relation imposes a partial order over the hypothesis space H that is exploited by many concept learning methods.
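For the conjunctive representation used here, the more-general-than-or-equal relation can be tested attribute by attribute instead of enumerating all of X. A sketch (the helper name is mine):

```python
def more_general_or_equal(hj, hk):
    """Return True if hj >=g hk, i.e. every instance that satisfies hk also satisfies hj."""
    if None in hk:                 # hk matches no instance, so the condition holds vacuously
        return True
    # Otherwise each constraint of hj must be '?' or identical to the one in hk.
    return all(cj == "?" or cj == ck for cj, ck in zip(hj, hk))

# h2 = <Sunny,?,?,?,?,?> is more general than h1 = <Sunny,?,?,Strong,?,?>
h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
assert more_general_or_equal(h2, h1) and not more_general_or_equal(h1, h2)
```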
Instances, Hypotheses, and the More-General Relation

[Figure: instances on the left, hypotheses on the right, ordered from specific to general; h2 ≥g h1 and h2 ≥g h3]

x1 = <Sunny,Warm,High,Strong,Cool,Same>
x2 = <Sunny,Warm,High,Light,Warm,Same>

h1 = <Sunny,?,?,Strong,?,?>   (h1 is a minimal specialization of h2)
h2 = <Sunny,?,?,?,?,?>        (h2 is a minimal generalization of h1)
h3 = <Sunny,?,?,?,Cool,?>
Find-S Algorithm

1. Initialize h to the most specific hypothesis in H.
2. For each positive training instance x:
   For each attribute constraint ai in h:
     If the constraint ai in h is satisfied by x, then do nothing;
     else replace ai in h by the next more general constraint that is satisfied by x
     (the minimal generalization of h that covers x).
3. Output hypothesis h.
Constraint Generalization

Attribute Sky, ordered from most specific to most general:
Ø (no value)  ->  Sunny | Cloudy | Rainy  ->  ? (any value)
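Find-S translates almost line by line into Python under the encoding sketched earlier (`satisfies` and `TRAINING_DATA` come from the earlier sketch; `find_s` and `MOST_SPECIFIC` are my names):

```python
MOST_SPECIFIC = (None,) * 6   # <Ø,Ø,Ø,Ø,Ø,Ø>: satisfied by no instance

def find_s(training_data):
    """Find-S: the most specific hypothesis consistent with the positive examples."""
    h = list(MOST_SPECIFIC)
    for x, label in training_data:
        if not label:                      # negative examples are ignored by Find-S
            continue
        for i, (constraint, value) in enumerate(zip(h, x)):
            if constraint == value or constraint == "?":
                continue                   # constraint already satisfied by x
            elif constraint is None:
                h[i] = value               # minimal generalization: adopt the observed value
            else:
                h[i] = "?"                 # conflicting specific values: generalize to '?'
    return tuple(h)
```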
Illustration of Find-S

[Figure: instances x1-x4 on the left, hypotheses h0-h4 on the right, ordered from specific to general]

x1 = <Sunny,Warm,Normal,Strong,Warm,Same>  +   h0 = <Ø,Ø,Ø,Ø,Ø,Ø>
                                               h1 = <Sunny,Warm,Normal,Strong,Warm,Same>
x2 = <Sunny,Warm,High,Strong,Warm,Same>    +   h2,3 = <Sunny,Warm,?,Strong,Warm,Same>
x3 = <Rainy,Cold,High,Strong,Warm,Change>  -
x4 = <Sunny,Warm,High,Strong,Cool,Change>  +   h4 = <Sunny,Warm,?,Strong,?,?>
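Running the sketch above on the four EnjoySport examples reproduces the trace in the figure:

```python
print(find_s(TRAINING_DATA))
# ('Sunny', 'Warm', '?', 'Strong', '?', '?')  -- the hypothesis h4 in the figure
```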
Properties of Find-S

- The hypothesis space is described by conjunctions of attribute constraints.
- Find-S outputs the most specific hypothesis within H that is consistent with the positive training examples.
- The output hypothesis is also consistent with the negative examples, provided the target concept is contained in H. (Why?)
Why is Find-S Consistent?

[Figure: positive and negative examples in instance space; s is the Find-S output, the most specific hypothesis covering the positives]

If a hypothesis h (in particular the target concept c, when it is in H) is consistent with D, then h ≥g s, because s is the most specific hypothesis covering the positive examples. Since c covers no negative example and c ≥g s, neither does s.
Complaints about Find-S

- Can't tell whether the learner has converged to the target concept, in the sense that it is unable to determine whether it has found the only hypothesis consistent with the training examples. (More examples give a better approximation.)
- Can't tell when the training data is inconsistent, as it ignores negative training examples. (We would prefer to detect and tolerate errors or noise.)
- Why prefer the most specific hypothesis? Why not the most general, or some other hypothesis? (A more specific hypothesis is less likely to be coincidentally consistent.)
- What if there are multiple maximally specific hypotheses? (All of them are equally likely.)
Version Spaces

- A hypothesis h is consistent with a set of training examples D of target concept c if and only if h(x) = c(x) for each training example <x,c(x)> in D.

    Consistent(h,D) := ∀<x,c(x)> ∈ D : h(x) = c(x)

- The version space, VS_{H,D}, with respect to hypothesis space H and training set D, is the subset of hypotheses from H consistent with all training examples:

    VS_{H,D} = { h ∈ H | Consistent(h,D) }
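Under the earlier encoding the consistency test is a one-liner (reusing `satisfies`; the function name is mine):

```python
def consistent(h, training_data):
    """Consistent(h, D): h(x) = c(x) for every <x, c(x)> in D."""
    return all(satisfies(h, x) == label for x, label in training_data)
```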
List-Then-Eliminate Algorithm

1. VersionSpace <- a list containing every hypothesis in H
2. For each training example <x,c(x)>: remove from VersionSpace any hypothesis h that is inconsistent with the training example, i.e. h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace

Inefficient, as it does not exploit the structure of the hypothesis space.
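Because this hypothesis space has only 973 semantically distinct members, List-Then-Eliminate can actually be run by brute force here. A sketch (the enumeration of H and the names are mine; `satisfies` is from the earlier sketch):

```python
from itertools import product

DOMAINS = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cold"), ("Same", "Change")]

def all_hypotheses():
    """Enumerate the 973 semantically distinct conjunctive hypotheses."""
    yield (None,) * 6                                     # the single all-negative hypothesis
    for h in product(*[values + ("?",) for values in DOMAINS]):
        yield h

def list_then_eliminate(training_data):
    version_space = list(all_hypotheses())                # 1. start with every hypothesis in H
    for x, label in training_data:                        # 2. drop hypotheses inconsistent with <x,c(x)>
        version_space = [h for h in version_space if satisfies(h, x) == label]
    return version_space                                  # 3. what remains is VS_{H,D}
```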
Example Version Space

S: {<Sunny,Warm,?,Strong,?,?>}

   <Sunny,?,?,Strong,?,?>   <Sunny,Warm,?,?,?,?>   <?,Warm,?,Strong,?,?>

G: {<Sunny,?,?,?,?,?>, <?,Warm,?,?,?,?>}

Training examples:
x1 = <Sunny Warm Normal Strong Warm Same>   +
x2 = <Sunny Warm High   Strong Warm Same>   +
x3 = <Rainy Cold High   Strong Warm Change> -
x4 = <Sunny Warm High   Strong Cool Change> +
Representing Version Spaces

- The general boundary, G, of version space VS_{H,D} is the set of its maximally general hypotheses.
- The specific boundary, S, of version space VS_{H,D} is the set of its maximally specific hypotheses.
- Every hypothesis in the version space lies between these boundaries:

    VS_{H,D} = { h ∈ H | (∃s ∈ S)(∃g ∈ G) (g ≥g h ≥g s) }

  where x ≥g y means x is more general than or equal to y.
Boundaries of Version Space

[Figure: hypotheses g, h, s drawn over the positive and negative examples. A hypothesis h lying between g and s is consistent with D; a hypothesis more specific than the S boundary misses a positive example (Consistent = FALSE); a hypothesis more general than the G boundary covers a negative example (Consistent = FALSE).]
Candidate Elimination Algorithm

G <- maximally general hypotheses in H
S <- maximally specific hypotheses in H
For each training example d = <x,c(x)>:
  modify G and S so that G and S remain consistent with d
[Figures: the six cases for updating the boundary sets]

Positive example d:
- g(d) = 0 and s(d) = 0: remove g, remove s
- g(d) = 1 and s(d) = 0: generalize s
- g(d) = 1 and s(d) = 1: no change needed

Negative example d:
- g(d) = 1 and s(d) = 1: remove s, remove g
- g(d) = 1 and s(d) = 0: specialize g
- g(d) = 0 and s(d) = 0: no change needed
Candidate Elimination Algorithm

G <- maximally general hypotheses in H
S <- maximally specific hypotheses in H
For each training example d = <x,c(x)>:

If d is a positive example:
- Remove from G any hypothesis that is inconsistent with d
- For each hypothesis s in S that is not consistent with d:
  - remove s from S
  - add to S all minimal generalizations h of s such that
    - h is consistent with d, and
    - some member of G is more general than h
- Remove from S any hypothesis that is more general than another hypothesis in S
Candidate Elimination Algorithm (continued)

If d is a negative example:
- Remove from S any hypothesis that is inconsistent with d
- For each hypothesis g in G that is not consistent with d:
  - remove g from G
  - add to G all minimal specializations h of g such that
    - h is consistent with d, and
    - some member of S is more specific than h
- Remove from G any hypothesis that is less general than another hypothesis in G
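A compact sketch of the whole algorithm under the earlier encoding (it reuses `satisfies`, `more_general_or_equal`, and `DOMAINS` from the previous sketches; `min_generalization`, `min_specializations`, and `candidate_elimination` are my names):

```python
def min_generalization(s, x):
    """Minimal generalization of hypothesis s that covers the positive instance x."""
    s_new = list(s)
    for i, value in enumerate(x):
        if s_new[i] is None:
            s_new[i] = value          # adopt the observed value
        elif s_new[i] != value:
            s_new[i] = "?"            # conflicting values: generalize to '?'
    return tuple(s_new)

def min_specializations(g, x):
    """Minimal specializations of hypothesis g that exclude the negative instance x."""
    results = []
    for i, value in enumerate(x):
        if g[i] == "?":
            for alternative in DOMAINS[i]:
                if alternative != value:
                    results.append(g[:i] + (alternative,) + g[i + 1:])
    return results

def candidate_elimination(training_data):
    G = [("?",) * 6]                  # maximally general boundary
    S = [(None,) * 6]                 # maximally specific boundary
    for x, label in training_data:
        if label:                     # positive example
            G = [g for g in G if satisfies(g, x)]
            new_S = []
            for s in S:
                s2 = s if satisfies(s, x) else min_generalization(s, x)
                if satisfies(s2, x) and any(more_general_or_equal(g, s2) for g in G):
                    new_S.append(s2)
            # keep only the maximally specific members
            S = [s for s in new_S
                 if not any(s != o and more_general_or_equal(s, o) for o in new_S)]
        else:                         # negative example
            S = [s for s in S if not satisfies(s, x)]
            new_G = []
            for g in G:
                if not satisfies(g, x):
                    new_G.append(g)
                    continue
                for h in min_specializations(g, x):
                    if any(more_general_or_equal(h, s) for s in S):
                        new_G.append(h)
            # keep only the maximally general members
            G = [g for g in new_G
                 if not any(g != o and more_general_or_equal(o, g) for o in new_G)]
    return S, G
```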
Example Candidate Elimination

S: {<Ø, Ø, Ø, Ø, Ø, Ø>}
G: {<?, ?, ?, ?, ?, ?>}

x1 = <Sunny Warm Normal Strong Warm Same> +
S: {<Sunny Warm Normal Strong Warm Same>}
G: {<?, ?, ?, ?, ?, ?>}

x2 = <Sunny Warm High Strong Warm Same> +
S: {<Sunny Warm ? Strong Warm Same>}
G: {<?, ?, ?, ?, ?, ?>}
Example Candidate Elimination

S: {<Sunny Warm ? Strong Warm Same>}
G: {<?, ?, ?, ?, ?, ?>}

x3 = <Rainy Cold High Strong Warm Change> -
S: {<Sunny Warm ? Strong Warm Same>}
G: {<Sunny,?,?,?,?,?>, <?,Warm,?,?,?,?>, <?,?,?,?,?,Same>}

x4 = <Sunny Warm High Strong Cool Change> +
S: {<Sunny Warm ? Strong ? ?>}
G: {<Sunny,?,?,?,?,?>, <?,Warm,?,?,?,?>}
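Running the sketch above on the four training examples reproduces the final boundary sets of this trace:

```python
S, G = candidate_elimination(TRAINING_DATA)
print(S)   # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(G)   # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]
```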
Remarks on Version Space and Candidate-Elimination

- The algorithm converges to the target concept when
  - there is no error in the training examples, and
  - the target concept is in H.
- It converges to an empty version space when
  - the training data is inconsistent, or
  - the target concept cannot be described by the hypothesis representation.
- What should be the next training example?
- How should new instances be classified?
Classification of New Data

S: {<Sunny,Warm,?,Strong,?,?>}

   <Sunny,?,?,Strong,?,?>   <Sunny,Warm,?,?,?,?>   <?,Warm,?,Strong,?,?>

G: {<Sunny,?,?,?,?,?>, <?,Warm,?,?,?,?>}

x5 = <Sunny Warm Normal Strong Cool Change>  +  6/0
x6 = <Rainy Cold Normal Light  Warm Same>    -  0/6
x7 = <Sunny Warm Normal Light  Warm Same>    ?  3/3
x8 = <Sunny Cold Normal Strong Warm Same>    ?  2/4
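The vote counts shown above can be reproduced by letting every member of the version space (here, the six hypotheses in the figure) classify the new instance. A sketch reusing `list_then_eliminate` and `satisfies` from the earlier sketches (`classify` is my name):

```python
def classify(version_space, x):
    """Return (#hypotheses voting positive, #hypotheses voting negative) for instance x."""
    pos = sum(1 for h in version_space if satisfies(h, x))
    return pos, len(version_space) - pos

vs = list_then_eliminate(TRAINING_DATA)                       # the 6 hypotheses shown above
print(classify(vs, ("Sunny", "Warm", "Normal", "Strong", "Cool", "Change")))   # (6, 0)
print(classify(vs, ("Sunny", "Warm", "Normal", "Light",  "Warm", "Same")))     # (3, 3)
```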
Inductive Leap

+ <Sunny Warm Normal Strong Cool Change>
+ <Sunny Warm Normal Light Warm Same>
S: <Sunny Warm Normal ? ? ?>

How can we justify classifying the new example as positive?
+ <Sunny Warm Normal Strong Warm Same>

Bias: we assume that the hypothesis space H contains the target concept c; in other words, that c can be described by a conjunction of attribute constraints.
Biased Hypothesis Space

- Our hypothesis space is unable to represent a simple disjunctive target concept such as (Sky=Sunny) v (Sky=Cloudy): a problem of expressibility.

x1 = <Sunny  Warm Normal Strong Cool Change> +
x2 = <Cloudy Warm Normal Strong Cool Change> +
S: { <?, Warm, Normal, Strong, Cool, Change> }
x3 = <Rainy  Warm Normal Light  Warm Same>   -
S: {}
Unbiased Learner

- Idea: choose an H that expresses every teachable concept, i.e. H is the set of all possible subsets of X, the power set P(X).
- |X| = 96, |P(X)| = 2^96 ≈ 10^28 distinct concepts
- H = disjunctions, conjunctions, and negations of the earlier hypotheses,
  e.g. <Sunny Warm Normal ? ? ?> v <? ? ? ? ? Change>
- H surely contains the target concept.
Unbiased Learner

What are S and G in this case? Assume positive examples (x1, x2, x3) and negative examples (x4, x5):

G: { ¬(x4 v x5) }
S: { (x1 v x2 v x3) }

The only examples that are classified unambiguously are the training examples themselves. In other words, in order to learn the target concept one would have to present every single instance in X as a training example. Each unobserved instance will be classified positive by precisely half the hypotheses in the version space and negative by the other half: a problem of generalizability.
Futility of Bias-Free Learning

- A learner that makes no prior assumptions regarding the identity of the target concept has no rational basis for classifying any unseen instances.
- No Free Lunch!
Inductive Bias

Consider:
- a concept learning algorithm L
- instances X, target concept c
- training examples Dc = {<x,c(x)>}
- let L(xi, Dc) denote the classification assigned to instance xi by L after training on Dc.

Definition: the inductive bias of L is any minimal set of assertions B such that for any target concept c and corresponding training data Dc

    (∀xi ∈ X) [ (B ∧ Dc ∧ xi) |-- L(xi, Dc) ]

where A |-- B means that A logically entails B.
Inductive Systems and Equivalent Deductive Systems

[Figure: two equivalent block diagrams.
Inductive system: training examples and a new instance feed the candidate elimination algorithm (using hypothesis space H), which outputs the classification of the new instance, or "don't know".
Equivalent deductive system: training examples, the new instance, and the assertion "H contains the target concept" feed a theorem prover, which outputs the classification of the new instance, or "don't know".]
Three Learners with Different Biases

- Rote learner: store the examples, and classify x if and only if it matches a previously observed example.
  - No inductive bias.
- Version space candidate elimination algorithm.
  - Bias: the hypothesis space contains the target concept.
- Find-S.
  - Bias: the hypothesis space contains the target concept, and all instances are negative unless the opposite is entailed by its other knowledge.
Summary

- Concept learning as search through H
- General-to-specific ordering over H
- Version space candidate elimination algorithm
- S and G boundaries characterize the learner's uncertainty
- The learner can generate useful queries
- Inductive leaps are possible only if the learner is biased
- Inductive learners can be modelled by equivalent deductive systems