
Machine Learning

UNIT-1
INTRODUCTION
Ever since computers were invented, we have wondered whether they might be made to learn.
If we could understand how to program them to learn, to improve automatically with
experience, the impact would be dramatic.
 Imagine computers learning from medical records which treatments are most effective
for new diseases.
 Houses learning from experience to optimize energy costs based on the particular usage
patterns of their occupants.
 Personal software assistants learning the evolving interests of their users in order to
highlight especially relevant stories from the online morning newspaper.

A successful understanding of how to make computers learn would open up many new uses
of computers and new levels of competence and customization.

Some successful applications of machine learning


 Learning to recognize spoken words
 Learning to drive an autonomous vehicle
 Learning to classify new astronomical structures
 Learning to play world-class backgammon

Why is Machine Learning Important?

 Some tasks cannot be defined well, except by examples (e.g., recognizing people).


 Relationships and correlations can be hidden within large amounts of data. Machine
Learning/Data Mining may be able to find these relationships.
 Human designers often produce machines that do not work as well as desired in the
environments in which they are used.
 The amount of knowledge available about certain tasks might be too large for explicit
encoding by humans (e.g., medical diagnosis).
 Environments change over time.
 New knowledge about tasks is constantly being discovered by humans. It may be
difficult to continuously re-design systems "by hand".


WELL-POSED LEARNING PROBLEMS

Definition: A computer program is said to learn from experience E with respect to some class
of tasks T and performance measure P, if its performance at tasks in T, as measured by P,
improves with experience E.

To have a well-defined learning problem, three features need to be identified:


1. The class of tasks
2. The measure of performance to be improved
3. The source of experience

Examples
1. Checkers game: A computer program that learns to play checkers might improve its
performance as measured by its ability to win at the class of tasks involving playing
checkers games, through experience obtained by playing games against itself.

Fig: Checker game board


A checkers learning problem:
 Task T: playing checkers
 Performance measure P: percent of games won against opponents
 Training experience E: playing practice games against itself

2. A handwriting recognition learning problem:

 Task T: recognizing and classifying handwritten words within images
 Performance measure P: percent of words correctly classified
 Training experience E: a database of handwritten words with given
classifications

3. A robot driving learning problem:
 Task T: driving on public four-lane highways using vision sensors
 Performance measure P: average distance travelled before an error (as judged
by a human overseer)
 Training experience E: a sequence of images and steering commands recorded
while observing a human driver

DESIGNING A LEARNING SYSTEM

The basic design issues and approaches to machine learning are illustrated by designing a
program to learn to play checkers, with the goal of entering it in the world checkers
tournament.
1. Choosing the Training Experience
2. Choosing the Target Function
3. Choosing a Representation for the Target Function
4. Choosing a Function Approximation Algorithm
   1. Estimating training values
   2. Adjusting the weights
5. The Final Design

1. Choosing the Training Experience

 The first design choice is to choose the type of training experience from which the
system will learn.
 The type of training experience available can have a significant impact on success or
failure of the learner.

There are three attributes which impact the success or failure of the learner:

1. Whether the training experience provides direct or indirect feedback regarding the
choices made by the performance system.

For example, in checkers game:


In learning to play checkers, the system might learn from direct training examples
consisting of individual checkers board states and the correct move for each.

Alternatively, the system might learn from indirect training examples consisting of the
move sequences and final outcomes of various games played. The information about the
correctness of specific moves early in the game must be inferred indirectly from the fact
that the game was eventually won or lost.

Here the learner faces an additional problem of credit assignment, or determining the
degree to which each move in the sequence deserves credit or blame for the final
outcome. Credit assignment can be a particularly difficult problem because the game
can be lost even when early moves are optimal, if these are followed later by poor
moves.
Hence, learning from direct training feedback is typically easier than learning from
indirect feedback.


2. The degree to which the learner controls the sequence of training examples

For example, in checkers game:


The learner might depend on the teacher to select informative board states and to
provide the correct move for each.

Alternatively, the learner might itself propose board states that it finds particularly
confusing and ask the teacher for the correct move.

The learner may have complete control over both the board states and (indirect) training
classifications, as it does when it learns by playing against itself with no teacher present.

3. How well it represents the distribution of examples over which the final system
performance P must be measured

For example, in checkers game:


In the checkers learning scenario, the performance metric P is the percent of games the
system wins in the world tournament.

If its training experience E consists only of games played against itself, there is a danger
that this training experience might not be fully representative of the distribution of
situations over which it will later be tested.
In practice, it is often necessary to learn from a distribution of examples that is somewhat
different from the one on which the final system will be evaluated.

2. Choosing the Target Function

The next design choice is to determine exactly what type of knowledge will be learned and
how this will be used by the performance program.

Let’s consider a checkers-playing program that can generate the legal moves from any board
state.
The program needs only to learn how to choose the best move from among these legal moves.
Given that we must learn to choose among the legal moves, the most obvious choice for the
type of information to be learned is a program, or function, that chooses the best move for
any given board state.

1. Let ChooseMove be the target function, with notation

ChooseMove : B → M

which indicates that this function accepts as input any board from the set of legal board
states B and produces as output some move from the set of legal moves M.


ChooseMove is a choice for the target function in the checkers example, but this function
will turn out to be very difficult to learn given the kind of indirect training experience
available to our system.

2. An alternative target function is an evaluation function that assigns a numerical score
to any given board state.
Let V be the target function, with notation
V : B → R

which denotes that V maps any legal board state from the set B to some real value. We
intend for this target function V to assign higher scores to better board states. If the
system can successfully learn such a target function V, then it can easily use it to select
the best move from any current board position.

Let us define the target value V(b) for an arbitrary board state b in B, as follows:
 If b is a final board state that is won, then V(b) = 100
 If b is a final board state that is lost, then V(b) = -100
 If b is a final board state that is drawn, then V(b) = 0
 If b is not a final state in the game, then V(b) = V(b'),

where b' is the best final board state that can be achieved starting from b and playing
optimally until the end of the game.

3. Choosing a Representation for the Target Function

Let's choose a simple representation: for any given board state, the function V̂ will be
calculated as a linear combination of the following board features:

 x1: the number of black pieces on the board
 x2: the number of red pieces on the board
 x3: the number of black kings on the board
 x4: the number of red kings on the board
 x5: the number of black pieces threatened by red (i.e., which can be captured on red's
next turn)
 x6: the number of red pieces threatened by black

Thus, the learning program will represent V̂(b) as a linear function of the form

V̂(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6

Where,
 w0 through w6 are numerical coefficients, or weights, to be chosen by the learning
algorithm.
 Learned values for the weights w1 through w6 will determine the relative importance
of the various board features in determining the value of the board.
 The weight w0 will provide an additive constant to the board value.
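As a quick sketch, this linear evaluation function can be written in Python; the feature values and weights below are illustrative placeholders, not learned values:

```python
# Sketch of the linear evaluation function V̂(b) = w0 + w1*x1 + ... + w6*x6.
# The weights here are arbitrary placeholders, not learned values.

def v_hat(features, weights):
    """features: (x1..x6) board features; weights: (w0..w6)."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

# Illustrative board: black has 3 pieces and 1 king, red has nothing left.
features = (3, 0, 1, 0, 0, 0)
weights = (0.0, 1.0, -1.0, 2.0, -2.0, -0.5, 0.5)
print(v_hat(features, weights))  # 0 + 1*3 + 2*1 = 5.0
```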

4. Choosing a Function Approximation Algorithm

In order to learn the target function f we require a set of training examples, each describing a
specific board state b and the training value Vtrain(b) for b.

Each training example is an ordered pair of the form (b, Vtrain(b)).

For instance, the following training example describes a board state b in which black has won
the game (note x2 = 0 indicates that red has no remaining pieces) and for which the target
function value Vtrain(b) is therefore +100.

((x1=3, x2=0, x3=1, x4=0, x5=0, x6=0), +100)

Function Approximation Procedure

1. Derive training examples from the indirect training experience available to the learner
2. Adjust the weights wi to best fit these training examples

1. Estimating training values

A simple approach for estimating training values for intermediate board states is to
assign the training value Vtrain(b) for any intermediate board state b to be
V̂(Successor(b))

Where,
 V̂ is the learner's current approximation to V
 Successor(b) denotes the next board state following b for which it is again the
program's turn to move

Rule for estimating training values

Vtrain(b) ← V̂ (Successor(b))


2. Adjusting the weights

Specify the learning algorithm for choosing the weights wi to best fit the set of training
examples {(b, Vtrain(b))}.
A first step is to define what we mean by the best fit to the training data.
One common approach is to define the best hypothesis, or set of weights, as that which
minimizes the squared error E between the training values and the values predicted by
the hypothesis:

E ≡ Σ (Vtrain(b) - V̂(b))², summed over the training examples (b, Vtrain(b)).

Several algorithms are known for finding weights of a linear function that minimize E.
One such algorithm is called the least mean squares, or LMS training rule. For each
observed training example it adjusts the weights a small amount in the direction that
reduces the error on this training example.

LMS weight update rule: For each training example (b, Vtrain(b))
Use the current weights to calculate V̂(b)
For each weight wi, update it as

wi ← wi + η (Vtrain(b) - V̂(b)) xi

Here η is a small constant (e.g., 0.1) that moderates the size of the weight update.

Working of weight update rule

 When the error (Vtrain(b) - V̂(b)) is zero, no weights are changed.
 When (Vtrain(b) - V̂(b)) is positive (i.e., when V̂(b) is too low), then each weight
is increased in proportion to the value of its corresponding feature. This will raise the
value of V̂(b), reducing the error.
 If the value of some feature xi is zero, then its weight is not altered regardless of
the error, so that the only weights updated are those whose features actually occur on
the training example board.
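The update rule above can be sketched in Python; the example board, target value, and learning rate are illustrative:

```python
# Sketch of one LMS step: w_i <- w_i + eta * (V_train(b) - V̂(b)) * x_i.
# The training example and eta are illustrative, not from a real run.

def v_hat(features, weights):
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

def lms_update(weights, features, v_train, eta=0.1):
    error = v_train - v_hat(features, weights)
    xs = (1,) + tuple(features)      # x0 is implicitly 1 for the constant w0
    return [w + eta * error * x for w, x in zip(weights, xs)]

weights = [0.0] * 7
# A board state where black has won, so V_train(b) = +100:
weights = lms_update(weights, (3, 0, 1, 0, 0, 0), 100.0)
print(weights)  # error = 100: w0 grows by 10, w1 by 30, w3 by 10; zero features untouched
```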

5. The Final Design

The final design of the checkers learning system can be described by four distinct program
modules that represent the central components in many learning systems.

1. The Performance System is the module that must solve the given performance task by
using the learned target function(s). It takes an instance of a new problem (new game) as
input and produces a trace of its solution (game history) as output.

2. The Critic takes as input the history or trace of the game and produces as output a set
of training examples of the target function.

3. The Generalizer takes as input the training examples and produces an output
hypothesis that is its estimate of the target function. It generalizes from the specific
training examples, hypothesizing a general function that covers these examples and
other cases beyond the training examples.

4. The Experiment Generator takes as input the current hypothesis and outputs a new
problem (i.e., initial board state) for the Performance System to explore. Its role is to
pick new practice problems that will maximize the learning rate of the overall system.

The sequence of design choices made for the checkers program is summarized in the figure below.

PERSPECTIVES AND ISSUES IN MACHINE LEARNING

Issues in Machine Learning


The field of machine learning, and much of this book, is concerned with answering questions
such as the following
 What algorithms exist for learning general target functions from specific training
examples? In what settings will particular algorithms converge to the desired function,
given sufficient training data? Which algorithms perform best for which types of
problems and representations?
 How much training data is sufficient? What general bounds can be found to relate the
confidence in learned hypotheses to the amount of training experience and the character
of the learner's hypothesis space?

 When and how can prior knowledge held by the learner guide the process of generalizing
from examples? Can prior knowledge be helpful even when it is only approximately
correct?
 What is the best strategy for choosing a useful next training experience, and how does
the choice of this strategy alter the complexity of the learning problem?
 What is the best way to reduce the learning task to one or more function approximation
problems? Put another way, what specific functions should the system attempt to learn?
Can this process itself be automated?
 How can the learner automatically alter its representation to improve its ability to
represent and learn the target function?


CONCEPT LEARNING

 Learning involves acquiring general concepts from specific training examples. Example:
People continually learn general concepts or categories such as "bird," "car," "situations in
which I should study more in order to pass the exam," etc.
 Each such concept can be viewed as describing some subset of objects or events defined
over a larger set.
 Alternatively, each concept can be thought of as a Boolean-valued function defined over this
larger set. (Example: A function defined over all animals, whose value is true for birds and
false for other animals.)

Definition: Concept learning - Inferring a Boolean-valued function from training examples of
its input and output.

A CONCEPT LEARNING TASK

Consider the example task of learning the target concept "Days on which Aldo enjoys
his favorite water sport”

Example Sky AirTemp Humidity Wind Water Forecast EnjoySport

1 Sunny Warm Normal Strong Warm Same Yes

2 Sunny Warm High Strong Warm Same Yes

3 Rainy Cold High Strong Warm Change No

4 Sunny Warm High Strong Cool Change Yes

Table: Positive and negative training examples for the target concept EnjoySport.

The task is to learn to predict the value of EnjoySport for an arbitrary day, based on the
values of its other attributes.

What hypothesis representation is provided to the learner?

 Let’s consider a simple representation in which each hypothesis consists of a
conjunction of constraints on the instance attributes.
 Let each hypothesis be a vector of six constraints, specifying the values of the six
attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast.


For each attribute, the hypothesis will either

 Indicate by a "?" that any value is acceptable for this attribute,
 Specify a single required value (e.g., Warm) for the attribute, or
 Indicate by a "Φ" that no value is acceptable.

If some instance x satisfies all the constraints of hypothesis h, then h classifies x as a positive
example (h(x) = 1).

The hypothesis that Aldo enjoys his favorite sport only on cold days with high humidity
is represented by the expression
(?, Cold, High, ?, ?, ?)

The most general hypothesis-that every day is a positive example-is represented by


(?, ?, ?, ?, ?, ?)

The most specific possible hypothesis-that no day is a positive example-is represented by


(Φ, Φ, Φ, Φ, Φ, Φ)
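The representation and the satisfies test can be sketched in Python (the tuple layout and helper name are my own):

```python
# Sketch of the conjunctive hypothesis representation: "?" accepts any
# value, "Φ" accepts no value, and a literal value must match exactly.

def satisfies(hypothesis, instance):
    """h(x) = 1 iff every constraint of h is met by x ("Φ" never matches)."""
    return all(c == "?" or c == v for c, v in zip(hypothesis, instance))

h = ("?", "Cold", "High", "?", "?", "?")       # cold days with high humidity
x = ("Sunny", "Cold", "High", "Strong", "Warm", "Same")
print(satisfies(h, x))                         # True: constrained attributes match
print(satisfies(("Φ",) * 6, x))                # False: the most specific hypothesis
```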

Notation

 The set of items over which the concept is defined is called the set of instances, which is
denoted by X.

Example: X is the set of all possible days, each represented by the attributes: Sky, AirTemp,
Humidity, Wind, Water, and Forecast

 The concept or function to be learned is called the target concept, which is denoted by c.
c can be any Boolean-valued function defined over the instances X

c: X → {0, 1}

Example: The target concept corresponds to the value of the attribute EnjoySport
(i.e., c(x) = 1 if EnjoySport = Yes, and c(x) = 0 if EnjoySport = No).

 Instances for which c(x) = 1 are called positive examples, or members of the target concept.
 Instances for which c(x) = 0 are called negative examples, or non-members of the target
concept.
 The ordered pair (x, c(x)) describes the training example consisting of the instance x and
its target concept value c(x).
 D denotes the set of available training examples.


 The symbol H denotes the set of all possible hypotheses that the learner may consider
regarding the identity of the target concept. Each hypothesis h in H represents a
Boolean-valued function defined over X
h: X → {0, 1}

The goal of the learner is to find a hypothesis h such that h(x) = c(x) for all x in X.

 Given:
 Instances X: Possible days, each described by the attributes
 Sky (with possible values Sunny, Cloudy, and Rainy),
 AirTemp (with values Warm and Cold),
 Humidity (with values Normal and High),
 Wind (with values Strong and Weak),
 Water (with values Warm and Cool),
 Forecast (with values Same and Change).

 Hypotheses H: Each hypothesis is described by a conjunction of constraints on the
attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast. The constraints may be
"?" (any value is acceptable), "Φ" (no value is acceptable), or a specific value.

 Target concept c: EnjoySport : X → {0, 1}

 Training examples D: Positive and negative examples of the target function

 Determine:
 A hypothesis h in H such that h(x) = c(x) for all x in X.

Table: The EnjoySport concept learning task.

The inductive learning hypothesis

Any hypothesis found to approximate the target function well over a sufficiently large set of
training examples will also approximate the target function well over other unobserved
examples.


CONCEPT LEARNING AS SEARCH

 Concept learning can be viewed as the task of searching through a large space of
hypotheses implicitly defined by the hypothesis representation.
 The goal of this search is to find the hypothesis that best fits the training examples.

Example:
Consider the instances X and hypotheses H in the EnjoySport learning task. The attribute Sky
has three possible values, and AirTemp, Humidity, Wind, Water, and Forecast each have two.
The instance space X therefore contains exactly
3 · 2 · 2 · 2 · 2 · 2 = 96 distinct instances,
and there are
5 · 4 · 4 · 4 · 4 · 4 = 5120 syntactically distinct hypotheses within H (each attribute may
also take the values "?" and "Φ").

Every hypothesis containing one or more "Φ" symbols represents the empty set of instances;
that is, it classifies every instance as negative. Hence there are only
1 + (4 · 3 · 3 · 3 · 3 · 3) = 973 semantically distinct hypotheses.
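These counts can be checked directly:

```python
# The counts quoted above, computed directly. Sky has 3 values; the
# other five attributes have 2 each.
from math import prod

values = [3, 2, 2, 2, 2, 2]
print(prod(values))                      # 96 distinct instances
print(prod(v + 2 for v in values))       # 5*4*4*4*4*4 = 5120 syntactic hypotheses
print(1 + prod(v + 1 for v in values))   # 1 (all-"Φ") + 4*3^5 = 973 semantic hypotheses
```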

General-to-Specific Ordering of Hypotheses

Consider the two hypotheses


h1 = (Sunny, ?, ?, Strong, ?, ?)
h2 = (Sunny, ?, ?, ?, ?, ?)

 Consider the sets of instances that are classified positive by h1 and by h2.
 Because h2 imposes fewer constraints on the instance, it classifies more instances as
positive. So, any instance classified positive by h1 will also be classified positive by h2.
Therefore, h2 is more general than h1.

Given hypotheses hj and hk, hj is more-general-than-or-equal-to hk if and only if any instance
that satisfies hk also satisfies hj.

Definition: Let hj and hk be Boolean-valued functions defined over X. Then hj is
more-general-than-or-equal-to hk (written hj ≥g hk) if and only if

(∀x ∈ X) [(hk(x) = 1) → (hj(x) = 1)]


 In the figure, the box on the left represents the set X of all instances, the box on the
right the set H of all hypotheses.
 Each hypothesis corresponds to some subset of X - the subset of instances that it
classifies positive.
 The arrows connecting hypotheses represent the more-general-than relation, with the
arrow pointing toward the less general hypothesis.
 Note the subset of instances characterized by h2 subsumes the subset characterized by
h1, hence h2 is more-general-than h1.
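For conjunctive hypotheses, the more-general-than-or-equal-to relation reduces to a per-attribute test; a sketch (helper name is my own):

```python
# Sketch of hj >=_g hk for conjunctive hypotheses: every instance
# satisfying hk must also satisfy hj.

def more_general_or_equal(hj, hk):
    if "Φ" in hk:          # hk covers no instance, so anything is >= it
        return True
    # Otherwise each constraint of hj must be "?" or equal to hk's.
    return all(cj == "?" or cj == ck for cj, ck in zip(hj, hk))

h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
print(more_general_or_equal(h2, h1))  # True: h2 drops the Wind constraint
print(more_general_or_equal(h1, h2))  # False
```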

FIND-S: FINDING A MAXIMALLY SPECIFIC HYPOTHESIS

FIND-S Algorithm

1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
   For each attribute constraint ai in h
      If the constraint ai is satisfied by x
      Then do nothing
      Else replace ai in h by the next more general constraint that is satisfied by x
3. Output hypothesis h

To illustrate this algorithm, assume the learner is given the sequence of training examples
from the EnjoySport task

Example Sky AirTemp Humidity Wind Water Forecast EnjoySport


1 Sunny Warm Normal Strong Warm Same Yes
2 Sunny Warm High Strong Warm Same Yes
3 Rainy Cold High Strong Warm Change No
4 Sunny Warm High Strong Cool Change Yes

 The first step of FIND-S is to initialize h to the most specific hypothesis in H
h0 = ⟨Φ, Φ, Φ, Φ, Φ, Φ⟩

 Consider the first training example

x1 = <Sunny Warm Normal Strong Warm Same>, +

Observing the first training example, it is clear that hypothesis h is too specific. None
of the "Φ" constraints in h are satisfied by this example, so each is replaced by the next
more general constraint that fits the example
h1 = <Sunny Warm Normal Strong Warm Same>

 Consider the second training example

x2 = <Sunny, Warm, High, Strong, Warm, Same>, +

The second training example forces the algorithm to further generalize h, this time
substituting a "?" in place of any attribute value in h that is not satisfied by the new
example
h2 = <Sunny Warm ? Strong Warm Same>

 Consider the third training example

x3 = <Rainy, Cold, High, Strong, Warm, Change>, -

Upon encountering the third training example, the algorithm makes no change to h. The
FIND-S algorithm simply ignores every negative example.
h3 = <Sunny Warm ? Strong Warm Same>

 Consider the fourth training example

x4 = <Sunny Warm High Strong Cool Change>, +

The fourth example leads to a further generalization of h

h4 = <Sunny Warm ? Strong ? ?>
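The trace above can be reproduced with a short sketch of FIND-S (the data layout and helper name are my own):

```python
# Sketch of FIND-S on the EnjoySport training examples above.

def find_s(examples):
    h = ["Φ"] * 6                       # start at the most specific hypothesis
    for x, label in examples:
        if label != "Yes":              # FIND-S ignores negative examples
            continue
        for i, (c, v) in enumerate(zip(h, x)):
            if c == "Φ":
                h[i] = v                # first positive example: copy its values
            elif c != v:
                h[i] = "?"              # generalize any mismatched constraint
    return tuple(h)

D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), "Yes"),
]
print(find_s(D))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?') — the h4 of the trace
```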


The key property of the FIND-S algorithm

 FIND-S is guaranteed to output the most specific hypothesis within H that is consistent
with the positive training examples.
 FIND-S algorithm's final hypothesis will also be consistent with the negative examples,
provided the correct target concept is contained in H, and provided the training examples
are correct.

Unanswered by FIND-S

1. Has the learner converged to the correct target concept?
2. Why prefer the most specific hypothesis?
3. Are the training examples consistent?
4. What if there are several maximally specific consistent hypotheses?

VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM

The key idea in the CANDIDATE-ELIMINATION algorithm is to output a description of the
set of all hypotheses consistent with the training examples.

Representation

Definition: consistent - A hypothesis h is consistent with a set of training examples D if and
only if h(x) = c(x) for each example ⟨x, c(x)⟩ in D.

Consistent(h, D) ≡ (∀⟨x, c(x)⟩ ∈ D) h(x) = c(x)

Note the difference between the definitions of consistent and satisfies:

 An example x is said to satisfy hypothesis h when h(x) = 1, regardless of whether x is
a positive or negative example of the target concept.
 An example x is said to be consistent with hypothesis h iff h(x) = c(x).

Definition: version space - The version space, denoted VSH,D, with respect to hypothesis
space H and training examples D, is the subset of hypotheses from H consistent with the
training examples in D.

VSH,D ≡ {h ∈ H | Consistent(h, D)}

The LIST-THEN-ELIMINATE algorithm

The LIST-THEN-ELIMINATE algorithm first initializes the version space to contain all
hypotheses in H and then eliminates any hypothesis found inconsistent with any training
example.

1. VersionSpace ← a list containing every hypothesis in H
2. For each training example ⟨x, c(x)⟩
   remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace

The LIST-THEN-ELIMINATE Algorithm

 LIST-THEN-ELIMINATE works in principle, so long as the version space is finite.
 However, since it requires exhaustive enumeration of all hypotheses, in practice it is
not feasible.
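Because the EnjoySport hypothesis space is tiny (5120 syntactic hypotheses), LIST-THEN-ELIMINATE is actually feasible here; a sketch (the data layout is my own):

```python
# Sketch of LIST-THEN-ELIMINATE over the full conjunctive hypothesis
# space for EnjoySport. Feasible only because this space is tiny.
from itertools import product

domains = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"),
           ("Normal", "High"), ("Strong", "Weak"),
           ("Warm", "Cool"), ("Same", "Change")]

def h_of(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

# Every attribute may be "?", "Φ", or one of its values.
H = list(product(*[("?", "Φ") + d for d in domains]))

D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

version_space = [h for h in H if all(h_of(h, x) == c for x, c in D)]
print(len(H))              # 5120 syntactic hypotheses enumerated
print(len(version_space))  # 6 hypotheses survive all four examples
```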


A More Compact Representation for Version Spaces

The version space is represented by its most general and least general members. These
members form general and specific boundary sets that delimit the version space within the
partially ordered hypothesis space.

Definition: The general boundary G, with respect to hypothesis space H and training data D,
is the set of maximally general members of H consistent with D.

G ≡ {g ∈ H | Consistent(g, D) ∧ (¬∃g' ∈ H)[(g' >g g) ∧ Consistent(g', D)]}

Definition: The specific boundary S, with respect to hypothesis space H and training data D,
is the set of minimally general (i.e., maximally specific) members of H consistent with D.

S ≡ {s ∈ H | Consistent(s, D) ∧ (¬∃s' ∈ H)[(s >g s') ∧ Consistent(s', D)]}

Theorem: Version Space representation theorem

Theorem: Let X be an arbitrary set of instances and let H be a set of Boolean-valued
hypotheses defined over X. Let c: X → {0, 1} be an arbitrary target concept defined over X,
and let D be an arbitrary set of training examples {⟨x, c(x)⟩}. For all X, H, c, and D such
that S and G are well defined,

VSH,D = {h ∈ H | (∃s ∈ S)(∃g ∈ G)(g ≥g h ≥g s)}

To Prove:
1. Every h satisfying the right-hand side of the above expression is in VSH,D
2. Every member of VSH,D satisfies the right-hand side of the expression

Sketch of proof:
1. Let g, h, s be arbitrary members of G, H, S respectively, with g ≥g h ≥g s.
 By the definition of S, s must be satisfied by all positive examples in D. Because h ≥g s,
h must also be satisfied by all positive examples in D.
 By the definition of G, g cannot be satisfied by any negative example in D, and because
g ≥g h, h cannot be satisfied by any negative example in D. Because h is satisfied by all
positive examples in D and by no negative examples in D, h is consistent with D, and
therefore h is a member of VSH,D.
2. It can be proven by assuming some h in VSH,D that does not satisfy the right-hand side
of the expression, then showing that this leads to an inconsistency.


CANDIDATE-ELIMINATION Learning Algorithm

The CANDIDATE-ELIMINATION algorithm computes the version space containing all
hypotheses from H that are consistent with an observed sequence of training examples.

Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example d, do
• If d is a positive example
   • Remove from G any hypothesis inconsistent with d
   • For each hypothesis s in S that is not consistent with d
      • Remove s from S
      • Add to S all minimal generalizations h of s such that
         • h is consistent with d, and some member of G is more general than h
      • Remove from S any hypothesis that is more general than another hypothesis in S

• If d is a negative example
   • Remove from S any hypothesis inconsistent with d
   • For each hypothesis g in G that is not consistent with d
      • Remove g from G
      • Add to G all minimal specializations h of g such that
         • h is consistent with d, and some member of S is more specific than h
      • Remove from G any hypothesis that is less general than another hypothesis in G

The CANDIDATE-ELIMINATION algorithm using version spaces

An Illustrative Example

Example Sky AirTemp Humidity Wind Water Forecast EnjoySport


1 Sunny Warm Normal Strong Warm Same Yes
2 Sunny Warm High Strong Warm Same Yes
3 Rainy Cold High Strong Warm Change No
4 Sunny Warm High Strong Cool Change Yes


The CANDIDATE-ELIMINATION algorithm begins by initializing the version space to the
set of all hypotheses in H:

Initializing the G boundary set to contain the most general hypothesis in H
G0 ← {⟨?, ?, ?, ?, ?, ?⟩}

Initializing the S boundary set to contain the most specific (least general) hypothesis
S0 ← {⟨Φ, Φ, Φ, Φ, Φ, Φ⟩}

 When the first training example is presented, the CANDIDATE-ELIMINATION algorithm
checks the S boundary and finds that it is overly specific: it fails to cover the positive
example.
 The boundary is therefore revised by moving it to the least more general hypothesis that
covers this new example.
 No update of the G boundary is needed in response to this training example because
G0 correctly covers this example.

 When the second training example is observed, it has a similar effect of generalizing S
further to S2, leaving G again unchanged, i.e., G2 = G1 = G0.

 Consider the third training example. This negative example reveals that the G boundary
of the version space is overly general; that is, the hypothesis in G incorrectly predicts
that this new example is a positive example.
 The hypothesis in the G boundary must therefore be specialized until it correctly
classifies this new negative example.

Given that there are six attributes that could be specified to specialize G2, why are there
only three new hypotheses in G3?
For example, the hypothesis h = (?, ?, Normal, ?, ?, ?) is a minimal specialization of
G2 that correctly labels the new example as a negative example, but it is not included in G3.
The reason this hypothesis is excluded is that it is inconsistent with the previously
encountered positive examples.

 Consider the fourth training example.



 This positive example further generalizes the S boundary of the version space. It also
results in removing one member of the G boundary, because this member fails to
cover the new positive example.

After processing these four examples, the boundary sets S4 and G4 delimit the version space
of all hypotheses consistent with the set of incrementally observed training examples.
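The full run can be reproduced with a compact sketch of CANDIDATE-ELIMINATION for this conjunctive space (helper names are my own; because minimal generalizations are unique here, S stays a singleton):

```python
# Sketch of CANDIDATE-ELIMINATION for the conjunctive EnjoySport space.
# The attribute domains and examples come from the task definition above.

domains = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"),
           ("Normal", "High"), ("Strong", "Weak"),
           ("Warm", "Cool"), ("Same", "Change")]

def h_of(h, x):
    """h(x) = 1 iff every constraint of h is met by instance x."""
    return all(c in ("?", v) for c, v in zip(h, x))

def more_general(hj, hk):
    """hj >=_g hk: every instance satisfying hk also satisfies hj."""
    if "Φ" in hk:                      # hk covers no instance at all
        return True
    return all(cj == "?" or cj == ck for cj, ck in zip(hj, hk))

def generalize(s, x):
    """The unique minimal generalization of s covering positive x."""
    return tuple(v if c == "Φ" else (c if c == v else "?")
                 for c, v in zip(s, x))

def specializations(g, x):
    """All minimal specializations of g that exclude negative x."""
    for i, c in enumerate(g):
        if c == "?":
            for v in domains[i]:
                if v != x[i]:
                    yield g[:i] + (v,) + g[i + 1:]

def candidate_elimination(examples):
    G = [("?",) * 6]                   # maximally general boundary
    S = [("Φ",) * 6]                   # maximally specific boundary
    for x, positive in examples:
        if positive:
            G = [g for g in G if h_of(g, x)]
            S = [generalize(s, x) if not h_of(s, x) else s for s in S]
            S = [s for s in S if any(more_general(g, s) for g in G)]
        else:
            S = [s for s in S if not h_of(s, x)]
            new_G = []
            for g in G:
                if h_of(g, x):         # g wrongly covers the negative example
                    new_G += [h for h in specializations(g, x)
                              if any(more_general(h, s) for s in S)]
                else:
                    new_G.append(g)
            G = [g for g in new_G
                 if not any(g2 != g and more_general(g2, g) for g2 in new_G)]
    return S, G

D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
S, G = candidate_elimination(D)
print(S)  # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]  — the S4 boundary
print(G)  # ('Sunny', '?', ..., '?') and ('?', 'Warm', '?', ..., '?') — the G4 boundary
```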

INDUCTIVE BIAS

The fundamental questions for inductive inference

1. What if the target concept is not contained in the hypothesis space?
2. Can we avoid this difficulty by using a hypothesis space that includes every possible
hypothesis?
3. How does the size of this hypothesis space influence the ability of the algorithm to
generalize to unobserved instances?
4. How does the size of the hypothesis space influence the number of training examples
that must be observed?

These fundamental questions are examined in the context of the CANDIDATE-ELIMINATION
algorithm.

A Biased Hypothesis Space

 Suppose the target concept is not contained in the hypothesis space H; then an obvious
solution is to enrich the hypothesis space to include every possible hypothesis.
 Consider the EnjoySport example in which the hypothesis space is restricted to include
only conjunctions of attribute values. Because of this restriction, the hypothesis space is
unable to represent even simple disjunctive target concepts such as
"Sky = Sunny or Sky = Cloudy."
 Given the following three training examples of this disjunctive target concept, the
algorithm would find that there are zero hypotheses in the version space.

Example Sky AirTemp Humidity Wind Water Forecast EnjoySport
1 Sunny Warm Normal Strong Cool Change Yes
2 Cloudy Warm Normal Strong Cool Change Yes
3 Rainy Warm Normal Strong Cool Change No

 If the CANDIDATE-ELIMINATION algorithm is applied, it ends up with an empty version
space. After the first two training examples,
S2 = ⟨?, Warm, Normal, Strong, Cool, Change⟩

 This new hypothesis is overly general: it incorrectly covers the third (negative)
training example. So H does not include the appropriate c.
 In this case, a more expressive hypothesis space is required.


An Unbiased Learner

 The solution to the problem of assuring that the target concept is in the hypothesis space
H is to provide a hypothesis space capable of representing every teachable concept, that
is, every possible subset of the instances X.
 The set of all subsets of a set X is called the power set of X.

 In the EnjoySport learning task, the size of the instance space X of days described by
the six attributes is 96 instances.
 Thus, there are 2^96 distinct target concepts that could be defined over this instance
space, and the learner might be called upon to learn any one of them.
 The conjunctive hypothesis space is able to represent only 973 of these - a biased
hypothesis space indeed.
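The arithmetic can be checked directly (the exact power is printed rather than typed out):

```python
# The instance space has 96 members, so its power set — every possible
# target concept — has 2**96 elements, while the conjunctive hypothesis
# space represents only 973 of them.
from math import prod

n_instances = prod([3, 2, 2, 2, 2, 2])
print(n_instances)       # 96
print(2 ** n_instances)  # roughly 7.9e28 distinct target concepts
print(973)               # concepts the conjunctive space can represent
```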

 Let us reformulate the EnjoySport learning task in an unbiased way by defining a new
hypothesis space H' that can represent every subset of instances.
 The target concept "Sky = Sunny or Sky = Cloudy" could then be described as

⟨Sunny, ?, ?, ?, ?, ?⟩ ∨ ⟨Cloudy, ?, ?, ?, ?, ?⟩

The Futility of Bias-Free Learning

Inductive learning requires some form of prior assumptions, or inductive bias

Definition:
Consider a concept learning algorithm L for the set of instances X.
 Let c be an arbitrary concept defined over X.
 Let Dc = {⟨x, c(x)⟩} be an arbitrary set of training examples of c.
 Let L(xi, Dc) denote the classification assigned to the instance xi by L after training on
the data Dc.
 The inductive bias of L is any minimal set of assertions B such that for any target concept
c and corresponding training examples Dc

(∀xi ∈ X) [(B ∧ Dc ∧ xi) ├ L(xi, Dc)]


The figure below explains:

 Modelling inductive systems by equivalent deductive systems.
 The input-output behavior of the CANDIDATE-ELIMINATION algorithm using a
hypothesis space H is identical to that of a deductive theorem prover utilizing the
assertion "H contains the target concept." This assertion is therefore called the inductive
bias of the CANDIDATE-ELIMINATION algorithm.
 Characterizing inductive systems by their inductive bias allows modelling them by their
equivalent deductive systems. This provides a way to compare inductive systems
according to their policies for generalizing beyond the observed training data.
