UNIT-1: Introduction, Concept Learning and The General-to-Specific Ordering, Decision Tree Learning
Syllabus
Introduction: Well-posed learning problems, designing a learning system, perspectives and issues in
machine learning.
Concept learning and the general-to-specific ordering - introduction, a concept learning task, concept
learning as search, FIND-S: finding a maximally specific hypothesis, version spaces and the candidate
elimination algorithm, remarks on version spaces and candidate elimination, inductive bias.
Decision Tree Learning - introduction, decision tree representation, appropriate problems for decision
tree learning, the basic decision tree learning algorithm, hypothesis space search in decision tree
learning, inductive bias in decision tree learning, issues in decision tree learning.
LEARNING OBJECTIVES
> Well-posed learning problems
> Designing a learning system
> Concept learning task
> FIND-S
> Version spaces and candidate elimination algorithm
> Inductive bias
> Decision tree learning and its representation
> Hypothesis space search and inductive bias in decision tree learning
INTRODUCTION
Artificial intelligent systems have a learning capability as humans have. But the learning capability of AI
systems is not the same as that of human learning, i.e., the human capability of learning is higher than that of
AI systems. AI systems possess some sort of mechanical learning capability, which is referred to as
'machine learning'. Various methods of machine learning are available. Some of them are inductive learning,
Artificial Neural Networks (ANN) and genetic algorithms.
A major part of learning involves obtaining general concepts from particular training examples. For example, people
continuously learn general concepts or categories like "bird", "bike" etc. Every concept can be viewed as a
description of some subset of either objects or events which are defined over a larger set. As an alternative, every
concept can be considered as a boolean-valued function which is defined over the larger set.
The problem is that of automatically inferring the general definition of a concept, given examples labelled as members and
non-members of the concept. This task is called concept learning, or approximating a boolean-valued function from
examples of its input and output.
Decision tree learning is a process of approximating discrete-valued target functions in which the learned function is
represented by a decision tree. It is a widely used and practical method for inductive inference. This method searches a
completely expressive hypothesis space and thus avoids the difficulties of restricted hypothesis spaces.
PART-A: SHORT QUESTIONS WITH SOLUTIONS

Q1. Define machine learning.
Answer: Model Paper-1, Q1(a)
Artificial intelligent systems have a learning capability as humans have. But the learning capability of AI systems is not the same as that of human learning, i.e., the human capability of learning is higher than that of AI systems. AI systems possess some sort of mechanical learning capability, which is referred to as 'machine learning'. Various methods of machine learning are available. Some of them are inductive learning, Artificial Neural Networks (ANN) and genetic algorithms.
Q2. List the basic design issues in machine learning.
Answer: Dec.-19, Q1(a)
There are several issues in machine learning, which are depicted as follows,
1. What algorithms are available for learning general target functions from particular training examples? In which settings will particular algorithms converge to the desired function, provided sufficient training data? Which type of algorithms perform the best for which types of problems and representations?
2. How much training data is sufficient? What type of bounds can be found relating the confidence in learned hypotheses to the amount of training experience and the character of the learner's hypothesis space?
3. How and when will the prior knowledge of the learner guide the generalization from examples? Can the knowledge be useful even when it is only approximately correct?
4. What is the best strategy for selecting useful training experience? How does the choice of strategy alter the complexity of the learning problem?
5. In which way can the learning task be reduced to one or more function approximation problems? What are the specific functions that need to be learnt by the system? Can this process be automated?
6. How can the learner automatically modify its representation to improve its ability to represent and learn the target function?
Q3. Write any two applications of machine learning.
Answer:
The two applications of machine learning are given as follows,
(i) Web Search
Machine learning is used to rank web pages depending upon the priority or likeness of the user.
(ii) Finance
Machine learning is used in decision making for lending credit cards to different persons. It also evaluates the risk connected to it and decides where and how to invest money.
Q4. What is concept learning?
Answer: Model Paper-2, Q1(a)
A major part of learning involves obtaining general concepts from particular training examples. For example, people continuously learn general concepts or categories like "bird", "bike" etc. Every concept can be viewed as a description of some subset of either objects or events which are defined over a larger set. As an alternative, every concept can be considered as a boolean-valued function which is defined over the larger set.
The problem is that of automatically inferring the general definition of a concept, given examples labelled as members and non-members of the concept. This task is called concept learning, or approximating a boolean-valued function from examples of its input and output.
Q5. State the version space representation theorem.
Answer: Dec.-19, Q1(b)
Theorem
Consider X as an arbitrary set of instances and H as a set of boolean-valued hypotheses defined over X. Consider c : X → {0, 1} as an arbitrary target concept defined over X and D as an arbitrary set of training examples {<x, c(x)>}. For all X, H, c and D such that S and G are well defined,
VS(H, D) = {h ∈ H | (∃s ∈ S)(∃g ∈ G) (g ≥g h ≥g s)}
Proof
The theorem is proved by showing that,
1. Every h that satisfies the right-hand side of the expression is in VS(H, D).
2. Every member of VS(H, D) satisfies the right-hand side of the expression.
To show (1), assume g as an arbitrary member of G, s as an arbitrary member of S and h as an arbitrary member of H such that g ≥g h ≥g s. According to the definition of S, s must be satisfied by all positive examples in D. h must also be satisfied by all positive examples of D since h ≥g s. According to the definition of G, g cannot be satisfied by any negative example in D, and neither can h since g ≥g h. Thus h is consistent with D, and h is a member of VS(H, D) because h is satisfied by all positive examples in D and by no negative examples in D. The argument for (2) is more complex. It can be proved by assuming some h in VS(H, D) which does not satisfy the right-hand side of the expression and showing that this leads to an inconsistency.
Q6. What is a decision tree?
Answer: Model Paper-1, Q1(d); Dec.
A decision tree is a tool for decision support, which takes the form of a tree structure, making possible decisions by performing a set of tests. The tree usually takes a set of attributes as input and generates the output value based on the input values. Each internal node represents a test that is performed on one of the properties. Furthermore, branches are labelled with particular values of the test, and leaf nodes represent the values that must be returned after reaching that particular leaf.
Q7. What are the types of decision trees?
Answer:
Decision trees are mainly classified into two types,
(i) Classification Trees
It refers to a type of tree where the decision or outcome variable is categorical.
(ii) Regression Trees
It refers to a type of tree wherein the decision or outcome variable is continuous.
Q8. What is overfitting?
Answer: Model Paper-2, Q1(b)
Decision trees are at high risk of overfitting the training data when they are not pruned.
Figure: Relationship between Tree Depth and Overfitting (training accuracy keeps rising with tree depth while test accuracy falls)
Here, overfitting may occur when a model picks up the noise or errors in the training data set. Therefore overfitting can be viewed as a performance gap between the training and test data. For example, given a hypothesis space H, a hypothesis h ∈ H is said to overfit the training data if there is some alternative hypothesis h' ∈ H such that h has smaller error than h' over the training examples, but h' has a smaller error rate than h over the entire distribution of instances.
A general method to reduce overfitting in decision trees is decision tree pruning. There are two methods, namely post-pruning and pre-pruning.
Q9. What is pruning?
Answer:
Pruning is a technique that is used to reduce the size of decision trees by removing parts of the tree which provide little power for classifying the instances. It reduces the complexity of the final classifier and even improves the predictive accuracy by reducing overfitting.
Pruning can occur either in a top-down or a bottom-up fashion. Top-down pruning traverses the nodes and trims subtrees starting at the root. Bottom-up pruning begins at the leaf nodes.
Pruning, when applied on decision trees, removes one or more subtrees from them. There are various methods for decision tree pruning. They replace a subtree with a leaf if the classification accuracy is not reduced over the pruning data set. Pruning increases the number of classification errors on the training set but improves classification accuracy on unseen data.
Q10. Define inductive bias.
Answer:
Inductive bias can be defined as the set of assumptions which, together with the training data, justify the classifications that are assigned to future instances. For a set of training examples there might be a number of trees which are compatible with these examples. Inductive bias can be explained based on how the learner selects among these compatible hypotheses. ID3 selects the first tree found by its hill-climbing, simple-to-complex search over possible trees. The ID3 search thus selects in favour of shorter trees that place attributes with the highest information gain closest to the root. The inductive bias of ID3 is difficult to characterize precisely.
PART-B: ESSAY QUESTIONS WITH SOLUTIONS

1.1 INTRODUCTION
1.1.1 Well-Posed Learning Problems, Designing a Learning System
Q11. What disciplines have their influence on machine learning? Explain with examples.
Answer: Dec.-19, Q2
Disciplines That Influence Machine Learning
1. Artificial Intelligence
It involves the learning of symbolic representations of concepts. It treats machine learning as a search problem and makes use of learning as a method of improving problem solving.
2. Control Theory
It involves procedures that learn to control processes in order to optimize predefined objectives and which learn to predict the next state of the process being controlled.
3. Philosophy
The simplest hypothesis is considered the best according to Occam's razor. It also involves analysis of the justification for generalizing beyond observed data.
4. Statistics
It involves characterization of the errors that occur while estimating hypothesis accuracy depending upon a limited sample of data.
5. Bayesian Methods
Bayes theorem is used as a basis to compute probabilities of hypotheses in machine learning.
6. Information Theory
It involves measures of entropy and information content. It also involves minimum description length approaches to learning, optimal codes and their relationship to optimal training sequences for encoding a hypothesis.
7. Computational Complexity Theory
It involves theoretical bounds on the inherent complexity of various learning tasks, measured in terms of the computational effort, number of mistakes, number of training examples etc. needed in order to learn.
8. Psychology and Neurobiology
It involves the power law of practice, which states that over a broad range of learning problems, people's response time improves with practice according to a power law.
Q12. What are the well-posed learning problems in machine learning? Explain.
Answer: Model Paper-1, Q2
A computer program is said to learn from experience E corresponding to some class of tasks T and performance measure P, if its performance at the tasks in T, as measured by P, improves with experience E. For instance, consider a computer program which learns to play checkers games. Generally, to define a learning problem, three features need to be identified. They are the class of tasks, the measure of performance to be improved and the source of experience.
A Checkers Learning Problem
Task T: Playing checkers
Performance Measure P: Percent of games won against opponents
Training Experience E: Playing practice games against itself
Many learning problems can be specified in a similar way, like learning to recognize handwritten words or learning how to drive a robotic automobile autonomously.
A Handwriting Recognition Learning Problem
Task T: Recognizing and classifying handwritten words within images
Performance Measure P: Percent of words classified correctly
Training Experience E: A database of handwritten words with given classifications
A Robot Driving Learning Problem
Task T: Driving on public four-lane highways using vision sensors
Performance Measure P: Average distance traveled before an error
Training Experience E: A sequence of images and steering commands recorded while observing a human driver
This definition of learning is broad enough to include the tasks which are usually called 'learning' tasks.
Q13. Illustrate the basic design issues.
Answer:
The basic design issues and approaches to machine learning can be determined by considering a program designed to learn to play checkers. The purpose of it is to make the program able to enter the world checkers tournament.
Designing a Learning System
For the remaining answer refer Unit-I, Q14, Q15, Q16 and Q17.

Q14. Write how training experience is selected.
Answer:
The Checkers Learning Problem
Task T: Playing checkers
Performance Measure P: Percent of games won in the world tournament
Training Experience E: Games played against itself
To complete the design of the learning system, the following must now be chosen,
1. The exact type of knowledge that is to be learned
2. A representation for this target knowledge
3. The learning mechanism.
The success and failure of a learner is completely dependent upon the type of training experience available. An important aspect is whether the training experience offers direct or indirect feedback related to the choices of the performance system. While learning to play checkers, the system may learn from direct training examples containing separate checker board states and their correct moves. As an alternative, it might have available only indirect data consisting of move sequences and the final result of different games. In this case, information about the correctness of moves early in the game must be inferred indirectly from the fact that the game was eventually won or lost. The learner then faces a further problem of credit assignment, i.e., determining the degree to which each move in the sequence deserves credit or blame for the final result. Credit assignment is a difficult problem because the game might be lost even when early moves are optimal, if poor moves follow them. Learning from indirect feedback is therefore more complex than learning from direct training feedback.
Another aspect of training experience is the degree to which the learner can control the sequence of training examples. The learner might depend upon the teacher for selecting informative board states and even for providing a correct move for each of them. As an alternative, the learner can propose board states that it finds confusing and then request the teacher for the correct move. The learner has complete control over both the board states and the training classifications when it learns by playing against itself with no teacher present. The learner can then choose either to experiment with novel board states that it has not yet considered, or to hone its skill by playing minor variations of lines of play it currently finds promising.
A third aspect of training experience is how well it represents the distribution of examples over which the final system performance P is measured. Learning is most reliable when the training examples follow a distribution similar to that of future test examples. In checkers learning, the performance metric P denotes the percent of games won by the system in the world tournament. If the training experience contains only games which are played against itself, then there is a danger that this training experience might not fully represent the distribution of situations over which it will later be tested. For example, the learner may never encounter the crucial board states which would be played by a human checkers champion. It is thus necessary to learn from a distribution of examples which is different from that on which the final system will be evaluated. Such situations are problematic, since mastery of one distribution of examples may not lead to strong performance over other distributions.
Q15. How is a target function and its representation selected? Explain.
Answer:
Selecting a Target Function
Consider a checkers playing program that is capable of producing the legal moves from any board state. The program needs to learn to select the best move from among these legal moves. Such a learning task represents a large class of tasks for which the legal moves (defining a large search space) are known a priori, but for which the best strategy is not known. This category includes different types of optimization problems like scheduling and controlling manufacturing processes, where the available manufacturing steps are understood well. In this setting it is necessary to learn how to select among the legal moves. One obvious choice for the type of information to be learned is a program or function which selects the best move for a given board state. Call it ChooseMove and use the notation ChooseMove : B → M to represent that the function accepts as input any board from the set of legal board states B and generates as output a move from the set of legal moves M.
Even though ChooseMove is an obvious choice for the target function, it is difficult to learn given the kind of indirect experience available to the system. An alternative target function is an evaluation function, which is easier to learn and which assigns a numerical score to any given board state. Let this target function be denoted by V, and use the notation V : B → R to represent that V maps any legal board state from the set B to a real value. Let this target function V assign higher scores to better board states. If the system can learn such a target function V, then it can use it to select the best move from the present board position.
This can be implemented by generating every successor board state produced by a legal move and then using V to select the best successor state and hence the best legal move. The value of the target function V for any given board state can be any evaluation function which assigns higher scores to better board states. Define the target value V(b) for an arbitrary board state b in B as,
+ If b is a final board state which is won, then V(b) = 100.
+ If b is a final board state which is lost, then V(b) = -100.
+ If b is a final board state which is drawn, then V(b) = 0.
+ If b is not a final state in the game, then V(b) = V(b'), where b' is the best final board state that can be reached starting from b and playing optimally until the end of the game.
This recursive definition specifies the value of V(b) for every board state b, but it is not much use to the checkers player since it is not efficiently computable. Determining the value of V(b) for a particular board state requires searching ahead for the optimal line of play up to the end of the game, except for trivial cases where the game has already ended. It is a non-operational definition for the checkers playing program. The purpose of the learning here is to discover an operational description of V; the learning task is thus reduced to the problem of discovering an operational description of the ideal target function V. It would be difficult to learn it perfectly.
Selecting a Target Function Representation
The representation must be such that the learning program can use it to describe the function V̂ that it will learn. There are many options; for example, the program can be allowed to represent V̂ by using a large table with a distinct value for each distinct board state, by a collection of rules which match against features of the board state, by a quadratic polynomial function of predefined board features, or by an artificial neural network. This choice of representation involves a crucial trade-off. On one hand, an expressive representation should be selected so that a representation close to the ideal target function V is possible. On the other hand, the amount of training data the program requires in order to select among the alternative hypotheses depends upon the expressiveness of the representation. Now select a simple representation: for any given board state, the function V̂ is calculated as a linear combination of the following board features,
+ x1: Number of black pieces on the board
+ x2: Number of red pieces on the board
+ x3: Number of black kings on the board
+ x4: Number of red kings on the board
+ x5: Number of black pieces threatened by red
+ x6: Number of red pieces threatened by black
The learning program will therefore represent V̂(b) as a linear function of the form,
V̂(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
where w0 through w6 are numerical coefficients (weights) to be chosen by the learning algorithm.
Partial Design of a Checkers Learning Program
Task T: Playing checkers
Performance Measure P: Percent of games won in the world tournament
Training Experience E: Games played against itself
Target Function: V : Board → R
Target function representation:
V̂(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
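To make the representation concrete, the following is a minimal Python sketch of the linear evaluation function V̂(b). The feature values and weights shown are illustrative assumptions for a hypothetical board, not values produced by the checkers program itself.

def v_hat(features, weights):
    """Linear evaluation V_hat(b) = w0 + w1*x1 + ... + w6*x6."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

# Hypothetical board: 12 black pieces, 10 red pieces, 1 black king,
# 0 red kings, 2 black pieces threatened, 3 red pieces threatened.
features = (12, 10, 1, 0, 2, 3)
weights = (0.5, 1.0, -1.0, 2.0, -2.0, -0.5, 0.5)   # w0 .. w6 (arbitrary values)
print(v_hat(features, weights))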
Q16. Discuss how the function approximation algorithm is selected.
Answer:
A set of training examples is needed to learn the target function V, where each training example specifies a particular board state b and a training value V_train(b) for b. Every training example can be written as an ordered pair of the form <b, V_train(b)>. For instance, one example may describe a board state b in which black has won the game and assign it the target function value V_train(b) = +100.
Estimating Training Values
The only training information available to the learner is whether the game was eventually won or lost, yet training examples assigning a score to each specific board state are required. While values can easily be assigned to the board states which correspond to the end of the game, it is less obvious how to assign training values to the many intermediate board states which occur before the end of the game. The success or failure of the game does not indicate that every board state along the game path was necessarily good or bad. Although there is ambiguity in estimating training values for intermediate board states, one simple method has been found to be successful.
This method assigns the training value V_train(b) for an intermediate board state b to be V̂(Successor(b)), where V̂ is the learner's present approximation to V and Successor(b) is the next board state that follows b. The rule for estimating training values is given as follows,
Rule for estimating training values,
V_train(b) ← V̂(Successor(b))
Intuitively it might seem strange to use the present version of V̂ to estimate training values which will in turn be used to refine the very same function. Notice, however, that estimates of the value of Successor(b) are used to estimate the value of board state b. This works well because V̂ tends to be more accurate for board states closer to the end of the game.
Adjusting the Weights
The task that is left is to specify the learning algorithm for choosing the weights to best fit the set of training examples {<b, V_train(b)>}. The first step is to determine what 'best fit' to the training data actually means. A common method is to define the best hypothesis, or set of weights, as the one that minimizes the squared error E between the training values and the values predicted by the hypothesis V̂,
E ≡ Σ (V_train(b) − V̂(b))², the sum being taken over all training examples <b, V_train(b)>.
Several algorithms are known for finding the weights of a linear function which minimize E defined in this way. In the present scenario, an algorithm is required which refines the weights incrementally as new training examples become available and which is robust to errors in these estimated training values. One such algorithm is the LMS (least mean squares) training rule. For every observed training example, it adjusts the weights a small amount in the direction that minimizes the error on that training example. The LMS algorithm is defined as follows,
LMS weight update rule
For each training example <b, V_train(b)>,
+ Use the present weights to calculate V̂(b)
+ For every weight wi, update it as shown below,
wi ← wi + η (V_train(b) − V̂(b)) xi
Here η is a small constant (e.g., 0.1) that moderates the size of the weight update.
The weights wi will not change when the error (V_train(b) − V̂(b)) is zero. They get increased in proportion to the value of the respective feature xi when (V_train(b) − V̂(b)) is positive, which leads to an increase in V̂(b) and a decrease in the error. If the value of some feature xi is zero, then its weight is not changed, so that the only weights updated are those whose features actually occur on the training example board. This method has in some settings been proven to converge to the least squared error approximation to the V_train values.
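The update can be sketched in a few lines of Python. This is only an illustration of the LMS rule stated above, assuming board states have already been reduced to feature tuples; the learning rate and training value are made-up numbers.

def lms_update(weights, training_examples, eta=0.1):
    """One pass of the LMS rule: for every (features, V_train) pair, nudge
    each weight in proportion to the prediction error and its feature value."""
    weights = list(weights)
    for features, v_train in training_examples:
        v_hat = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
        error = v_train - v_hat                   # (V_train(b) - V_hat(b))
        weights[0] += eta * error                 # w0 update (its feature is 1)
        for i, x in enumerate(features, start=1):
            weights[i] += eta * error * x         # wi <- wi + eta * error * xi
    return weights

# One hypothetical training pair: a board's features and its estimated training value.
print(lms_update([0.0] * 7, [((12, 10, 1, 0, 2, 3), 100.0)], eta=0.01))

Repeating such passes over many games gradually fits the weights to the estimated training values.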
Q17. Illustrate the final design.
Answer:
The final design of the checkers learning system can be described naturally by four distinct program modules which represent the central components of many learning systems. The four modules are depicted as follows,
Figure: Final design of the checkers learning program (it processes training examples of the form {<b, V_train(b)>})

Instances X: Possible days, each described by the attributes Sky, AirTemp, Humidity, Wind, Water and Forecast.
Hypotheses H: Each hypothesis is described by a conjunction of constraints on the attributes Sky, AirTemp, Humidity, Wind, Water and Forecast. The constraints may be "?", "∅" or a specific value.
Target Concept c: EnjoySport : X → {0, 1}
Training Examples D: Positive and negative examples of the target function.
Determine: A hypothesis h in H such that h(x) = c(x) for all x in X.
Notation
The set of items over which the concept is defined is represented as the set of instances, denoted by X. Here X is the set of possible days, each represented by the attributes Sky, AirTemp, Humidity, Wind, Water and Forecast. The concept that is to be learnt is called the target concept, denoted by c. Here c refers to the value of the attribute EnjoySport. While learning the target concept, the learner is provided with a set of training examples, each consisting of an instance x from X along with its target concept value c(x). The symbol D is used to represent the set of available training examples.
Inductive Learning Hypothesis
The learning task is to determine a hypothesis h identical to the target concept c over the complete set of instances X. However, the only information available about c is its value over the training examples. Hence inductive learning algorithms can at best guarantee that the output hypothesis fits the target concept over the training data. The assumption is that the hypothesis which best fits the training data over a sufficiently large set of training examples will also approximate the target function well over unobserved examples. This is the fundamental assumption of inductive learning.
1.2.2 Concept Learning as Search, FIND-S: Finding a Maximally Specific Hypothesis

Q20. Explain in brief about concept learning as search.
Answer: Model Paper-1, Q3(a)
Concept learning is a task that can be viewed as searching a large space of hypotheses which is defined implicitly by the hypothesis representation. The major aim of the search is to locate the hypothesis that best fits the training examples. By selecting a hypothesis representation, the designer of the learning algorithm implicitly defines the space of hypotheses that the program can ever represent or learn. For example, consider the instances X and hypotheses H in the EnjoySport learning task, given that,
Sky has three possible values
AirTemp, Humidity, Wind, Water and Forecast each have two possible values
The instance space X contains 3·2·2·2·2·2 = 96 distinct instances.
A similar computation shows that there are 5·4·4·4·4·4 = 5120 syntactically distinct hypotheses in H. The number of semantically distinct hypotheses is only 1 + (4·3·3·3·3·3) = 973.
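These counts can be verified with a one-line calculation; the sketch below simply multiplies out the value counts described above.

# Sky has 3 values; the other five attributes have 2 values each.
instances = 3 * 2 * 2 * 2 * 2 * 2            # 96 distinct instances
syntactic = 5 * 4 * 4 * 4 * 4 * 4            # each attribute also allows "?" and the empty constraint
semantic = 1 + 4 * 3 * 3 * 3 * 3 * 3         # one all-empty hypothesis plus the rest
print(instances, syntactic, semantic)        # 96 5120 973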
General-to-Specific Ordering of Hypotheses
Many algorithms for concept learning rely on a structure that exists for any concept learning problem, namely a general-to-specific ordering of hypotheses. By taking advantage of this ordering, algorithms can be designed which search even infinite hypothesis spaces without the need of enumerating each hypothesis. The ordering can be illustrated by considering two hypotheses,
h1 = <Sunny, ?, ?, Strong, ?, ?>
h2 = <Sunny, ?, ?, ?, ?, ?>
Consider the sets of instances which are classified as positive by h1 and h2. h2 classifies more instances as positive since it imposes fewer constraints on the instance. Any instance which is classified as positive by h1 is also classified as positive by h2. Thus h2 is said to be more general than h1. For any instance x in X and hypothesis h in H, it is said that x satisfies h when h(x) = 1. Given two hypotheses hj and hk, hj is said to be more_general_than_or_equal_to hk when any instance which satisfies hk also satisfies hj. Since hj and hk are boolean-valued functions, hj is more_general_than_or_equal_to hk (written hj ≥g hk) if and only if,
(∀x ∈ X) [(hk(x) = 1) → (hj(x) = 1)]
It is also significant to consider the cases in which one hypothesis is strictly more general than the other. Thus, hj is more general than hk (written hj >g hk) only when (hj ≥g hk) ∧ (hk ≱g hj). Finally, the inverse relation is also useful: hk is said to be more_specific_than hj when hj is more_general_than hk. These definitions can be better explained by considering an example of three hypotheses h1, h2 and h3 which are related by the ≥g relation.
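For conjunctive hypotheses of the EnjoySport kind, the ≥g relation can be tested attribute by attribute. The sketch below is one possible encoding, assuming hypotheses are tuples in which "?" stands for "any value" and "0" for the empty constraint.

def more_general_or_equal(h1, h2):
    """True when h1 >=g h2: every constraint of h1 is at least as permissive
    as the corresponding constraint of h2."""
    return all(b == "0" or a == "?" or a == b for a, b in zip(h1, h2))

h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
print(more_general_or_equal(h2, h1))   # True: h2 is more general than h1
print(more_general_or_equal(h1, h2))   # False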
Figure: Instances, Hypotheses and the More_general_than Relation
In the above figure, the box at the left side depicts the set X of all the instances and the box at the right side the hypotheses H; each hypothesis corresponds to the subset of X that it classifies as positive. In this way h2 is more general than h1, while h2 and h3 are not comparable. The ≥g and >g relations are defined independently of the target concept. The relation ≥g defines a partial order over H. It thus offers a significant structure over the hypothesis space H for any concept learning problem.
Q21. Discuss about finding a maximally specific hypothesis.
Answer :
The more_general_than partial ordering can be used to organize the search for a hypothesis which is consistent with the training examples, by beginning with the most specific possible hypothesis in H and generalizing it every time it fails to cover an observed positive training example. Consider the below algorithm.
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
   2.1 For each attribute constraint ai in h
       2.1.1 If the constraint ai is satisfied by x, then do nothing
       2.1.2 Else replace ai in h with the next more general constraint that is satisfied by x
3. Output hypothesis h
The above is the FIND-S algorithm. It can be better explained by considering a sequence of training examples from the EnjoySport task. The first step of the algorithm is to initialize h to the most specific hypothesis in H,
h ← <∅, ∅, ∅, ∅, ∅, ∅>
Upon observing the first positive training example, the "∅" constraints in h are replaced with more general constraints because they are not satisfied. Let the attribute values be,
h ← <Sunny, Warm, Normal, Strong, Warm, Same>
Here h is still overly specific: it states that all instances are negative other than the single positive training example observed so far.
The second training example forces the algorithm to generalize h further by substituting a "?" in place of any attribute value in h which is not satisfied by the new example,
h ← <Sunny, Warm, ?, Strong, Warm, Same>
The third training example is a negative example, and such examples are simply ignored by the FIND-S algorithm; it makes no change to h.
The fourth example leads to a further generalization of h,
h ← <Sunny, Warm, ?, Strong, ?, ?>
The FIND-S algorithm illustrates an approach that uses the more_general_than partial ordering to organize the search for an acceptable hypothesis. The search moves from hypothesis to hypothesis along a chain of the partial ordering, from the most specific towards progressively more general hypotheses.
Figure: Hypothesis Space Search Performed by FIND-S (instances X on the left, hypotheses H on the right, ordered from specific to general)
The above diagram depicts the search in terms of the instance and hypothesis spaces. The box on the left represents the set X of all instances, including the training instances,
<Sunny Warm Normal Strong Warm Same>, +
<Sunny Warm High Strong Warm Same>, +
<Rainy Cold High Strong Warm Change>, −
<Sunny Warm High Strong Cool Change>, +
The box on the right represents the set H of all hypotheses, including the sequence of hypotheses computed by FIND-S,
h0 = <∅, ∅, ∅, ∅, ∅, ∅>
h1 = <Sunny Warm Normal Strong Warm Same>
h2 = h3 = <Sunny Warm ? Strong Warm Same>
h4 = <Sunny Warm ? Strong ? ?>
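The trace above can be reproduced with a short Python sketch of FIND-S. The encoding ("0" for the empty constraint, "?" for "any value") is an assumption made for illustration.

def find_s(examples, n_attributes=6):
    """FIND-S: start with the most specific hypothesis and minimally
    generalize it on every positive example; negative examples are ignored."""
    h = ["0"] * n_attributes
    for instance, label in examples:
        if label != "Yes":
            continue
        for i, value in enumerate(instance):
            if h[i] == "0":
                h[i] = value          # first positive example fixes the value
            elif h[i] != value:
                h[i] = "?"            # conflicting value: generalize to "?"
    return tuple(h)

enjoy_sport = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), "Yes"),
]
print(find_s(enjoy_sport))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')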
A hypothesis h is consistent with a set of training examples D if and only if h(x) = c(x) for every example <x, c(x)> in D,
Consistent(h, D) ≡ (∀<x, c(x)> ∈ D) h(x) = c(x)
The CANDIDATE-ELIMINATION algorithm represents the set of all hypotheses consistent with the observed training examples. This subset of hypotheses is known as the version space with respect to the hypothesis space H and training examples D, since it holds all the plausible versions of the target concept,
VS(H, D) ≡ {h ∈ H | Consistent(h, D)}
The version space can be represented by listing all of its members. This gives rise to a simple algorithm called the LIST-THEN-ELIMINATE algorithm. It initializes the version space to hold all the hypotheses in H and then removes any hypothesis found inconsistent with a training example. The version space may shrink as more examples are observed, until ideally one hypothesis consistent with all examples remains,
1. VersionSpace ← a list of all hypotheses in H
2. For every training example <x, c(x)>, eliminate from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace
The CANDIDATE-ELIMINATION algorithm computes the version space holding all hypotheses from H consistent with an observed sequence of training examples. It begins by initializing the version space to the set of all hypotheses in H; that is, by initializing the G boundary set to hold the most general hypothesis in H,
G0 ← {<?, ?, ?, ?, ?, ?>}
and initializing the S boundary set to hold the most specific hypothesis,
S0 ← {<∅, ∅, ∅, ∅, ∅, ∅>}
These two boundary sets delimit the complete hypothesis space, since every other hypothesis in H is more general than S0 and more specific than G0. As training examples are processed, the S and G boundary sets are generalized and specialized respectively, removing incompatible hypotheses from the version space. The computed version space finally holds only the hypotheses which are consistent with all the training examples.
Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example x do,
+ If x is a positive example
  + Remove from G any hypothesis that is inconsistent with x
  + For each hypothesis s in S that is not consistent with x
    + Remove s from S
    + Add to S all minimal generalizations h of s such that h is consistent with x and some member of G is more general than h
    + Remove from S any hypothesis that is more general than another hypothesis in S
+ If x is a negative example
  + Remove from S any hypothesis that is inconsistent with x
  + For each hypothesis g in G that is not consistent with x
    + Remove g from G
    + Add to G all minimal specializations h of g such that h is consistent with x and some member of S is more specific than h
    + Remove from G any hypothesis that is less general than another hypothesis in G
The CANDIDATE-ELIMINATION algorithm is specified in terms of operations such as computing minimal generalizations and specializations of hypotheses and identifying non-minimal and non-maximal hypotheses. The algorithm can be applied to any concept learning task and hypothesis space for which these operations are well defined.
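The following Python sketch implements the boundary-set updates for conjunctive hypotheses such as those in EnjoySport. It is a simplified illustration of the algorithm above (the attribute domains must be supplied by the caller), not a general-purpose implementation.

def covers(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general_or_equal(h1, h2):
    return all(b == "0" or a == "?" or a == b for a, b in zip(h1, h2))

def min_generalize(s, x):
    """The single minimal generalization of s that covers the positive instance x."""
    return tuple(v if c == "0" else (c if c == v else "?") for c, v in zip(s, x))

def min_specializations(g, x, domains):
    """Minimal specializations of g that exclude the negative instance x."""
    return [g[:i] + (value,) + g[i + 1:]
            for i, c in enumerate(g) if c == "?"
            for value in domains[i] if value != x[i]]

def candidate_elimination(examples, domains):
    n = len(domains)
    S, G = {("0",) * n}, {("?",) * n}
    for x, label in examples:
        if label == "Yes":
            G = {g for g in G if covers(g, x)}
            S = {min_generalize(s, x) if not covers(s, x) else s for s in S}
            S = {s for s in S if any(more_general_or_equal(g, s) for g in G)}
            # (pruning of non-minimal members of S is omitted; S stays a singleton here)
        else:
            S = {s for s in S if not covers(s, x)}
            new_G = set()
            for g in G:
                if covers(g, x):
                    new_G |= {h for h in min_specializations(g, x, domains)
                              if any(more_general_or_equal(h, s) for s in S)}
                else:
                    new_G.add(g)
            # keep only the maximally general members of G
            G = {g for g in new_G
                 if not any(h != g and more_general_or_equal(h, g) for h in new_G)}
    return S, G

Run on the four EnjoySport examples with their attribute domains, this sketch should reproduce the S and G boundaries traced in the next question.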
Q23. Derive an example to explain the working of the candidate elimination algorithm.
Answer: Dec.-20, Q1
Consider the below example,
Example | Sky   | AirTemp | Humidity | Wind   | Water | Forecast | EnjoySport
1       | Sunny | Warm    | Normal   | Strong | Warm  | Same     | Yes
2       | Sunny | Warm    | High     | Strong | Warm  | Same     | Yes
3       | Rainy | Cold    | High     | Strong | Warm  | Change   | No
4       | Sunny | Warm    | High     | Strong | Cool  | Change   | Yes
The below figure traces the candidate elimination algorithm,
S0: {<∅, ∅, ∅, ∅, ∅, ∅>}
S1: {<Sunny, Warm, Normal, Strong, Warm, Same>}
S2, S3: {<Sunny, Warm, ?, Strong, Warm, Same>}
S4: {<Sunny, Warm, ?, Strong, ?, ?>}
G4: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
G3: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}
G0, G1, G2: {<?, ?, ?, ?, ?, ?>}
The boundary sets are initialized to G0 and S0. For the first training example <Sunny, Warm, Normal, Strong, Warm, Same>, EnjoySport = Yes, the candidate elimination algorithm checks the S boundary and finds that it is too specific to cover the positive example. The boundary is updated by moving it to the least general hypothesis which covers the new example. The second training example <Sunny, Warm, High, Strong, Warm, Same>, EnjoySport = Yes, has the similar effect of generalizing S further to S2, without updating G. The third (negative) training example leaves S unchanged but specializes the G boundary to G3, and the fourth (positive) example generalizes S to S4 and removes from G the member that fails to cover it, giving G4.
The learned version space is not dependent on the sequence in which the training examples are presented. The S and G boundaries move closer together, delimiting a smaller and smaller version space of candidate hypotheses.
Q24. Define the boundary sets: general boundary G, specific boundary S.
Answer: Sep.-20, Q1(c)
General Boundary G
The general boundary G, with respect to hypothesis space H and training data D, is defined as the set of maximally general members of H consistent with D,
G ≡ {g ∈ H | Consistent(g, D) ∧ (¬∃g' ∈ H) [(g' >g g) ∧ Consistent(g', D)]}
Specific Boundary S
The specific boundary S, with respect to hypothesis space H and training data D, is the set of minimally general (i.e., maximally specific) members of H consistent with D,
S ≡ {s ∈ H | Consistent(s, D) ∧ (¬∃s' ∈ H) [(s >g s') ∧ Consistent(s', D)]}
As long as G and S are well defined, they completely specify the version space: it is precisely the set of hypotheses contained in G and S, along with those lying in between G and S in the partially ordered hypothesis space. This is stated in the below theorem.
Theorem
Consider X as an arbitrary set of instances and H as a set of boolean-valued hypotheses defined over X. Consider c : X → {0, 1} as an arbitrary target concept defined over X and D as an arbitrary set of training examples {<x, c(x)>}. For all X, H, c and D such that S and G are well defined,
VS(H, D) = {h ∈ H | (∃s ∈ S)(∃g ∈ G) (g ≥g h ≥g s)}
Proof
The theorem is proved by showing that,
1. Every h that satisfies the right-hand side of the expression is in VS(H, D).
2. Every member of VS(H, D) satisfies the right-hand side of the expression.
To show (1), assume g as an arbitrary member of G, s as an arbitrary member of S and h as an arbitrary member of H such that g ≥g h ≥g s. According to the definition of S, s must be satisfied by all positive examples in D. h must also be satisfied by all positive examples of D since h ≥g s. According to the definition of G, g cannot be satisfied by any negative example in D, and neither can h since g ≥g h. Thus h is consistent with D, and h is a member of VS(H, D) because h is satisfied by all positive examples in D and by no negative examples in D. The argument for (2) is more complex. It can be proved by assuming some h in VS(H, D) which does not satisfy the right-hand side of the expression and showing that this leads to an inconsistency.
Q25. Will the candidate elimination algorithm converge to the correct hypothesis? Justify your answer.
Answer: Sep.-20, Q2(a)
The version space learned by the CANDIDATE-ELIMINATION algorithm will converge towards the hypothesis which correctly describes the target concept provided the below conditions hold,
+ The training examples do not contain any errors.
+ H contains some hypothesis that correctly describes the target concept.
In the process of observing training examples, the version space can be tracked to know the remaining ambiguity related to the target concept and to decide how many more training examples are needed to identify the target concept exactly. The target concept is learned exactly when the S and G boundary sets converge to a single identical hypothesis.
What happens if the training data contains errors? For example, if a training example is incorrectly presented as a negative example, the algorithm will definitely eliminate the true target concept from the version space. This happens because it eliminates every hypothesis inconsistent with each training example. The learner can detect the inconsistency by observing that the S and G boundary sets converge to an empty version space. This represents that H does not contain any hypothesis consistent with the observed examples. A similar symptom is seen when the target concept cannot be described in the hypothesis representation even though the training examples are correct.
Q26. Write in brief about the following,
(i) What training example should the learner request next?
(ii) How can partially learned concepts be used?
Answer:
(i) What Training Example Should the Learner Request Next?
So far it has been assumed that an external teacher provides the training examples to the learner. Suppose instead that the learner is allowed to conduct experiments in which it selects the next instance and then obtains its correct classification from an external oracle. Such instances constructed by the learner and classified by the external oracle are referred to as queries. Consider the version space that is learned from the four training examples of the play cricket concept. A good query strategy would be for the learner to generate instances that discriminate among the alternative competing hypotheses in the present version space. So an instance should be selected that is classified as positive by some of these hypotheses and as negative by the others. One such instance can then be constructed and presented to the trainer.
If this instance is classified as a positive example by the trainer, then the S boundary of the version space can be generalized. If the trainer labels it as a negative example, then the G boundary can be specialized. Either way the learner learns more about the true identity of the target concept. The optimal query strategy for the learner would be to produce instances which satisfy exactly half of the hypotheses in the present version space. When this is possible, the version space is halved with each new example and the correct target concept can be identified with only ⌈log2 |VS|⌉ experiments. But it might not be feasible to construct an instance that matches exactly half of the hypotheses; in such cases a larger number of queries than ⌈log2 |VS|⌉ might be needed.
(ii) How Can Partially Learned Concepts be Used?
Suppose no additional training examples are available, but the learner needs to classify new instances which have not been seen before. Even though the version space still holds a number of hypotheses, meaning that the target concept has not been learned completely, some new instances can still be classified with the same degree of confidence as if the target concept had been identified uniquely.
Instance | Outlook | Weather
A        | Day     | Raining
B        | Day     | Normal
C        | Night   | Raining
D        | Night   | Normal
For the above instances, instance A is classified as a positive instance by every hypothesis in the present version space. The learner is therefore allowed to classify it as positive with the same confidence as if the target concept were known exactly. Each hypothesis need not be enumerated in the version space to test whether the instance is classified positive by all of them: this condition is met exactly when the instance satisfies every member of S. Instance B is classified as negative by every hypothesis in the version space, so it can be confidently classified as negative given only the partially learned concept. An efficient test for this is that the instance satisfies none of the members of G. Instance C poses a different situation. It is classified as positive by half of the version space hypotheses and as negative by the other half. The learner is therefore unable to classify it with confidence until further training examples are available.
Instance D is classified as positive by two hypotheses in the version space and as negative by the other four. In this case the classification confidence is lower than in the unambiguous cases of instances A and B.
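This voting scheme can be expressed compactly. The sketch below is illustrative only; the two-hypothesis version space used in the example is an assumption, not the one from the figure above.

def covers(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def classify_by_vote(version_space, instance):
    """Let every version-space hypothesis vote; report the proportion of
    positive votes as the confidence in the classification."""
    positive = sum(covers(h, instance) for h in version_space)
    if positive == len(version_space):
        return "positive", 1.0
    if positive == 0:
        return "negative", 1.0
    return "ambiguous", positive / len(version_space)

vs = [("Sunny", "?", "?", "?", "?", "?"), ("?", "Warm", "?", "?", "?", "?")]
print(classify_by_vote(vs, ("Sunny", "Warm", "High", "Strong", "Cool", "Same")))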
Q27. Write short notes on the following,
(i) A biased hypothesis space
(ii) An unbiased learner.
Answer:
(i) A Biased Hypothesis Space
To assure that the hypothesis space contains the unknown target concept, the hypothesis space must be enriched to include every possible hypothesis. For example, consider the following training examples.
Example | Outlook | Weather | Play Cricket
1       | Day     | Raining | No
2       | Day     | Normal  | Yes
3       | Night   | Raining | No
4       | Night   | Normal  | Yes
In the above example, the hypothesis space is restricted to conjunctions of attribute values. Due to this, the hypothesis space cannot represent even simple disjunctive target concepts like "Sky = Sunny or Sky = Cloudy". Given training examples of such a disjunctive concept, it turns out that there is no hypothesis left in the version space. The most specific hypothesis from the hypothesis space H that is consistent with the training examples is,
S: <?, Normal>
This hypothesis is overly general, even though it is the most specific hypothesis from H consistent with the examples. The problem arises because the learner has been biased to consider only conjunctive hypotheses. In such a case a more expressive hypothesis space is required.
(ii) An Unbiased Learner
A common solution to the problem of assuring that the target concept is available in the hypothesis space H is to offer a hypothesis space capable of representing every teachable concept. Generally, the set of all subsets of a set X is called the power set of X. Consider a reformulation of the play cricket learning task in an unbiased way by defining a new hypothesis space H' which can represent every subset of instances, that is, let H' correspond to the power set of X. One way of defining H' is to allow arbitrary disjunctions, conjunctions and negations of the previous hypotheses. The target concept "Day = Normal or Day = Raining" can then be represented in this enriched hypothesis space.
Consider a concept learning algorithm L and an arbitrary target concept c with training examples D_c = {<x, c(x)>}. Let L(x_i, D_c) represent the classification assigned to an instance x_i by L after training on the data D_c. The inductive bias of L is a minimal set of assertions B such that, for any target concept c and corresponding training examples D_c,
(∀x_i ∈ X) [(B ∧ D_c ∧ x_i) ⊢ L(x_i, D_c)]
To know the inductive bias of the CANDIDATE-ELIMINATION algorithm, specify L(x_i, D_c) for this algorithm. Given a set of data D_c, the CANDIDATE-ELIMINATION algorithm will compute the version space VS(H, D_c) and then classify a new instance x_i by a vote among the hypotheses in this version space. Assume that it will generate a classification for x_i only when the vote among the version space members is unanimously positive or negative, and will not generate a classification otherwise. According to this definition, the inductive bias of the CANDIDATE-ELIMINATION algorithm is the assumption c ∈ H, i.e., the target concept is contained in the given hypothesis space H. The below figure depicts this situation.
Figure: Modeling Inductive Systems with Equivalent Deductive Systems. The inductive system (the candidate elimination algorithm using hypothesis space H) takes the training examples and a new instance as inputs and outputs a classification of the new instance or "don't know". The equivalent deductive system (a theorem prover) takes the same inputs plus the explicit assertion "H contains the target concept" and produces the same outputs.
The inductive CANDIDATE-ELIMINATION algorithm at the top of the figure accepts two inputs; at the bottom of the figure an additional input, the assertion that H contains the target concept, is provided to the theorem prover along with the same two inputs. The two systems will generate identical outputs for every possible input set of training examples and every possible new instance from X. For example, consider the below algorithms.
1. FIND-S
It finds the most specific hypothesis that is consistent with the training examples. It makes use of this hypothesis for classifying the subsequent instances.
2. ROTE-LEARNER
Learning here simply means storing each observed training example in memory. Subsequent instances are classified by looking them up in memory. If the instance is found in memory, the stored classification is returned. Otherwise the new instance will not be classified.
3. CANDIDATE-ELIMINATION Algorithm
New instances are classified only when all the members of the current version space agree on the classification; otherwise the new instance will not be classified.
The more strongly biased methods make more inductive leaps, classifying a larger proportion of unseen instances. Some inductive biases take the form of categorical assumptions which completely rule out certain concepts, like the bias "the hypothesis space H contains the target concept". Other inductive biases merely rank the order of hypotheses by specifying preferences like "more specific hypotheses are preferred over more general hypotheses". Some of these biases can be modified by the learner, while others remain fixed.
1.3 DECISION TREE LEARNING
1.3.1 Introduction, Decision Tree Representation, Appropriate Problems for Decision Tree Learning, The Basic Decision Tree Learning Algorithm

Q29. Give a brief introduction about decision tree learning. How are decision trees represented?
Answer: Model Paper-1, Q4(a)
Decision Tree Learning
Decision tree learning is a process of approximating discrete-valued target functions in which the learned function is represented by a decision tree. It is a widely used and practical method for inductive inference. This method searches a completely expressive hypothesis space and thus avoids the difficulties of restricted hypothesis spaces.
Decision Tree Representation
Decision trees classify instances by sorting them down the tree from the root to some leaf node. Each node represents a test of some attribute of the instance, and each branch descending from a node corresponds to one of the possible values of that attribute. An instance is classified by beginning at the root node, testing the attribute specified by this node and moving down the branch corresponding to the value of the attribute. This process is repeated for the subtree rooted at the new node. Consider the below decision tree.
Root: Outlook (Day / Night); each branch then tests the weather (Raining → No, Normal → Yes)
Figure: Decision Tree for the Concept Play Cricket
The above decision tree classifies a particular day according to whether it is suitable for playing cricket. For example, the instance
<Outlook = Day, Weather = Raining>
will be sorted down the left branch and classified as a negative instance. Decision trees in general represent a disjunction of conjunctions of constraints on the attribute values of instances. Each path between the root node and a leaf node corresponds to a conjunction of attribute tests, and the tree itself to a disjunction of these conjunctions. For instance, the above decision tree corresponds to the expression,
(Outlook = Day ∧ Weather = Normal) ∨ (Outlook = Night ∧ Weather = Normal)
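The same tree can be written directly as nested conditionals. The function below is only a literal transcription of the figure above, with the two attributes assumed to be named outlook and weather.

def play_cricket(outlook, weather):
    """Nested-conditional form of the play-cricket decision tree."""
    if outlook == "Day":
        return "Yes" if weather == "Normal" else "No"
    else:  # outlook == "Night"
        return "Yes" if weather == "Normal" else "No"

print(play_cricket("Day", "Raining"))   # 'No'  (the left branch of the tree)
print(play_cricket("Night", "Normal"))  # 'Yes'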
Q30. List out the appropriate problems for decision tree learning.
Answer:
Decision tree learning is generally best suited to problems that have the below mentioned characteristics,
(i) The Training Data May Contain Errors
Decision tree learning methods are robust to errors, both errors in the classifications of the training examples and errors in the attribute values that describe these examples.
(ii) The Target Function Has Discrete Output Values
Decision tree methods are most naturally suited to learning target functions holding two or more possible discrete outcomes. Furthermore, learning target functions with real-valued outputs is also possible, even though this application of decision trees is less common.
(iii) The Training Data May Contain Missing Attribute Values
Decision tree methods can also be used in cases in which the training examples hold unknown attribute values.
(iv) Instances are Represented by Attribute-Value Pairs
A fixed set of attributes and their values represent the instances. The simplest situation for decision tree learning is when each attribute takes only a few disjoint possible values.
(v) Disjunctive Descriptions Might be Needed
Decision trees naturally represent disjunctive expressions.
Many practical problems possess the above characteristics. Decision trees are applicable to problems like learning to categorize patients according to their disease, equipment malfunctions according to their cause, and loan applicants according to their risk. Such problems are known as classification problems, in which the task is to classify examples into one of a discrete set of categories.
Q31. Write the basic decision tree learning algorithm.
Answer: Model Paper-2, Q4(a)
Decision Tree Learning.
Decision tree learning is a type of supervised machine learning wherein the data is segregated according to a specific
parameter.
Algorithm
The algorithm for learning the decision tree is as follows,
Function DTREE_LEARNING(examples, attributes, default) returns a decision tree
Inputs: examples, a collection of examples
        attributes, a collection of attributes
        default, the default value for the goal predicate
if there are no examples then
    return default
else if all the examples have the same classification then
    return the classification
else if attributes is empty then
    return MAX_VAL(examples)            // the majority classification
else
    best ← SELECT_ATTRIBUTE(attributes, examples)
    tree ← a new tree with root test best
    m ← MAX_VAL(examples)
    for each value vi of best do
        examples_i ← {elements of examples with best = vi}
        subtree ← DTREE_LEARNING(examples_i, attributes − best, m)
        add a branch to tree with label vi and subtree subtree
    return tree
The algorithm, run on the 12-example data set, produces the final tree shown as follows,
Figure: Decision Tree Induced with the 12-example Data Set (the root tests Patrons (None/Some/Full); further branches include tests such as Fri/Sat?)
The above tree is generalized from the original tree. The learning algorithm observes only the examples, not the exact function. However the hypothesis, i.e., the above tree, satisfies all the examples. In addition, it is very simple compared to the original tree.
In the above tree, the algorithm does not use the tests for 'Raining' and 'Reservation'. This is because the algorithm can classify all the examples without using either of them. Also, it can be observed from the tree that on Fridays and Saturdays the first author is willing to wait for Indian food. If the tree is induced with more examples, then it will become more similar to the original tree. In the above tree there also appears a mistake, i.e., even though there is a wait of 0-10 minutes, the restaurant seems to be full.
Types of Decision Trees
Decision trees are mainly classified into two types,
(i) Classification Trees
It refers to a type of tree where the decision or outcome variable is categorical.
(ii) Regression Trees
It refers to a type of tree wherein the decision or outcome variable is continuous.
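A compact Python sketch of the recursive procedure above is shown below. The attribute-selection function is left to the caller (for example, information gain as computed in the next question); the helper names are assumptions made for illustration.

from collections import Counter

def majority_value(examples):
    """Most common label among the examples (MAX_VAL in the pseudocode)."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def dtree_learning(examples, attributes, default, choose_attribute):
    """Recursive decision-tree induction mirroring DTREE_LEARNING.
    `examples` is a list of (attribute-dict, label) pairs."""
    if not examples:
        return default
    labels = {label for _, label in examples}
    if len(labels) == 1:
        return labels.pop()                       # all examples agree
    if not attributes:
        return majority_value(examples)
    best = choose_attribute(attributes, examples)
    tree, m = {best: {}}, majority_value(examples)
    for v in {ex[best] for ex, _ in examples}:
        subset = [(ex, label) for ex, label in examples if ex[best] == v]
        rest = [a for a in attributes if a != best]
        tree[best][v] = dtree_learning(subset, rest, m, choose_attribute)
    return tree

The returned tree is a nested dictionary keyed by attribute names and values; plugging in an information-gain-based choose_attribute gives ID3-style behaviour.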
Q32. Construct a decision tree for the following training data and use entropy as a measure of impurity.
RID | Age    | Income | Student | Credit_rating | Buys_computer
1   | <=30   | high   | no      | fair          | no
2   | <=30   | high   | no      | excellent     | no
3   | 31..40 | high   | no      | fair          | yes
4   | >40    | medium | no      | fair          | yes
5   | >40    | low    | yes     | fair          | yes
6   | >40    | low    | yes     | excellent     | no
7   | 31..40 | low    | yes     | excellent     | yes
8   | <=30   | medium | no      | fair          | no
9   | <=30   | low    | yes     | fair          | yes
10  | >40    | medium | yes     | fair          | yes
11  | <=30   | medium | yes     | excellent     | yes
12  | 31..40 | medium | no      | excellent     | yes
13  | 31..40 | high   | yes     | fair          | yes
14  | >40    | medium | no      | excellent     | no
Answer: Sep.-20, Q6
For the given data, the decision tree is as follows,
age?
  <=30: Student?  (no → No, yes → Yes)
  31..40: Yes
  >40: Credit_rating?  (excellent → No, fair → Yes)
Let class P: Buys_computer = "yes"
Class N: Buys_computer = "no"
Entropy(9, 5) = 0.940
Now compute the entropy for age. The age attribute splits the 14 examples into groups with (p, n) counts of (2, 3) for age <= 30, (4, 0) for age 31..40 and (3, 2) for age > 40,
E(age) = (5/14) Entropy(2, 3) + (4/14) Entropy(4, 0) + (5/14) Entropy(3, 2)
       = 0.694
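The entropy figures above can be checked with a short Python calculation; the (2, 3), (4, 0) and (3, 2) splits for the age attribute are taken from the table.

from math import log2

def entropy(pos, neg):
    """Entropy of a collection with `pos` positive and `neg` negative examples."""
    total = pos + neg
    return -sum((c / total) * log2(c / total) for c in (pos, neg) if c)

print(round(entropy(9, 5), 3))            # 0.940

groups = [(2, 3), (4, 0), (3, 2)]         # age <= 30, 31..40, > 40
total = sum(p + n for p, n in groups)
e_age = sum((p + n) / total * entropy(p, n) for p, n in groups)
print(round(e_age, 3))                    # 0.694
print(round(entropy(9, 5) - e_age, 3))    # information gain for age, about 0.246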
Q33. You are stranded on a deserted island. Mushrooms of various types grow on the island, but no other food is anywhere to be found. Some of the mushrooms have been determined as poisonous and others as not (determined by your former companions' trial and error). You are the only one remaining on the island. You have the following data to consider.
Example | Not Heavy | Smelly | Spotted | Smooth | Edible
(rows A through H give the attribute values and edibility of the known mushrooms; rows U, V and W give the attribute values of the unknown mushrooms)
You know whether mushrooms A through H are poisonous, but you do not know about U, V and W. Classify mushrooms U, V and W as poisonous or not poisonous.
Answer: Dec.-20, Q3
The decision tree for classifying whether a mushroom is poisonous or not poisonous is as follows,
Figure: Decision tree learned from examples A through H
The classification of the U, V and W mushrooms by using the decision tree is as follows,
U | Smooth = 1, Smelly = 1 ⇒ Edible
V | Smooth = 1, Smelly = 1 ⇒ Edible
W | Smooth = 0, Smelly = 1 ⇒ Edible
1.3.2 Hypothesis Space Search in Decision Tree Learning, Inductive Bias in Decision Tree Learning, Issues in Decision Tree Learning

Q34. Explain about hypothesis space search in decision tree learning.
Answer: Model Paper-1, Q3
The inductive learning method ID3 can be characterized as searching a space of hypotheses for one that fits the training examples. The hypothesis space searched by ID3 is the set of possible decision trees. ID3 performs a simple-to-complex, hill-climbing search through this space, beginning with the empty tree and progressively considering more elaborate hypotheses in search of a decision tree that correctly classifies the training data.
[Figure: Partially Learned Decision Tree. The first step of ID3 has selected Outlook as the root attribute; the training examples are sorted to its descendant branches, e.g. one branch receives a [2+, 3-] subset and another a [4+, 0-] subset.]
The above figure depicts the hill-climbing search: it is the partially learned decision tree generated by the first step of ID3. The training examples are sorted to the respective descendant nodes, and each node is then expanded by selecting the attribute with the maximum information gain relative to its new subset of examples. Some of the capabilities and limitations of ID3 are as follows,
(i) It holds only one current hypothesis while searching the space of decision trees. By doing so, it loses the capabilities that would follow from explicitly representing all hypotheses consistent with the data.
(ii) It makes use of all the training examples at every step of the search when deciding how to refine the current hypothesis. One benefit of this is that the search is far less sensitive to errors in individual training examples. ID3 can be further extended to handle noisy training data by altering its termination condition so that it accepts hypotheses that do not fit the training data perfectly.
(iii) ID3's hypothesis space of decision trees is a complete space of finite discrete-valued functions relative to the available attributes, because every such function can be represented by some decision tree. ID3 therefore avoids the risk, faced by methods that search incomplete hypothesis spaces, that the target function might not be contained in the hypothesis space at all.
(iv) It does not perform any backtracking in its search. Once it opts for an attribute to test at a level in the tree, there is no backtracking to reconsider that choice. It is therefore vulnerable to the usual risks associated with hill-climbing search without backtracking, such as converging to a locally optimal solution.
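The search described above can be made concrete with a compressed ID3-style sketch in plain Python: a greedy, information-gain-driven construction with no backtracking that keeps a single current tree. The list-of-dicts data format and the function names are assumptions made for illustration, not something prescribed by the text.

    # Minimal ID3-style learner (greedy information-gain splits, no backtracking).
    from math import log2
    from collections import Counter

    def entropy(examples, target):
        counts = Counter(e[target] for e in examples)
        total = len(examples)
        return -sum((c / total) * log2(c / total) for c in counts.values())

    def info_gain(examples, attr, target):
        total = len(examples)
        remainder = 0.0
        for value in {e[attr] for e in examples}:
            subset = [e for e in examples if e[attr] == value]
            remainder += (len(subset) / total) * entropy(subset, target)
        return entropy(examples, target) - remainder

    def id3(examples, attributes, target):
        classes = {e[target] for e in examples}
        if len(classes) == 1:                 # pure node -> leaf
            return classes.pop()
        if not attributes:                    # no tests left -> majority class
            return Counter(e[target] for e in examples).most_common(1)[0][0]
        best = max(attributes, key=lambda a: info_gain(examples, a, target))
        tree = {best: {}}
        for value in {e[best] for e in examples}:
            subset = [e for e in examples if e[best] == value]
            remaining = [a for a in attributes if a != best]
            tree[best][value] = id3(subset, remaining, target)
        return tree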
Discuss about inductive bias in decision tree learning.

Answer :
Inductive bias can be defined as the set of assumptions which, together with the training data, justify the classifications assigned to future instances. For a given set of training examples there are usually many decision trees compatible with those examples, so the inductive bias of ID3 is explained by how it selects among the compatible hypotheses. ID3 selects the first acceptable tree it finds during its simple-to-complex, hill-climbing search over possible trees. Roughly speaking, the search therefore favours shorter trees over longer ones, and it prefers trees that place the attributes with the highest information gain closest to the root. The inductive bias of ID3 is difficult to state precisely. For comparison, consider an algorithm that starts with the empty tree and searches breadth first over progressively more complex trees, depth by depth, returning the smallest consistent tree; such an algorithm, call it BFS-ID3, finds a shortest decision tree and thus exhibits exactly the bias "shorter trees are preferred over longer trees". ID3, which uses the information-gain heuristic together with hill climbing, exhibits a more complex bias than BFS-ID3. There are two broad kinds of bias, namely restriction bias and preference bias.
For answer refer Unit-1, Q36.
The inductive bias of ID3 is a preference for some hypotheses over others, with no hard restriction on the hypothesis space itself; such a bias is known as a preference bias. The bias of the candidate elimination algorithm, in contrast, takes the form of a categorical restriction on the set of hypotheses considered; such a bias is known as a restriction bias. A preference bias is generally more desirable for generalizing beyond the training data, since it allows the learner to work within a complete hypothesis space that is sure to contain the unknown target function. A restriction bias is less desirable because it may exclude the unknown target function altogether.
The reason why the inductive bias should favour shorter decision trees (short hypotheses) is not obvious. Scientists do appear to follow such a bias; physicists, for instance, prefer the simpler explanations of planetary motion. One argument is that because there are far fewer short hypotheses than long ones, it is less likely that a short hypothesis that fits the training data does so by coincidence. In the case of decision trees, 500-node trees vastly outnumber 5-node trees, so given only 20 training examples we expect to find many 500-node trees consistent with them, and we would be far more surprised if a 5-node tree fit the data. There are, however, noticeable problems with this argument. Instead of tree size, consider the set of decision trees with 17 leaf nodes and 11 non-leaf nodes, where A1 is the root and A2 to A11 are the test attributes. There are few trees of this particular form as well, and the probability of finding one compatible with an arbitrary set of data is equally low. The problem is that a great many small sets of hypotheses can be defined, and most of them are rather obscure, so it is not clear why shortness in particular should matter. Another problem is that the size of a hypothesis depends on the internal representation used by the learner: two learners using different internal representations can reach different hypotheses from the same training examples, each justifying its own choice by the same preference for short hypotheses. This observation can be used to reject the argument. On the other hand, evolution might build internal representations that make the learning algorithm's inductive bias a self-fulfilling prophecy, simply because it can modify the representation more easily than it can modify the learning algorithm.
Contrast the hypothesis space search in ID3 and the candidate elimination algorithm.

Answer :
Dec.-19
The difference between ID3 and the candidate elimination algorithm is best seen by taking the hypothesis space search as the key element of comparison.
(i) ID3 searches a complete hypothesis space, one capable of expressing any finite discrete-valued function, but it searches this space incompletely, from simple to complex hypotheses, until its termination condition is met. Its inductive bias is therefore solely a consequence of the ordering imposed by its search strategy; its hypothesis space introduces no additional bias.
(ii) The candidate elimination algorithm searches an incomplete hypothesis space, one that can express only some of the possible concepts, but it searches this space completely, identifying every hypothesis consistent with the training data. Its inductive bias is solely a consequence of the expressive power of its hypothesis representation; its search strategy introduces no additional bias.
Illustrate the impact of overfitting in a typical application of decision tree learning.

(Model Paper-II, Q33 | Dec.-19, Q35)
Answer :

For a given hypothesis space H, a hypothesis h in H is said to overfit the training data if there exists some alternative hypothesis h' in H such that h has a smaller error than h' over the training examples, but h' has a smaller error than h over the entire distribution of instances.

The figure below depicts the effect of overfitting in a typical application of decision tree learning.
[Figure: Overfitting in Decision Tree Learning (accuracy plotted against the size of the tree, in number of nodes).]
In the above case ID3 is applied to a learning task. The horizontal axis represents the number of nodes in the decision tree and the vertical axis the accuracy of its predictions. The solid line shows the accuracy of the decision tree over the training examples, while the dashed line shows the accuracy measured over a separate set of test examples.
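The widening gap between the two curves can be reproduced with a rough experiment. The sketch below assumes scikit-learn and NumPy are available; the synthetic noisy data set and the depth range are arbitrary choices made only to make the effect visible.

    # Rough sketch reproducing the overfitting curves (assumes scikit-learn/NumPy).
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.random((600, 5))                          # 600 examples, 5 features
    y = (X[:, 0] + X[:, 1] > 1).astype(int)           # simple underlying concept
    noise = rng.random(600) < 0.15                    # flip ~15% of labels (noise)
    y = np.where(noise, 1 - y, y)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

    for depth in (1, 2, 4, 8, 16, None):              # None lets the tree grow fully
        tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
        print(depth, round(tree.score(X_tr, y_tr), 2), round(tree.score(X_te, y_te), 2))
    # Training accuracy keeps climbing as the tree grows, while test accuracy
    # levels off or drops back, which is the pattern sketched in the figure above.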
A tree h can fit the training examples better than a tree h' and yet perform worse over subsequent examples. This can happen when the training examples contain random errors or noise. For example, consider the effect of adding the following positive training example, incorrectly labeled as negative,
<..., Play Cricket = No>