ML UNIT-V Lecture Notes
Computer Science and Engineering (Shadan Women's College of Engineering and
Technology)
CS601PC: MACHINE LEARNING
III Year B.Tech. CSE II-Sem.
UNIT- V
Analytical Learning-1- Introduction, learning with perfect domain theories:
PROLOG-EBG, remarks on explanation-based learning, explanation-based
learning of search control knowledge.
Analytical Learning-2- Using prior knowledge to alter the search objective, using
prior knowledge to augment search operators.
Combining Inductive and Analytical Learning – Motivation, inductive-
analytical approaches to learning, using prior knowledge to initialize the
hypothesis.
TEXT BOOKS:
1. Machine Learning, Tom M. Mitchell, MGH (Page Nos. 307 to 364)
ANALYTICAL LEARNING-1
Introduction:
An analytical learning method is also called explanation-based learning
(EBL). In explanation-based learning, prior knowledge is used to analyze, or
explain, how each observed training example satisfies the target concept. This
explanation is then used to distinguish the relevant features of the training
example from the irrelevant, so that examples can be generalized based on
logical rather than statistical reasoning.
Learning algorithms accept explicit prior knowledge as an input, in addition
to the input training data. Explanation-based learning is one such approach. It
uses prior knowledge to analyze, or explain, each training example in order to
infer which example features are relevant to the target function and which are
irrelevant.
Explanation based learning uses prior knowledge to reduce the complexity
of the hypothesis space to be searched, thereby reducing sample complexity and
improving generalization accuracy of the learner.
As a first example, consider prior knowledge about the legal rules of chess:
knowledge of which moves are legal for the knight and the other pieces, the
fact that players must alternate moves, and the fact that to win the game one
player must capture the opponent's king. Given just this prior knowledge (the
domain theory), it is in principle possible to calculate the optimal chess move
for any board position. In analytical learning, the learner must output a
hypothesis that is consistent with both the training data and the domain
theory.
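One way to state this requirement compactly (a sketch, with D the training
data, B the domain theory, and h the output hypothesis; the second condition
says that h does not contradict the domain theory):

\[
(\forall \langle x_i, f(x_i)\rangle \in D)\;\; h(x_i) = f(x_i)
\qquad\text{and}\qquad
B \nvdash \neg h
\]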
Difference between analytical and inductive learning methods:
As a second example, consider an instance space X in which each instance is a
pair of physical objects.
Each of the two physical objects in the instance is described by the predicates
Color, Volume, Owner, Material, Type, and Density, and the relationship between
the two objects is described by the predicate On.
Given this instance space, the task is to learn the target concept "pairs of
physical objects, such that one can be stacked safely on the other," denoted by
the predicate SafeToStack(x,y). Learning this target concept might be useful, for
example, to a robot system that has the task of storing various physical objects
within a limited workspace. The full definition of this analytical learning task is
given in Table 11.1.
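Since Table 11.1 is not reproduced here, the sketch below illustrates what one
training instance and a fragment of a SafeToStack domain theory might look
like. The specific objects, attribute values, and rules are illustrative
assumptions in the spirit of the textbook's table, not a verbatim copy.

# A single positive training example: two physical objects described by
# the predicates listed above, plus the relation On between them.
training_example = {
    "facts": [
        "On(Obj1, Obj2)",
        "Type(Obj1, Box)",            "Type(Obj2, Endtable)",
        "Color(Obj1, Red)",           "Color(Obj2, Blue)",
        "Volume(Obj1, 2)",            "Density(Obj1, 0.3)",
        "Material(Obj1, Cardboard)",  "Material(Obj2, Wood)",
        "Owner(Obj1, Fred)",          "Owner(Obj2, Louise)",
    ],
    "label": "SafeToStack(Obj1, Obj2)",   # the target concept holds for this pair
}

# A fragment of a domain theory for SafeToStack, written as Horn clauses
# (head :- body).  These rules are illustrative, in the spirit of the
# textbook's theory.
domain_theory = [
    "SafeToStack(x, y) :- Lighter(x, y)",
    "Lighter(x, y)     :- Weight(x, wx), Weight(y, wy), LessThan(wx, wy)",
    "Weight(x, w)      :- Volume(x, v), Density(x, d), Equal(w, v * d)",
    "Weight(x, 5)      :- Type(x, Endtable)",   # default weight for endtables
]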
LEARNING WITH PERFECT DOMAIN THEORIES: PROLOG-EBG
We consider explanation-based learning from domain theories that are
perfect, that is, domain theories that are correct and complete. A domain theory
is said to be correct if each of its assertions is a truthful statement about the
world. A domain theory is said to be complete with respect to a given target
concept and instance space, if the domain theory covers every positive example
in the instance space. Put another way, it is complete if every instance that
satisfies the target concept can be proven by the domain theory to satisfy it.
Completeness does not require that the domain theory be able to prove that
negative examples fail to satisfy the target concept; coverage of the positive
examples alone is sufficient.
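Stated compactly (a paraphrase of the definitions above; B is the domain
theory, c the target concept, and X the instance space):

\[
\text{correct:}\;\; \text{every assertion in } B \text{ is true in the world},
\qquad
\text{complete:}\;\; (\forall x \in X)\ \big[\, c(x) = 1 \;\Rightarrow\; (B \wedge x) \vdash c(x) \,\big]
\]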
An algorithm called PROLOG-EBG is representative of explanation-based learning
algorithms; it is a sequential covering algorithm.
PROLOG-EBG operates by learning a single Horn clause rule, removing the
positive training examples covered by this rule, then iterating this process on the
remaining positive examples until no further positive examples remain
uncovered. When given a complete and correct domain theory, PROLOG-EBG is
guaranteed to output a hypothesis (set of rules) that is itself correct and that
covers the observed positive training examples. For any set of training examples,
the hypothesis output by PROLOG-EBG constitutes a set of logically sufficient
conditions for the target concept, according to the domain theory.
A clause with at most one positive (unnegated) literal is called a Horn Clause.
Deductive database: a type of database that can draw conclusions from a set of
well-defined rules stored in the database.
For each new positive training example that is not yet covered by a learned Horn
clause, PROLOG-EBG forms a new Horn clause by the following steps (a sketch of
this loop is given after the list):
(1) explaining the new positive training example,
(2) analyzing this explanation to determine an appropriate generalization,
and
(3) refining the current hypothesis by adding a new Horn clause rule to
cover this positive example, as well as other similar instances.
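The outer loop just described can be sketched as follows. This is an
illustrative Python sketch, not the textbook's pseudocode; the callables
explain, weakest_preimage, and covers stand in for the three steps above and
for the coverage test.

def prolog_ebg(target_concept, positive_examples, domain_theory,
               explain, weakest_preimage, covers):
    """Sequential covering sketch of PROLOG-EBG.  The callables `explain`,
    `weakest_preimage`, and `covers` are illustrative placeholders for the
    three analysis steps described above."""
    learned_rules = []
    uncovered = list(positive_examples)
    while uncovered:
        example = uncovered[0]
        # (1) Explain the example in terms of the domain theory (a proof).
        explanation = explain(example, target_concept, domain_theory)
        # (2) Regress the target concept through the explanation to get the
        #     weakest preimage: the most general sufficient preconditions.
        preconditions = weakest_preimage(target_concept, explanation)
        # (3) Refine the hypothesis by adding a new Horn clause.
        learned_rules.append((target_concept, preconditions))
        # Drop the positive examples covered by the rules learned so far.
        uncovered = [ex for ex in uncovered if not covers(learned_rules, ex)]
    return learned_rules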
(1) Explaining the new positive training example: an explanation is constructed
in terms of the domain theory, showing how the training example satisfies the
target concept; the explanation constitutes a proof that the example satisfies
the concept.
(2) Analyzing this explanation to determine an appropriate generalization: the
target concept is regressed through the explanation to compute its weakest
preimage, yielding the most general rule that the explanation justifies.
(3) Refining the current hypothesis: a new Horn clause rule, whose body is the
weakest preimage computed in step (2), is added to cover this positive example
as well as other similar instances.
The current hypothesis at each stage consists of the set of Horn clauses. At
each stage, the sequential covering algorithm picks a new positive example that is
not yet covered by the current Horn clauses, explains this new example, and
formulates a new rule according to the procedure described above.
Only positive examples are covered in the algorithm as we have defined it,
and the learned set of Horn clause rules predicts only positive examples. A new
instance is classified as negative if the current rules fail to predict that it is
positive. This is in keeping with the standard negation-as-failure approach used in
Horn clause inference systems such as PROLOG.
REMARKS ON EXPLANATION-BASED LEARNING:
(Properties of Explanation Based Learning Algorithm)
1. Unlike inductive methods, PROLOG-EBG produces justified general
hypotheses by using prior knowledge to analyze individual examples.
2. The explanation of how the example satisfies the target concept determines
which example attributes are relevant: those mentioned by the explanation.
3. The further analysis of the explanation, regressing the target concept to
determine its weakest preimage with respect to the explanation, allows
deriving more general constraints on the values of the relevant features.
4. Each learned Horn clause corresponds to a sufficient condition for
satisfying the target concept. The set of learned Horn clauses covers the
positive training examples encountered by the learner, as well as other
instances that share the same explanations.
5. The generality of the learned Horn clauses will depend on the formulation of
the domain theory and on the sequence in which training examples are
considered.
6. PROLOG-EBG implicitly assumes that the domain theory is correct and
complete. If the domain theory is incorrect or incomplete, the resulting
learned concept may also be incorrect.
EXPLANATION-BASED LEARNING OF SEARCH CONTROL
KNOWLEDGE:
The largest-scale attempts to apply explanation-based learning have
addressed the problem of learning to control search, or what is sometimes called
"speedup" learning. For example, playing games such as chess involves searching
through a vast space of possible moves and board positions to find the best move.
Many practical scheduling and optimization problems are easily formulated as
large search problems, in which the task is to find some sequence of moves
leading to a goal state. In such problems the definitions of the legal search
operators, together
with the definition of the search objective, provide a complete and correct
domain theory for learning search control knowledge.
We first formulate the problem of learning search control, and then apply
explanation-based learning to it.
Consider a general search problem where
S is the set of possible search states,
O is a set of legal search operators that transform one search state into another,
G is a predicate defined over S that indicates which states are goal states.
The problem in general is to find a sequence of operators that will
transform an arbitrary initial state si to some final state sf that satisfies the goal
predicate G.
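In code, such a search problem can be represented as below. The class and
function names are illustrative assumptions for this sketch, not part of
PRODIGY or SOAR.

from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class SearchProblem:
    """A generic search-control setting: operators O over states S and a
    goal predicate G over states (names are illustrative)."""
    operators: List[Callable]          # O: each maps a state to a successor state, or None if inapplicable
    is_goal: Callable[[object], bool]  # G: predicate over states

def solve(problem: SearchProblem, initial_state, max_depth: int = 10) -> Optional[list]:
    """Depth-limited search for a sequence of operators leading from the
    initial state to a goal state -- the generic problem a planner solves."""
    def dfs(state, depth):
        if problem.is_goal(state):
            return []                  # goal reached: empty remaining plan
        if depth == 0:
            return None                # depth limit reached
        for op in problem.operators:
            nxt = op(state)
            if nxt is None:
                continue               # operator not applicable in this state
            rest = dfs(nxt, depth - 1)
            if rest is not None:
                return [op] + rest
        return None
    return dfs(initial_state, max_depth)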
PRODIGY is a domain-independent planning system that accepts the
definition of a problem domain in terms of the state space S and operators O. It
then solves problems of the form "find a sequence of operators that leads from
initial state si to a state that satisfies the goal predicate G." PRODIGY uses a
means-ends planner that decomposes problems into subgoals, solves them, then
combines their solutions into a solution for the full problem.
The net effect is that PRODIGY uses domain-independent knowledge about
possible subgoal conflicts, together with domain-specific knowledge of specific
operators, to learn useful domain-specific planning rules.
SOAR supports a broad variety of problem-solving strategies that subsumes
PRODIGY's means-ends planning strategy. SOAR uses a variant of explanation-
based learning called chunking to extract the general conditions under which the
same explanation applies. SOAR has been applied in a great number of problem
domains and has also been proposed as a psychologically plausible model of
human learning processes.
PRODIGY and SOAR demonstrate that explanation-based learning methods
can be successfully applied to acquire search control knowledge in a variety of
problem domains.
For example, in a series of robot block-stacking problems, PRODIGY encountered
328 opportunities for learning a new rule, but chose to exploit only 69 of these,
and eventually reduced the learned rules to a set of 19, once low-utility rules
were eliminated.
COMBINING INDUCTIVE AND ANALYTICAL LEARNING
MOTIVATION:
Purely inductive learning methods (e.g., decision trees, BACKPROPAGATION)
formulate general hypotheses from the training examples alone. Purely analytical
methods (e.g., PROLOG-EBG) use prior knowledge to derive general hypotheses
deductively.
Methods that combine inductive and analytical mechanisms obtain the benefits
of both approaches: better generalization accuracy when prior knowledge is
available, and reliance on observed training data to overcome shortcomings in
the prior knowledge.
Purely analytical learning methods offer the advantage of generalizing
more accurately from less data by using prior knowledge to guide learning.
However, they can be misled when given incorrect or insufficient prior
knowledge. Purely inductive methods offer the advantage that they require no
explicit prior knowledge and learn regularities based solely on the training data.
However, they can fail when given insufficient training data, and can be misled by
the implicit inductive bias they must adopt in order to generalize beyond the
observed data.
INDUCTIVE-ANALYTICAL APPROACHES TO LEARNING:
The Learning Problem
Given:
A set of training examples D, possibly containing errors
A domain theory B, possibly containing errors
A space of candidate hypotheses H
Determine:
A hypothesis that best fits the training examples and domain theory
Solution:
We can define measures of hypothesis error with respect to the data and with
respect to the domain theory.
errorD(h) is defined to be the proportion of examples from D that are misclassified by h.
errorB(h) is defined to be the probability that h will disagree with the domain
theory B on the classification of a randomly drawn instance.
The learning problem is to minimize some combined measure of the error
of the hypothesis over the data and the domain theory.
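One natural way to write this combined measure, following the textbook's
discussion (the constants kD and kB are design choices that trade off the two
error terms):

\[
h^{*} \;=\; \underset{h \in H}{\operatorname{argmin}} \;\; k_D \, error_D(h) \;+\; k_B \, error_B(h)
\]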
Three different methods have been explored for using prior knowledge to alter
the search performed by purely inductive methods:
A. USING PRIOR KNOWLEDGE TO INITIALIZE THE HYPOTHESIS:
A.1. KBANN (Knowledge-Based Artificial Neural Network) algorithm
In this approach the domain theory B is used to construct an initial
hypothesis h0 that is consistent with B. A standard inductive method is then
applied, starting with the initial hypothesis h0. For example, the KBANN
system learns artificial neural networks in this way. It uses prior knowledge
to design the interconnections and weights for an initial network, so that this
initial network is perfectly consistent with the given domain theory. This
initial network hypothesis is then refined inductively using the
BACKPROPAGATION algorithm and available data. Beginning the
search at a hypothesis consistent with the domain theory makes it more
likely that the final output hypothesis will better fit this theory.
ANALYTICAL LEARNING-2
B. USING PRIOR KNOWLEDGE TO ALTER THE SEARCH
OBJECTIVE:
B.1. The TANGENTPROP Algorithm
B.2.The EBNN (Explanation-Based Neural Network learning) algorithm
In this approach, the goal criterion G is modified to require that the
output hypothesis fits the domain theory as well as the training examples.
For example, the EBNN system learns neural networks in this way. Whereas
inductive learning of neural networks performs gradient descent search to
minimize the squared error of the network over the training data, EBNN
performs gradient descent to optimize a different criterion. This modified
criterion includes an additional term that measures the error of the learned
network relative to the domain theory.
C. USING PRIOR KNOWLEDGE TO AUGMENT SEARCH OPERATORS:
C.1. The FOCL Algorithm
In this approach, the set of search operators O is altered by the domain
theory. For example, the FOCL system learns sets of Horn clauses in this way. It is
based on the inductive system FOIL, which conducts a greedy search through the
space of possible Horn clauses, at each step revising its current hypothesis by
adding a single new literal. FOCL uses the domain theory to expand the set of
alternatives available when revising the hypothesis, allowing the addition of
multiple literals in a single search step when warranted by the domain theory. In
this way, FOCL allows single-step moves through the hypothesis space that would
correspond to many steps using the original inductive algorithm. These "macro-
moves" can dramatically alter the course of the search, so that the final hypothesis
found consistent with the data is different from the one that would be found using
only the inductive search steps.
DETAILED ANALYSIS OF THREE METHODS:
A.1. KBANN (Knowledge-Based Artificial Neural Network) algorithm
Given:
A set of training examples
A domain theory consisting of nonrecursive, propositional Horn clauses
Determine:
An artificial neural network that fits the training examples, biased by the
domain theory.
The two stages of the KBANN algorithm are first to create an artificial neural
network that perfectly fits the domain theory and second to use the
BACKPROPAGATION algorithm to refine this initial network to fit the training
examples.
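A minimal sketch of the first stage, which turns propositional Horn clauses
into an initial network of sigmoid units. The weight constant W, the clause
encoding, and the Cup clause shown are illustrative (in the spirit of the
textbook's Cup example), not KBANN's exact published construction.

W = 4.0  # large weight used for antecedent connections (illustrative constant)

def unit_from_clause(head, positive_antecedents, negated_antecedents):
    """Create one sigmoid unit for the Horn clause
       head :- positive_antecedents, not(negated_antecedents).
    Weights are W for non-negated antecedents and -W for negated ones; the
    bias is chosen so the unit outputs ~1 only when every antecedent holds."""
    weights = {a: W for a in positive_antecedents}
    weights.update({a: -W for a in negated_antecedents})
    n = len(positive_antecedents)
    bias = -(n - 0.5) * W      # net input is +0.5W if all antecedents hold, else <= -0.5W
    return {"output": head, "weights": weights, "bias": bias}

# Stage 1: build one unit per clause; the resulting network reproduces the
# domain theory exactly.  Stage 2 (not shown) refines all weights, including
# additional near-zero links between layers, with BACKPROPAGATION on the
# training examples.
clauses = [("Cup", ["Stable", "Liftable", "OpenVessel"], [])]  # illustrative clause
initial_network = [unit_from_clause(h, pos, neg) for h, pos, neg in clauses]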
B.1. The TANGENTPROP Algorithm
The previous approach (KBANN) begins the gradient descent search with a
hypothesis that perfectly fits the domain theory, then perturbs this hypothesis
as needed to improve its fit to the training data.
An alternative way of using prior knowledge is to incorporate it into the
error criterion minimized by gradient descent, so that the network must fit a
combined function of the training data and the domain theory.
TANGENTPROP algorithm accommodates domain knowledge expressed as
derivatives of the target function with respect to transformations of its inputs.
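A sketch of the modified error criterion, in the spirit of the textbook's
formulation: s_j(alpha, x) denotes the j-th transformation of input x with
parameter alpha, f is the target function, \hat f the network, and mu is a
constant weighting the derivative term relative to the usual squared error.

\[
E \;=\; \sum_i \left[ \big(f(x_i) - \hat f(x_i)\big)^2
\;+\; \mu \sum_j \left( \left.\frac{\partial f(s_j(\alpha, x_i))}{\partial \alpha}
      - \frac{\partial \hat f(s_j(\alpha, x_i))}{\partial \alpha}\right|_{\alpha = 0} \right)^{2} \right]
\]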
It is interesting to compare the search through hypothesis space performed by
TANGENTPROP, KBANN, and BACKPROPAGATION. TANGENTPROP
incorporates prior knowledge to influence the hypothesis search by altering the
objective function to be minimized by gradient descent. This corresponds to
altering the goal of the hypothesis space search.
B.2.The EBNN (Explanation-Based Neural Network learning) algorithm
The EBNN algorithm builds on the TANGENTPROP algorithm in two
significant ways.
First, instead of relying on the user to provide training derivatives, EBNN
computes training derivatives itself for each observed training example. These
training derivatives are calculated by explaining each training example in terms of
a given domain theory, then extracting training derivatives from this explanation.
Second, EBNN addresses the issue of how to weight the relative importance
of the inductive and analytical components of learning.
The inputs to EBNN include
(1) a set of training examples of the form (xi, f (xi)) with no training derivatives
provided, and
(2) a domain theory analogous to that used in explanation-based learning and in
KBANN, but represented by a set of previously trained neural networks
rather than a set of Horn clauses.
The output of EBNN is a new neural network that approximates the target
function f.
This learned network is trained to fit both the training examples (xi,f(xi)) and
training derivatives of f extracted from the domain theory. Fitting the training
examples (xi,f(xi)) constitutes the inductive component of learning, whereas fitting
the training derivatives extracted from the domain theory provides the analytical
component.
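These two components can be combined into a single training criterion, written
here as a sketch in the spirit of the textbook's formulation: A(x) is the
domain theory's prediction of f(x) obtained from the explanation, x^j is the
j-th input feature, and mu_i is a per-example weight that is large when the
domain theory explains example i accurately, so that inaccurate explanations
contribute little analytical constraint.

\[
E \;=\; \sum_i \left[ \big(f(x_i) - \hat f(x_i)\big)^2
\;+\; \mu_i \sum_j \left( \left.\frac{\partial A(x)}{\partial x^{j}}
      - \frac{\partial \hat f(x)}{\partial x^{j}}\right|_{x = x_i} \right)^{2} \right]
\]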
The EBNN algorithm uses a domain theory expressed as a set of previously
learned neural networks, together with a set of training examples, to train its output
hypothesis (the target network). For each training example EBNN uses its domain
theory to explain the example, then extracts training derivatives from this
explanation.
EBNN provides TANGENTPROP with derivatives that it calculates from
the domain theory. To see how EBNN calculates these training derivatives,
consider again Figure 12.7.
C.1. The FOCL Algorithm
FOCL is an extension of the purely inductive FOIL system. Both FOIL and
FOCL learn a set of first-order Horn clauses to cover the observed training
examples. Both systems employ a sequential covering algorithm that learns a
single Horn clause, removes the positive examples covered by this new Horn
clause, and then iterates this procedure over the remaining training examples.
FOIL generates each candidate specialization by adding a single new literal
to the clause preconditions. FOCL uses this same method for producing candidate
specializations, but also generates additional specializations based on the domain
theory. The solid edges in the search tree of Figure 12.8 show the general-to-
specific search steps considered in a typical search by FOIL. The dashed edge in
the search tree of Figure 12.8 denotes an additional candidate specialization that is
considered by FOCL and based on the domain theory.
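The difference between the two systems' search steps can be sketched as
follows. A clause is represented here as a pair (head, body), and the callables
foil_candidate_literals and operationalize are illustrative placeholders for
FOIL's literal generator and for the operationalization of domain-theory rules;
this is not FOCL's actual interface.

def candidate_specializations(clause, domain_theory,
                              foil_candidate_literals, operationalize):
    """Generate candidate specializations of `clause` (a (head, body) pair,
    body being a tuple of literals).  `domain_theory` maps a predicate to the
    bodies of its Horn clauses."""
    head, body = clause
    candidates = []

    # FOIL-style steps: add a single new literal to the clause body
    # (the solid edges in the search tree of Figure 12.8).
    for literal in foil_candidate_literals(clause):
        candidates.append((head, body + (literal,)))

    # FOCL's additional theory-based step (the dashed edge in Figure 12.8):
    # unfold a domain-theory rule for the target predicate into operational
    # literals and add them all at once -- a "macro-move" through the space.
    for rule_body in domain_theory.get(head, []):
        literals = tuple(operationalize(rule_body, domain_theory))
        candidates.append((head, body + literals))

    return candidates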