05-04-2018 Dr. Vijaya Sri Kompalli

ANALYTICAL LEARNING
INTRODUCTION
Inductive learning (supervised learning): generalize from observed training examples labeled positive or negative.
Examples: neural networks, decision tree learning, inductive logic programming, genetic algorithms.
Purely inductive methods perform poorly when data are insufficient, and there are fundamental bounds on the accuracy that can be achieved when learning inductively.
Solution: be willing to reconsider the formulation of the learning problem, and develop learning algorithms that accept explicit prior knowledge as an input, in addition to the training data.
Explanation-based learning is one such approach.
EXPLANATION-BASED LEARNING (EBL)
EBL uses prior knowledge to analyze, or explain, each training example in order to infer which example features are relevant to the target function and which are irrelevant.
It thereby uses prior knowledge to reduce the complexity of the hypothesis space to be searched.
EXAMPLE
Target concept: chess positions in which black will lose its queen within two moves.
A useful generalization: "board positions in which the black king and queen are simultaneously attacked."
Reaching it requires heavy explanation, or analyzing, ability:
"Because white's knight is attacking both the king and queen, black must move out of check, thereby allowing the knight to capture the queen."
From this explanation the learner can rationally generalize: once the explanation matches, the position is included in the general hypothesis. The generalization rests on principles such as optimal moves, not merely on the specific training examples.
There are learning algorithms that learn from such explanations; one example is Prolog-EBG.
INDUCTIVE LEARNING VS. ANALYTICAL LEARNING
INDUCTIVE LEARNING
In inductive learning, the learner is given a hypothesis space H from which it must select an output hypothesis, and a set of training examples D = {<x1, f(x1)>, ..., <xn, f(xn)>}, where f(xi) is the target value for the instance xi.
The desired output of the learner is a hypothesis h from H that is consistent with these training examples.
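The consistency requirement above can be sketched as a simple check. This is a minimal illustration, assuming hypotheses are callables and instances are plain values (all names here are invented for the sketch):

```python
# Minimal sketch of "h is consistent with D": h must reproduce the target
# value f(xi) for every training instance xi. Hypotheses and instances are
# toy stand-ins (a callable and integers).

def is_consistent(h, D):
    """True iff h(x) equals the target value f_x for every (x, f_x) in D."""
    return all(h(x) == f_x for x, f_x in D)

# Toy data: target concept is "x is positive".
D = [(3, True), (-1, False), (7, True)]
h = lambda x: x > 0
print(is_consistent(h, D))  # True
```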
INDUCTIVE LEARNING SPACE
[Figure: the learner receives instances X1...Xn, each described by features F1...Fn with targets T1...Tn, as training examples D = (x1,f1), ..., (xn,fn); it searches the hypothesis space H and outputs a hypothesis h.]
EXAMPLE: CHESS GAME
Target concept: "chessboard positions in which black will lose its queen within two moves."
Each xi describes a particular chess position.
f(xi) is True if xi is a position in which black will lose its queen within two moves, and False otherwise.
ANALYTICAL LEARNING
In analytical learning, the input to the learner includes the same hypothesis space H and training examples D as for inductive learning.
In addition, the learner is provided an additional input: a domain theory B consisting of background knowledge that can be used to explain observed training examples.
The desired output of the learner is a hypothesis h from H that is consistent with both the training examples D and the domain theory B.
ANALYTICAL LEARNING SPACE
[Figure: as in the inductive setting, the learner receives instances X1...Xn and training examples D = (x1,f1), ..., (xn,fn), but it additionally receives a domain theory B; it outputs a hypothesis h from H.]
B, observed results from experts: for a given value of x1 the target is T1, since the existing samples show that this mapping holds for 70% of the observed values.
EXAMPLE: CHESS GAME
Target concept: "chessboard positions in which black will lose its queen within two moves."
Domain theory B: the predefined legal moves of chess.
Each xi describes a particular chess position.
f(xi) is True if xi is a position in which black will lose its queen within two moves, and False otherwise.
*Note: B must not entail the negation of h.
ANALYTICAL EXAMPLE: ROBOT SORTING VARIOUS PHYSICAL OBJECTS
Consider an instance space X in which each instance is a pair of physical objects.
Each of the two physical objects in the instance is described by the predicates Color, Volume, Owner, Material, Type, and Density.
The relationship between the two objects is described by the predicate On.
Given this instance space, the task is to learn the target concept "pairs of physical objects, such that one can be stacked safely on the other," denoted by the predicate SafeToStack(x, y).
Learning this target concept might be useful, for example, to a robot system that has the task of storing various physical objects within a limited workspace.
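As a concrete illustration, one such instance can be written as a set of ground literals over the predicates just listed. The specific attribute values below follow the SafeToStack example commonly used with this problem and are illustrative:

```python
# Sketch: one SafeToStack training instance as a set of ground literals,
# using the predicates named above. The particular values are illustrative.

obj1_obj2 = {
    ("On", "Obj1", "Obj2"),
    ("Type", "Obj1", "Box"),
    ("Type", "Obj2", "Endtable"),
    ("Color", "Obj1", "Red"),
    ("Color", "Obj2", "Blue"),
    ("Volume", "Obj1", 2),
    ("Density", "Obj1", 0.3),
    ("Material", "Obj1", "Cardboard"),
    ("Material", "Obj2", "Wood"),
    ("Owner", "Obj1", "Fred"),
    ("Owner", "Obj2", "Louise"),
}

# The training label attached to this instance: the pair is safe to stack.
label = ("SafeToStack", "Obj1", "Obj2", True)
```

Representing instances as ground literals is what lets a domain theory of Horn clauses explain them later.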
LEARNING WITH PERFECT DOMAIN THEORIES: PROLOG-EBG
A domain theory is said to be
correct if each of its assertions is a truthful statement about the world, and
complete with respect to a given target concept and instance space if the domain theory covers every positive example in the instance space.
After all, if the learner had a perfect domain theory, why would it need to learn? There are two responses to this question.
REASONS
(1) Although it is quite easy to write down the legal moves of chess that constitute this domain theory, it is extremely difficult to write down the optimal chess-playing strategy.
(2) It is difficult to write a perfectly correct and complete theory even for our relatively simple SafeToStack problem. A more realistic assumption is that plausible explanations based on imperfect domain theories must be used, rather than exact proofs based on perfect knowledge.
EXPLANATION-BASED LEARNING ALGORITHMS: PROLOG-EBG
PROLOG-EBG (Kedar-Cabelli and McCarty 1987) is a sequential covering algorithm.
It operates by learning a single Horn clause rule, removing the positive training examples covered by this rule, then iterating this process on the remaining positive examples until no further positive examples remain uncovered.
When given a complete and correct domain theory, PROLOG-EBG is guaranteed to output a hypothesis (set of rules) that is itself correct and that covers the observed positive training examples.
ILLUSTRATIVE TRACE
The PROLOG-EBG algorithm is a sequential covering algorithm that considers the training data incrementally.
For each new positive training example that is not yet covered by a learned Horn clause, it forms a new Horn clause by:
(1) explaining the new positive training example,
(2) analyzing this explanation to determine an appropriate generalization, and
(3) refining the current hypothesis by adding a new Horn clause rule to cover this positive example, as well as other similar instances.
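The three-step loop above can be sketched as follows. `explain`, `analyze`, and `covers` stand in for steps (1)-(3); here they are supplied as toy functions just to make the control flow runnable:

```python
# Sketch of the outer PROLOG-EBG loop: sequential covering over the positive
# examples. The three per-example steps are passed in as functions.

def prolog_ebg(positives, domain_theory, explain, analyze, covers):
    learned_rules = []
    remaining = list(positives)
    while remaining:
        example = remaining[0]                         # a not-yet-covered positive
        explanation = explain(example, domain_theory)  # step 1: explain it
        rule = analyze(explanation)                    # step 2: generalize (weakest preimage)
        learned_rules.append(rule)                     # step 3: refine the hypothesis
        remaining = [e for e in remaining if not covers(rule, e)]
    return learned_rules

# Toy run: "examples" are integers, a learned rule is the example's parity,
# and a rule covers every example with the same parity.
rules = prolog_ebg(
    positives=[1, 2, 3, 4],
    domain_theory=None,
    explain=lambda e, b: e,
    analyze=lambda expl: expl % 2,
    covers=lambda rule, e: e % 2 == rule,
)
print(rules)  # [1, 0]: one rule covering the odd examples, one the even
```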
1. EXPLAIN THE TRAINING EXAMPLE
When the domain theory is correct and complete, this explanation constitutes a proof that the training example satisfies the target concept.
When dealing with imperfect prior knowledge, the notion of explanation must be extended to allow for plausible, approximate arguments rather than perfect proofs.
2. ANALYZE THE EXPLANATION
The key question: "of the many features that happen to be true of the current training example, which ones are generally relevant to the target concept?"
By collecting just the features mentioned in the leaf nodes of the explanation and substituting variables x and y for Obj1 and Obj2, we can form a general rule that is justified by the domain theory:
SafeToStack(x, y) <- Volume(x, 2) ^ Density(x, 0.3) ^ Type(y, Endtable)
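Forming such a rule is mechanical: collect the leaf-node literals of the explanation and substitute variables for the constants. A minimal sketch, with the leaf literals written to match the rule in the text:

```python
# Sketch: variabilizing the leaf literals of an explanation. The leaf
# literals below are chosen to match the SafeToStack rule in the text.

leaf_literals = [
    ("Volume", "Obj1", 2),
    ("Density", "Obj1", 0.3),
    ("Type", "Obj2", "Endtable"),
]

def variabilize(literal, mapping):
    """Replace constants with variables according to `mapping`."""
    return tuple(mapping.get(term, term) for term in literal)

mapping = {"Obj1": "x", "Obj2": "y"}
body = [variabilize(lit, mapping) for lit in leaf_literals]
print(body)
# [('Volume', 'x', 2), ('Density', 'x', 0.3), ('Type', 'y', 'Endtable')]
```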
PROLOG-EBG computes the most general rule that can be justified by the explanation, by computing the weakest preimage of the explanation.
Definition: the weakest preimage of a conclusion C with respect to a proof P is the most general set of initial assertions A such that A entails C according to P.
This more general rule does not require the specific values for Volume and Density that were required by the first rule.
Instead, it states a more general constraint on the values of these attributes.
PROLOG-EBG computes the weakest preimage of the target concept with respect to the explanation, using a general procedure called regression.
The regression procedure operates on a domain theory represented by an arbitrary set of Horn clauses.
It works iteratively backward through the explanation:
first computing the weakest preimage of the target concept with respect to the final proof step in the explanation, then computing the weakest preimage of the resulting expressions with respect to the preceding step, and so on.
The procedure terminates when it has iterated over all steps in the explanation, yielding the weakest precondition of the target concept with respect to the literals at the leaf nodes of the explanation.
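A single backward step of this regression can be sketched as follows. It uses simplified one-way matching rather than full unification, and the example rule SafeToStack(x, y) <- Lighter(x, y) is a plausible domain-theory clause for this problem, used here only for illustration:

```python
# Sketch of one backward regression step: a frontier literal that matches
# the head of a Horn clause is replaced by the clause body under the
# matching substitution. Terms beginning with "?" are variables.

def match(pattern, literal, subst):
    """Extend `subst` so that `pattern` equals `literal`, or return None."""
    if len(pattern) != len(literal):
        return None
    s = dict(subst)
    for p, t in zip(pattern, literal):
        if isinstance(p, str) and p.startswith("?"):
            if p in s and s[p] != t:
                return None
            s[p] = t
        elif p != t:
            return None
    return s

def substitute(literal, subst):
    """Apply a substitution to every term of a literal."""
    return tuple(subst.get(t, t) if isinstance(t, str) else t for t in literal)

def regress_step(frontier, head, body):
    """Replace the first frontier literal matching `head` by `body`."""
    for lit in frontier:
        s = match(head, lit, {})
        if s is not None:
            rest = [l for l in frontier if l is not lit]
            return rest + [substitute(b, s) for b in body]
    return frontier

# Regress SafeToStack(x, y) through SafeToStack(?x, ?y) <- Lighter(?x, ?y):
frontier = [("SafeToStack", "x", "y")]
rule_head = ("SafeToStack", "?x", "?y")
rule_body = [("Lighter", "?x", "?y")]
print(regress_step(frontier, rule_head, rule_body))
# [('Lighter', 'x', 'y')]
```

Iterating such steps backward through every clause used in the explanation yields the weakest precondition described above.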
REGRESSION
[Figure: step-by-step regression of the target concept backward through the explanation.]
3. REFINE THE CURRENT HYPOTHESIS
The current hypothesis at each stage consists of the set of Horn clauses learned thus far.
At each stage, the sequential covering algorithm picks a new positive example that is not yet covered by the current Horn clauses, explains this new example, and formulates a new rule.
Only positive examples are covered in the algorithm as we have defined it, and the learned set of Horn clause rules predicts only positive examples.
A new instance is classified as negative if the current rules fail to predict that it is positive.
This is in keeping with the standard negation-as-failure approach used in Horn clause inference systems such as PROLOG.
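The negation-as-failure classification rule just described can be sketched in a few lines (the rules and instances are toy stand-ins):

```python
# Sketch of negation-as-failure classification: an instance is positive if
# any learned rule covers it, and negative otherwise, by failure to prove.

def classify(instance, rules):
    """Positive iff some rule fires; negative when no rule fires."""
    return any(rule(instance) for rule in rules)

# Toy learned rules over integer "instances".
rules = [lambda x: x % 2 == 0, lambda x: x > 10]
print(classify(4, rules))   # True  (covered by the parity rule)
print(classify(7, rules))   # False (no rule fires -> classified negative)
```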
REMARKS ON EXPLANATION-BASED LEARNING
Unlike inductive methods, PROLOG-EBG produces justified general hypotheses by using prior knowledge to analyze individual examples.
The explanation of how the example satisfies the target concept determines which example attributes are relevant: those mentioned by the explanation.
The further analysis of the explanation, regressing the target concept to determine its weakest preimage with respect to the explanation, allows deriving more general constraints on the values of the relevant features.
Each learned Horn clause corresponds to a sufficient condition for satisfying the target concept. The set of learned Horn clauses covers the positive training examples encountered by the learner, as well as other instances that share the same explanations.
The generality of the learned Horn clauses will depend on the formulation of the domain theory and on the sequence in which training examples are considered.
PROLOG-EBG implicitly assumes that the domain theory is correct and complete. If the domain theory is incorrect or incomplete, the resulting learned concept may also be incorrect.
CAPABILITIES AND LIMITATIONS
EBL as theory-guided generalization of examples: EBL uses its given domain theory to generalize rationally from examples, based on relevance.
EBL as example-guided reformulation of theories: the PROLOG-EBG algorithm can be viewed as a method for reformulating the domain theory into a more operational form by forming rules that classify instances.
EBL as "just" restating what the learner already "knows": if the learner's initial domain theory is sufficient to explain any observed training examples, then it is also sufficient to predict their classification in advance.
KNOWLEDGE COMPILATION
In its pure form, EBL involves reformulating the domain theory to produce general rules that classify examples in a single inference step.
This kind of knowledge reformulation is sometimes referred to as knowledge compilation, indicating that the transformation is an efficiency-improving one that does not alter the correctness of the system's knowledge.
1. DISCOVERING NEW FEATURES
One interesting capability of PROLOG-EBG is its ability to formulate new features that are not explicit in the description of the training examples, but that are needed to describe the general rule underlying the training example.
In particular, the learned rule asserts that the essential constraint on the Volume and Density of x is that their product is less than 5.
In fact, the training examples contain no description of such a product, or of the value it should take on. Instead, this constraint is formulated automatically by the learner.
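A small sketch of this derived feature: the constraint is on the product of Volume and Density (the object's weight), a quantity never listed as an attribute of any training example. The threshold 5 is the one given in the text:

```python
# Sketch: the derived feature Volume * Density. No training example mentions
# this product directly; regression through the domain theory produces the
# constraint on it. The threshold 5 comes from the text.

def satisfies_weight_constraint(volume, density, limit=5):
    """True if the derived feature volume * density stays under the limit."""
    return volume * density < limit

print(satisfies_weight_constraint(2, 0.3))   # True: 0.6 < 5
print(satisfies_weight_constraint(10, 0.9))  # False: 9.0 is not < 5
```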
COMPARISON WITH NN

EBG:
A feature is one of a very large set of potential features that can be computed from the available instance attributes.
PROLOG-EBG automatically formulates such features in its attempt to fit the training data.
PROLOG-EBG employs an analytical process to derive new features based on analysis of single training examples.

NN:
A "feature" is similar in kind to the types of features represented by the hidden units of neural networks.
Backpropagation fits the features based on the training data.
A statistical process derives hidden unit features in neural networks from many training examples.
2. DEDUCTIVE LEARNING
PROLOG-EBG is a deductive, rather than inductive, learning process.
That is, by calculating the weakest preimage of the explanation it produces a hypothesis h that follows deductively from the domain theory B, while covering the training data D.
To be more precise, PROLOG-EBG outputs a hypothesis h that satisfies the following two constraints:
(∀<xi, f(xi)> ∈ D) (h ∧ xi) ⊢ f(xi)
D ∧ B ⊢ h
This states the type of knowledge that is required by PROLOG-EBG for its domain theory.
In particular, PROLOG-EBG assumes the domain theory B entails the classifications of the instances in the training data:
(∀<xi, f(xi)> ∈ D) (B ∧ xi) ⊢ f(xi)
This constraint on the domain theory B assures that an explanation can be constructed for each positive example.
ILP VS. EBG

Inductive Logic Programming:
ILP is an inductive learning task.
Background knowledge B' is provided to the learner.
B' does not typically satisfy the constraint of entailing the classifications of the training instances.
ILP uses its background knowledge B' to enlarge the set of hypotheses to be considered.
ILP systems output a hypothesis h that satisfies the constraint (∀<xi, f(xi)> ∈ D) (B' ∧ h ∧ xi) ⊢ f(xi).

Prolog-EBG (Explanation-Based Learning):
EBL is a deductive learning task.
A domain theory B is provided to the learner.
B satisfies the classification equation (∀<xi, f(xi)> ∈ D) (B ∧ xi) ⊢ f(xi).
PROLOG-EBG uses its domain theory B to reduce the set of acceptable hypotheses.
PROLOG-EBG outputs a hypothesis h that satisfies two constraints: it covers the training data, and it follows deductively from D ∧ B.
3. INDUCTIVE BIAS IN EXPLANATION-BASED LEARNING
The inductive bias of a learning algorithm is a set of assertions that, together with the training examples, deductively entail subsequent predictions made by the learner.
The importance of inductive bias is that it characterizes how the learner generalizes beyond the observed training examples.
In PROLOG-EBG the output hypothesis h follows deductively from D ∧ B.
Therefore, the domain theory B is a set of assertions which, together with the training examples, entail the output hypothesis.
PROLOG-EBG employs a sequential covering algorithm that continues to formulate additional Horn clauses until all positive training examples have been covered.
Approximate inductive bias of PROLOG-EBG: the domain theory B, plus a preference for small sets of maximally general Horn clauses.
In most learners the inductive bias is a fixed property of the learning algorithm, typically determined by the syntax of its hypothesis representation.
Therefore, any attempt to develop a general-purpose learning method must at minimum allow the inductive bias to vary with the learning problem at hand.
On a more practical level, in many tasks it is quite natural to input domain-specific knowledge (e.g., the knowledge about Weight in the SafeToStack example) to influence how the learner will generalize beyond the training data.
KNOWLEDGE-LEVEL LEARNING (HYPOTHESES THAT ARE NOT ENTAILED BY THE DOMAIN THEORY ALONE)
LEMMA-ENUMERATOR is an algorithm that simply enumerates all proof trees that conclude the target concept based on assertions in the domain theory B.
For each such proof tree, LEMMA-ENUMERATOR calculates the weakest preimage and constructs a Horn clause, in the same fashion as PROLOG-EBG.
The only difference between LEMMA-ENUMERATOR and PROLOG-EBG is that LEMMA-ENUMERATOR ignores the training data and enumerates all proof trees.
Consider the assertion: "If Ross likes to play tennis when the humidity is x, then he will also like to play tennis when the humidity is lower than x."
This domain theory does not entail any conclusions regarding which instances are positive or negative instances of PlayTennis.
The phrase knowledge-level learning is sometimes used to refer to this type of learning, in which the learned hypothesis entails predictions that go beyond those entailed by the domain theory.
The set of all predictions entailed by a set of assertions Y is often called the deductive closure of Y.
The key distinction here is that in knowledge-level learning the deductive closure of B is a proper subset of the deductive closure of B + h.
A second example of knowledge-level analytical learning is provided by considering a type of assertion known as determinations.
Determinations assert that some attribute of the instance is fully determined by certain other attributes, without specifying the exact nature of the dependence.
Consider the target concept "people who speak Portuguese," and imagine we are given as a domain theory the single determination assertion "the language spoken by a person is determined by their nationality."
Taken alone, this domain theory does not enable us to classify any instances as positive or negative.
However, if we observe that "Joe, a 23-year-old left-handed Brazilian, speaks Portuguese," then we can conclude from this positive example and the domain theory that "all Brazilians speak Portuguese."
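This inference pattern, one positive example plus a determination yielding a general rule, can be sketched as follows (the attribute names are illustrative):

```python
# Sketch of knowledge-level learning from a determination: the domain theory
# says only that Nationality determines Language; combined with one positive
# example, it licenses a general rule.

def generalize_with_determination(example, determination):
    """Given an example (dict of attributes) and a determination
    (determining_attr, determined_attr), return a general rule mapping the
    observed determining value to the observed determined value."""
    determining_attr, determined_attr = determination
    return {example[determining_attr]: example[determined_attr]}

joe = {"Name": "Joe", "Age": 23, "Handedness": "Left",
       "Nationality": "Brazilian", "Language": "Portuguese"}

rule = generalize_with_determination(joe, ("Nationality", "Language"))
print(rule)  # {'Brazilian': 'Portuguese'}: "all Brazilians speak Portuguese"
```

Note that the rule is not entailed by the determination alone, nor by the example alone; it follows only from their combination, which is what makes this knowledge-level learning.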
EXPLANATION-BASED LEARNING OF SEARCH CONTROL KNOWLEDGE
Exactly how should we formulate the problem of learning search control so that we can apply explanation-based learning?
One system that employs explanation-based learning to improve its search is PRODIGY (Carbonell et al. 1990).
PRODIGY is a domain-independent planning system that accepts the definition of a problem domain in terms of the state space S and operators O.
It then solves problems of the form "find a sequence of operators that leads from initial state si to a state that satisfies goal predicate G."
PRODIGY uses a means-ends planner that decomposes problems into subgoals, solves them, then combines their solutions into a solution for the full problem.
Thus, during its search for problem solutions PRODIGY repeatedly faces questions such as "Which subgoal should be solved next?" and "Which operator should be considered for solving this subgoal?"
Minton (1988) describes the integration of explanation-based learning into PRODIGY by defining a set of target concepts appropriate for these kinds of control decisions that it repeatedly confronts.
For example, one target concept is "the set of states in which subgoal A should be solved before subgoal B."
05-04-2018
Dr. Vijaya Sri Kompalli
SOAR supports a broad variety of problem-solving strategies
that subsumes PRODIGYm'Se ans-ends planning strategy.
SOAR learns by explaining situations in which its current search
strategy leads to inefficiencies.
When it encounters a search choice for which it does not have a
definite answer (e.g., which operator to apply next) SOAR reflects
on this search impasse, using weak methods such as generate-
and-test to determine the correct course of action.
The reasoning used to resolve this impasse can be interpreted
as an explanation for how to resolve similar impasses in the future.
SOAR uses a variant of explanation-based learning called chunking
to extract the general conditions under which the same
explanation applies.
SOAR has been applied in a great number of problem domains
and has also been proposed as a psychologically plausible model of
human learning processes 42