UNIT-3
Topic-1
INTRODUCTION
● Instance-based learning methods also include case-based reasoning
methods, which use more complex, symbolic representations for
identifying "neighboring" instances.
● Case-based reasoning has been applied to tasks such as storing and
reusing past experience at a help desk, reasoning about legal cases
by referring to previous cases, and solving complex scheduling
problems by reusing relevant portions of previously solved
problems.
Topic-2
k-NEAREST NEIGHBOR LEARNING
Disadvantage: if all training examples are considered, the classifier will run more slowly.
• If all training examples are considered when classifying a new query
instance, the algorithm is called a global method.
• If only the nearest training examples are considered, it is called a local method.
• When the rule in Equation (8.4) is applied as a global method, using all
training examples, it is known as Shepard's method.
• Because the prediction is a distance-weighted average, it smooths the
impact of isolated noisy training examples.
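Below is a minimal sketch of distance-weighted k-NN for a real-valued target (Shepard's method when all training examples are used); numpy is assumed, and the function name and example data are illustrative only:

```python
import numpy as np

def distance_weighted_knn_predict(X_train, y_train, x_query, k=None, eps=1e-12):
    """Predict a real-valued target as a distance-weighted average of neighbors.

    k=None uses all training examples (the 'global' method, i.e. Shepard's
    method); a small k uses only the nearest examples (the 'local' method).
    """
    X_train = np.asarray(X_train, dtype=float)
    y = np.asarray(y_train, dtype=float)
    d = np.linalg.norm(X_train - x_query, axis=1)   # Euclidean distances to the query
    if k is not None:
        idx = np.argsort(d)[:k]                     # keep only the k nearest
        d, y = d[idx], y[idx]
    w = 1.0 / (d ** 2 + eps)                        # weight each example by 1/d^2
    return float(np.sum(w * y) / np.sum(w))

# Example: a query near (0, 0) is dominated by the nearby, low-valued points.
X = [[0, 0], [0, 1], [5, 5]]
y = [1.0, 1.0, 10.0]
print(distance_weighted_knn_predict(X, y, np.array([0.1, 0.1])))
```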
Practical issues in applying k-NN:
● Curse of dimensionality
● Efficient memory indexing
Issue 1: Curse of dimensionality
● One practical issue in applying k-NEAREST NEIGHBOR algorithms is that
distance between instances is calculated based on all attributes of the
instance.
● i.e., when there are many irrelevant attributes, calculating distance based
on all attributes of the instance can be misleading.
● Consider applying k-NN to a problem in which each instance is described by
20 attributes, but where only 2 of these attributes are relevant to
determining the classification for the particular target function.
● Instances that have identical values for the 2 relevant attributes may
nevertheless be distant from one another in the 20-dimensional instance space.
● As a result, the similarity metric used by k-NN, which depends on all 20
attributes, will be misleading.
● The distance between neighbors will be dominated by the large number of
irrelevant attributes.
● This difficulty, which arises when many irrelevant attributes are present, is
sometimes referred to as the curse of dimensionality.
Solution-1
● Weight each attribute differently when calculating the distance between
two instances.
● i.e., stretch the axes in the Euclidean space:
○ shorten the axes that correspond to less relevant attributes,
○ lengthen the axes that correspond to more relevant attributes.
● The amount by which each axis should be stretched can be determined
automatically using a cross-validation approach.
● This process of stretching the axes in order to optimize the performance
of k-NN provides a mechanism for suppressing the impact of irrelevant
attributes.
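A small sketch of such a weighted (axis-stretched) distance function; the weight values z below are illustrative placeholders for factors that would in practice be chosen by cross-validation:

```python
import numpy as np

def weighted_euclidean(x1, x2, z):
    """Euclidean distance after stretching axis j by the factor z[j].

    z[j] > 1 lengthens (emphasizes) attribute j, 0 < z[j] < 1 shortens it,
    and z[j] = 0 eliminates the attribute from the distance entirely.
    """
    diff = z * (np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float))
    return float(np.linalg.norm(diff))

# Illustrative weights: attributes 0 and 1 are relevant, the rest are noise.
z = np.array([1.0, 1.0, 0.1, 0.1, 0.0])
print(weighted_euclidean([1, 2, 9, 9, 9], [1, 2, 0, 0, 0], z))
```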
Solution-2
● A more drastic alternative is to completely eliminate the least relevant
attributes from the instance space.
● Use efficient cross-validation methods for selecting relevant subsets of the
attributes for k-NN algorithms.
● Based on leave-one-out cross validation, in which the set of m training
instances is repeatedly divided into a training set of size m - 1 and test set
of size 1, in all possible ways.
● This approach is easily implemented in k-NN algorithms because no
additional training effort is required each time the training set is redefined.
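A rough sketch of leave-one-out cross-validation used to score attribute subsets for k-NN classification; note that, as stated above, no extra training is needed when an instance is held out. The helper names and the exhaustive subset search are illustrative only:

```python
import numpy as np
from itertools import combinations

def loocv_error(X, y, attrs, k=1):
    """Leave-one-out error of k-NN restricted to the attribute subset `attrs`.

    Class labels y are assumed to be small nonnegative integers.
    """
    X = np.asarray(X, dtype=float)[:, list(attrs)]
    y = np.asarray(y)
    errors = 0
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                              # hold instance i out as the test set
        nearest = np.argsort(d)[:k]
        pred = np.bincount(y[nearest]).argmax()    # majority vote among neighbors
        errors += int(pred != y[i])
    return errors / len(X)

def best_subset(X, y, n_attrs, size):
    """Score every attribute subset of a given size (feasible only for small n_attrs)."""
    return min(combinations(range(n_attrs), size), key=lambda s: loocv_error(X, y, s))
```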
● Both of the above approaches stretch each axis by some constant factor.
Solution-3
● Alternatively, stretch each axis by a value that varies over the instance
space (i.e., redefine the distance metric locally).
● But this is less common, as it increases the risk of overfitting.
Issue 2: Efficient memory indexing
● The k-NN algorithm delays all processing until a new query is received,
requiring significant computation to process each new query.
Solution:
● Store the training instances in an index so the nearest neighbors can be
identified more efficiently, at some additional cost in memory.
● One such indexing method is the kd-tree, in which instances are stored at the
leaves, with nearby instances stored at the same or nearby nodes.
● The internal nodes of the tree sort the new query xq to the relevant leaf by
testing selected attributes of xq.
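A brief sketch of kd-tree-based neighbor lookup using SciPy's KDTree (SciPy is an assumption here, not part of the original text); the index is built once at training time, and each query descends the tree rather than scanning every stored instance:

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(0)
X_train = rng.random((10_000, 3))            # stored training instances
y_train = rng.integers(0, 2, size=10_000)    # their class labels

tree = KDTree(X_train)                       # "training" step: build the index once

x_q = np.array([0.5, 0.5, 0.5])
dist, idx = tree.query(x_q, k=5)             # 5 nearest neighbors of the query xq
prediction = np.bincount(y_train[idx]).argmax()
print(dist, idx, prediction)
```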
A Note on Terminology
Topic-3
LOCALLY WEIGHTED REGRESSION
Topic-4
RADIAL BASIS FUNCTIONS
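The bullets below refer to the radial basis function approximation, which (in the standard formulation) takes the form of a weighted sum of k kernel functions plus a bias term:

$$\hat{f}(x) \;=\; w_0 + \sum_{u=1}^{k} w_u \, K_u\!\bigl(d(x_u, x)\bigr)$$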
● Here each xu is an instance from X, and the kernel function Ku(d(xu, x)) is
defined so that it decreases as the distance d(xu, x) increases.
● k is a user-provided constant that specifies the number of kernel
functions to be included.
● Even though f̂(x) is a global approximation to f(x), the contribution from
each of the Ku(d(xu, x)) terms is localized to a region near the point xu.
● A common choice is the Gaussian kernel function, centered at xu with
variance σu²: Ku(d(xu, x)) = e^(−d²(xu, x) / (2σu²)).
● The above function can approximate any function with arbitrarily small
error, provided a sufficiently large number k of such Gaussian kernels and
provided the width of each kernel can be separately specified.
● The above function can be viewed as describing a two-layer network.
● First layer of units computes the values of the various Ku(d(xu, x))
● Second layer computes a linear combination of these first-layer unit values.
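A small numpy sketch of this two-layer view; the centers, widths, and weights below are hypothetical values standing in for parameters that would normally be learned:

```python
import numpy as np

def rbf_forward(x, centers, sigmas, w0, w):
    """Two-layer RBF network: a Gaussian kernel layer followed by a linear layer."""
    d2 = np.sum((centers - x) ** 2, axis=1)         # squared distances d^2(xu, x)
    k_vals = np.exp(-d2 / (2.0 * sigmas ** 2))      # first layer: Ku(d(xu, x))
    return w0 + np.dot(w, k_vals)                   # second layer: linear combination

centers = np.array([[0.0, 0.0], [1.0, 1.0]])        # kernel centers xu
sigmas = np.array([0.5, 0.5])                       # kernel widths sigma_u
print(rbf_forward(np.array([0.2, 0.1]), centers, sigmas,
                  w0=0.1, w=np.array([1.0, -0.5])))
```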
● Several alternative methods have been proposed for choosing an
appropriate number of hidden units or, equivalently, kernel functions.
Approach-1
● Allocate a Gaussian kernel function for each training example (xi, f(xi)).
● The RBF network then learns a global approximation to the target function
that can fit the training data exactly.
● That is, for any set of m training examples, the weights w0, ..., wm for
combining the m Gaussian kernel functions can be set so that f̂(xi) = f(xi)
for each training example (xi, f(xi)).
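A sketch of this approach under simplifying assumptions (a single shared kernel width, and a least-squares solve standing in for the exact-fit argument above); the function name is illustrative:

```python
import numpy as np

def fit_rbf_exact(X, y, sigma=1.0):
    """One Gaussian kernel per training example; solve for the weights w0..wm."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise d^2(xi, xj)
    K = np.exp(-d2 / (2.0 * sigma ** 2))                        # m x m kernel matrix
    A = np.hstack([np.ones((len(X), 1)), K])                    # prepend a column for w0
    w, *_ = np.linalg.lstsq(A, y, rcond=None)                   # weights so f^(xi) = f(xi)
    return w                                                    # [w0, w1, ..., wm]
```

When the kernel matrix is nonsingular this system can be satisfied exactly, which is the point made above.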
Approach-2
● Choose a set of kernel functions that is smaller than the number of
training examples.
● This approach can be much more efficient than the first approach,
especially when the number of training examples is large.
● The kernel function centers can be chosen using unsupervised clustering
algorithms that fit the training instances with a mixture of Gaussians.
● The EM algorithm provides for choosing the means of a mixture of k
Gaussians to best fit the observed instances.
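A sketch of this second approach with k-means standing in for the mixture-of-Gaussians / EM step; scikit-learn's KMeans is an assumption introduced here for brevity, not part of the original text:

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_rbf_clustered(X, y, k=10, sigma=1.0):
    """Pick k kernel centers by clustering, then fit the linear output weights."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    centers = KMeans(n_clusters=k, n_init=10).fit(X).cluster_centers_
    d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))                 # m x k kernel activations
    A = np.hstack([np.ones((len(X), 1)), K])             # column of ones for w0
    w, *_ = np.linalg.lstsq(A, y, rcond=None)            # output-layer weights
    return centers, w
```

With k much smaller than m, both fitting and evaluating the network is cheaper than allocating one kernel per training example.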
Difference between
RBF and MLP
Topic-5
CASE-BASED REASONING
• CADET is a research prototype system intended to explore the potential role of case-
based reasoning in conceptual design.
• CBR is an instance-based learning method in which instances (cases) may be rich
symbolic relational descriptions and in which the retrieval and combination of cases to
solve the current query may rely on knowledge-based reasoning and search-intensive
problem-solving methods.
Applications
Consider a prototypical example of a case-based reasoning system: the CADET system for conceptual design.
● The top half of the figure shows the description of
a typical stored case called a T-junction pipe.
● Its function is represented in terms of the qualitative
relationships among the waterflow levels and
temperatures at its inputs and outputs.
● In the functional description, an arrow with a "+"
label indicates that the variable at the arrowhead
increases with the variable at its tail.
● For example, the output waterflow Q3 increases
with increasing input waterflow Q1.
● A "-" label indicates that the variable at the head
decreases with the variable at the tail.
● The bottom half of this figure depicts a new design
problem described by its desired function.
● This particular function describes the required
behavior of one type of water faucet.
Where
Qc - flow of cold water into the faucet
Qh - flow of hot water into the faucet
Qm - single mixed flow out of the faucet
Tc, Th, Tm - temperatures of the cold water, hot water, and mixed water
Ct - control signal for temperature
Cf - control signal for waterflow
Solving a new design problem involves four phases:
1. Retrieve
2. Reuse
• It uses general knowledge about physical influences to create these
elaborated function graphs.
• E.g., the rewrite rule that allows A →+ B to be rewritten as A →+ x →+ B,
interpreted as: if B must increase with A, it is sufficient to find some
quantity x such that B increases with x and x increases with A (a toy
illustration of this graph-matching idea follows the list of phases).
• x - universally quantified variable whose value is bound when matching the
function graph against the case library.
• The process of producing a final solution from multiple retrieved cases
(which may match different subgraphs) can be very complex.
3. Revise (Merging)
• It may require designing portions from first principles, in addition to merging
retrieved portions from stored cases.
• It may also require backtracking on earlier choices of design subgoals and,
therefore, rejecting cases that were previously retrieved.
4. Retain (Adaptation)
• CADET has very limited capabilities for combining and adapting multiple
retrieved cases to form the final design.
• It relies heavily on the user for this adaptation stage of the process.
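Purely to illustrate the graph-matching idea mentioned under the Reuse phase (this is not CADET's actual algorithm), a qualitative function graph can be represented as a set of signed edges, and the rewrite rule amounts to searching for an intermediate quantity:

```python
# Illustrative only: a qualitative function graph as signed edges.
# Edge (a, b, "+") means "b increases with a"; (a, b, "-") means "b decreases with a".
case_graph = {("Q1", "Q3", "+"), ("Q2", "Q3", "+"), ("T1", "T3", "+")}

def satisfies_increase(graph, a, b):
    """Does the stored case provide "b increases with a", either directly or
    via some intermediate quantity x (the rewrite A -+-> B  ==>  A -+-> x -+-> B)?"""
    if (a, b, "+") in graph:
        return True
    nodes = {n for edge in graph for n in edge[:2]}
    return any((a, x, "+") in graph and (x, b, "+") in graph for x in nodes)

print(satisfies_increase(case_graph, "Q1", "Q3"))   # True: a direct "+" edge exists
```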
Topic-6
REMARKS ON LAZY AND
EAGER LEARNING
• Consider three lazy learning methods: k-NN algorithm, locally weighted
regression, and case-based reasoning.
• We call these methods lazy because they defer the decision of how to generalize
beyond the training data until each new query instance is encountered.
• Consider one eager learning method: the method for learning radial basis
function networks.
• We call this method eager because it generalizes beyond the training data
before observing the new query, committing at training time to the network
structure and weights that define its approximation to the target function.
• Other algorithms discussed earlier (e.g., C4.5, BACKPROPAGATION) are eager
learning algorithms.
Question-1: Are there important differences in what can be achieved by lazy
versus eager learning?
• Consider differences in computation time and differences in the classifications
produced for new queries.
• For example, lazy methods will generally require less computation during training,
but more computation when they must predict the target value for a new query.
Question-2: Are there essential differences in the inductive bias that can be
achieved by lazy versus eager methods?
Question-3: Does this distinction affect the generalization accuracy of the
learner?
• It does if we require that the lazy and eager learner employ the same hypothesis
space H.
• A lazy learner has the option of (implicitly) representing the target function by
a combination of many local approximations, whereas an eager learner must
commit at training time to a single global approximation.
Question-4: Can we create eager methods that use multiple local approximations to
achieve the same effects as lazy local methods?
• RBF learning methods provide one attempt at this: they commit to a global
approximation to the target function at training time.
• RBF network represents this global function as a linear combination of multiple
local kernel functions.
• RBF networks are built eagerly from local approximations centered around the
training examples, or around clusters of training examples, but not around the
unknown future query points.