Chapter 18 discusses learning from examples, defining learning as an agent's ability to improve performance based on past experiences. It outlines the necessity of learning in unpredictable and changing environments, and introduces various forms and types of learning, particularly focusing on supervised learning and decision trees. The chapter emphasizes the importance of generalization, the structure of decision trees, methods for selecting attributes, and the challenges of overfitting.

Chapter 18: Learning from Examples – Summary

Definition of Learning: An agent learns if it improves performance on future tasks based on past experience. Focus is on learning a function from input–output pairs that generalizes to new inputs.

Why Learning is Needed:

1. Unpredictable situations – designers can't pre-program every scenario.
2. Changing environments – world conditions change (e.g., weather, markets).
3. Unknown solutions – some tasks (e.g., vision) are hard to program by hand.

Components That Can Be Learned:

 Condition-action rules
 Perception interpretation
 World dynamics
 Utility functions
 Action-value functions
 Goals

Key Factors Affecting Learning:

 What component is being learned
 What prior knowledge is available
 How knowledge/data is represented
 What feedback is available

Forms of Learning:

 Inductive: Learn general rules from specific examples.
 Deductive: Derive conclusions from known facts and logic (covered later).

Types of Learning:

 Supervised Learning: Learn from labeled input-output pairs.
 Unsupervised Learning: No labels; discover structure (e.g., clustering).
 Reinforcement Learning: Learn from rewards and punishments.
 Semi-supervised Learning: Few labels, many unlabeled examples; labels may be
noisy.

Section 18.2 – Supervised Learning

 Supervised learning is about learning a function from a set of input-output examples.
Given a training set of examples (x₁, y₁), ..., (xₙ, yₙ), where each y was produced by
an unknown function f(x), the goal is to find a hypothesis function h(x) that
generalizes well, i.e., performs well on unseen inputs.
 Supervised learning can be viewed as a search through a space of hypotheses H for
one that best fits the data. This search is based on minimizing error on the training
examples, but the ultimate goal is to minimize error on new inputs (generalization).
 The output y may be categorical (classification) or numerical (regression). For
example, predicting weather categories is classification, whereas predicting
temperature is regression.
 The training process involves selecting a hypothesis consistent with the data.
However, a highly complex hypothesis may overfit—matching the training data
exactly but performing poorly on new data. This is illustrated in Figure 18.1: a simple
linear model may generalize better than a complex polynomial that fits all training
points (a small sketch of this trade-off follows this list).
 Occam’s Razor is a key principle: prefer simpler hypotheses, assuming they fit the
data. A problem is said to be realizable if the true function f lies within the hypothesis
space H.
 The Bayesian approach assigns probabilities to each hypothesis. Using Bayes' rule:
h* = argmax P(h | data) = argmax P(data | h) * P(h),
we can choose the most probable hypothesis based on prior beliefs and how well the
hypothesis explains the data.
 Although it is theoretically possible to use the space of all possible functions (e.g., all
Turing machines), this is computationally infeasible. Simple hypothesis spaces (e.g.,
decision trees, polynomials) are easier to search and more practical.
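
The following minimal sketch (in Python, not from the chapter) illustrates the generalization point above: a high-degree polynomial hypothesis can fit noisy training points almost exactly yet do worse on unseen inputs than a simple linear hypothesis. The target function, noise level, and degrees are arbitrary choices for illustration.

import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # The "unknown" target function that produced the labels.
    return 2 * x + 1

x_train = rng.uniform(0, 10, size=12)
y_train = f(x_train) + rng.normal(0, 1.0, size=12)   # noisy labels
x_test = rng.uniform(0, 10, size=200)                # unseen inputs
y_test = f(x_test)

for degree in (1, 11):
    # Fit a polynomial hypothesis h of the given degree to the training pairs.
    h = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    train_err = np.mean((h(x_train) - y_train) ** 2)
    test_err = np.mean((h(x_test) - y_test) ** 2)
    print(f"degree {degree}: training MSE {train_err:.2f}, test MSE {test_err:.2f}")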

18.3.1 Decision Tree Representation – Summary

A decision tree represents a function that maps input attribute values to a decision (output
value). It is typically used for classification tasks, especially Boolean classification problems.

Structure of a Decision Tree:

 Internal nodes represent attribute tests.
 Branches represent outcomes of the test.
 Leaf nodes represent final decisions (e.g., Yes or No).
 A decision tree is readable and interpretable; a small sketch of this representation follows this list.
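
As a minimal, illustrative sketch (the attribute names and values are made up, not taken from the chapter), the structure above can be written in Python as nested (attribute, branches) pairs with plain decision values at the leaves:

# A leaf is just a decision value; an internal node is (attribute, {outcome: subtree}).
tree = ("Patrons", {
    "None": False,                      # leaf: No
    "Some": True,                       # leaf: Yes
    "Full": ("Hungry", {                # another attribute test
        "Yes": True,
        "No": False,
    }),
})

def classify(node, example):
    # Follow the branch matching the example's attribute value until a leaf is reached.
    if not isinstance(node, tuple):
        return node
    attribute, branches = node
    return classify(branches[example[attribute]], example)

print(classify(tree, {"Patrons": "Full", "Hungry": "Yes"}))   # True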

Expressiveness:

 Decision trees can represent any Boolean function.
 Each root-to-leaf path corresponds to a conjunction of tests.
 The whole tree corresponds to a disjunction of such conjunctions (Disjunctive Normal
Form).
 However, some functions (e.g., Majority function) may require very large trees.

Learning Decision Trees:

 Input: Training data with attributes and labels.
 Goal: Construct a tree that classifies most or all training examples correctly.
 Strategy: Greedy, top-down, recursive splitting of the dataset based on the most
informative attribute (see the sketch after this list).
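
A sketch of this greedy, top-down strategy is shown below, assuming examples are (attribute-dictionary, label) pairs as in the earlier representation sketch. Here choose_attribute is a placeholder for "pick the most informative attribute" and is filled in by the information-gain sketch in the next section; the names are illustrative, not the book's pseudocode.

from collections import Counter

def plurality(examples):
    # Most common label among the examples (used for ties and empty branches).
    return Counter(label for _, label in examples).most_common(1)[0][0]

def decision_tree_learning(examples, attributes, parent_examples=None):
    if not examples:                              # no data left: inherit the majority label
        return plurality(parent_examples)
    labels = {label for _, label in examples}
    if len(labels) == 1:                          # all examples agree: make a leaf
        return labels.pop()
    if not attributes:                            # no tests left: majority leaf
        return plurality(examples)
    a = choose_attribute(attributes, examples)    # most informative attribute (next section)
    branches = {}
    for value in {x[a] for x, _ in examples}:     # split the data on each value of a
        subset = [(x, y) for x, y in examples if x[a] == value]
        rest = [b for b in attributes if b != a]
        branches[value] = decision_tree_learning(subset, rest, examples)
    return (a, branches)                          # internal node, as in the representation sketch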

Choosing the Best Attribute – Information Gain:

 Entropy is used to measure uncertainty.
 Information Gain is defined as:

Gain(A) = Entropy(before split) – Expected Entropy(after split on A), where the expected entropy is the average entropy of the subsets produced by the split, weighted by their sizes.

 The attribute with the highest information gain is selected for splitting at each node (a small sketch follows).
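
A minimal sketch of these quantities, matching the Gain(A) formula above and reusing the (attribute-dictionary, label) example format from the learning sketch (names are illustrative):

import math
from collections import Counter

def entropy(examples):
    # H = -sum(p * log2(p)) over the label distribution of the examples.
    counts = Counter(label for _, label in examples)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(attribute, examples):
    # Entropy before the split minus the size-weighted entropy of the subsets after it.
    before = entropy(examples)
    total = len(examples)
    remainder = 0.0
    for value in {x[attribute] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return before - remainder

def choose_attribute(attributes, examples):
    # The attribute with the highest information gain, used by decision_tree_learning above.
    return max(attributes, key=lambda a: information_gain(a, examples))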

Overfitting and Pruning:

 Overfitting occurs when the tree is too complex and fits noise in the training data.
 To avoid this, the tree can be pruned by removing branches that do not improve
classification accuracy on a validation set (a rough sketch follows this list).
 The text mentions statistical tests (like the chi-squared test) as one way to decide
whether a split is useful, but does not go into detail.
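
One way to realize this idea is validation-set ("reduced error") pruning. The rough sketch below reuses classify and plurality from the earlier sketches and assumes every attribute value seen in the validation data also appears in the tree; it is an illustration, not the book's pruning procedure.

def accuracy(tree, examples):
    # Fraction of examples whose predicted label matches the true label.
    return sum(classify(tree, x) == y for x, y in examples) / len(examples)

def prune(tree, train_examples, val_examples):
    # Bottom-up: collapse a subtree into a majority leaf when that does not reduce
    # accuracy on the validation examples that reach this node.
    if not isinstance(tree, tuple) or not val_examples:
        return tree
    attribute, branches = tree
    new_branches = {}
    for value, subtree in branches.items():
        tr = [(x, y) for x, y in train_examples if x[attribute] == value]
        va = [(x, y) for x, y in val_examples if x[attribute] == value]
        new_branches[value] = prune(subtree, tr, va)
    pruned = (attribute, new_branches)
    leaf = plurality(train_examples)              # candidate replacement for the whole subtree
    if accuracy(leaf, val_examples) >= accuracy(pruned, val_examples):
        return leaf
    return pruned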

Learning Curve:

 Accuracy on test data improves with more training data.
 This pattern is represented by a learning curve (sketched below).
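
A small sketch of how such a curve can be produced, reusing decision_tree_learning and accuracy from the sketches above (all names are illustrative): train on increasingly large portions of the data and record accuracy on a fixed test set.

import random

def learning_curve(examples, test_examples, attributes, steps=10):
    examples = examples[:]           # work on a copy before shuffling
    random.shuffle(examples)
    curve = []
    for k in range(1, steps + 1):
        subset = examples[: max(1, k * len(examples) // steps)]
        tree = decision_tree_learning(subset, attributes)
        curve.append((len(subset), accuracy(tree, test_examples)))
    return curve                     # accuracy typically rises and then flattens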

Early Stopping – A Caution:

 Stopping tree growth too early (when no good split is available) can miss important
patterns.
 It is better to grow the full tree and prune afterward.

Extensions to Decision Trees:

To make decision trees practical in real-world scenarios, we must handle:

1. Missing attribute values.
2. Attributes with many distinct values (use gain ratio instead of raw information gain; a sketch follows this list).
3. Continuous-valued attributes (e.g., weight > 160).
4. Continuous output variables (handled by regression trees instead of classification
trees).
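
For item 2, a common correction (used, for example, in C4.5-style learners) is the gain ratio: divide the information gain by the "split information", the entropy of the partition sizes, so that attributes with many distinct values are not automatically preferred. A minimal sketch, reusing information_gain from the earlier code:

import math
from collections import Counter

def split_information(attribute, examples):
    # Entropy of the sizes of the subsets produced by splitting on the attribute.
    total = len(examples)
    sizes = Counter(x[attribute] for x, _ in examples).values()
    return -sum((s / total) * math.log2(s / total) for s in sizes)

def gain_ratio(attribute, examples):
    si = split_information(attribute, examples)
    return information_gain(attribute, examples) / si if si > 0 else 0.0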

Advantages of Decision Trees:

 Easy to interpret and explain.
 Commonly used in many domains like healthcare and finance.
 Useful when decision-making must be explainable (e.g., in legal or regulatory
contexts).
