Basics of Learning Theory
Chapter - 3
Designing a Learning System
■ Choosing the Training Experience
■ Choosing the Target Function
■ Choosing a Representation for the Target Function
■ Choosing a Function Approximation Algorithm
– Estimating training values
– Adjusting the weights
■ The Final Design
Designing a learning system
1. Choosing the Training Experience
The system learns from the training experience we choose.
Attributes to consider when choosing the training experience:
■ Direct or indirect training examples
■ The training experience also depends on the presence of a supervisor who can
label all valid moves for a board state; in the absence of a supervisor, the
game agent plays against itself and learns the good moves
■ If the training examples follow the same distribution as the examples over
which the final system will be evaluated, the learned results will be good
2. Choosing the target function
Determines exactly what type of knowledge will be learned and how it will be used
■ With direct training experience, the target function selects the best move for a
given board state
Consider the checkers game, and let ChooseMove be the target function:
ChooseMove: B→M, mapping a legal board state in B to a legal move in M
■ With indirect experience, all legal board states reachable by a move are scored,
and the move leading to the highest-scoring state is the best
An evaluation function V assigns a numerical score to any given board state:
V: B→R
If b is a final board state that is won, V(b) = 100
If b is a final board state that is lost, V(b) = -100
If b is a final board state that is drawn, V(b) = 0
If b is not a final state, V(b) = V(b'), where b' is the best final board state reachable from b (assuming optimal play)
3. Choosing the representation for the target function
The representation of knowledge may be a table, collection of rules or a neural
network
Example representation: Let
■ x1: the number of black pieces on the board
■ x2: the number of red pieces on the board
■ x3: the number of black kings on the board
■ x4: the number of red kings on the board
■ x5: the number of black pieces threatened by red (i.e., which can be captured on
red's next turn)
■ x6: the number of red pieces threatened by black
Thus, the learning program will represent V̂(b) as a linear function of the form
V̂(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
where w0 through w6 are numerical coefficients (weights) to be learned.
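Below is a minimal Python sketch of how such a linear evaluation function could be computed for a board described by the six features above; the weight and feature values are hypothetical and only illustrate the form of V̂(b).

def v_hat(weights, features):
    """Linear board evaluation: weights = [w0..w6], features = [x1..x6]."""
    w0, *ws = weights
    return w0 + sum(w * x for w, x in zip(ws, features))

# Hypothetical weights and board features for illustration
weights = [0.5, 1.0, -1.0, 2.0, -2.0, -0.5, 0.5]   # w0..w6 (assumed)
features = [12, 12, 0, 0, 1, 1]                     # x1..x6 for some board state
print(v_hat(weights, features))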
4. Choosing a Function Approximation Algorithm
■ To learn the target function f we require a set of training
examples, each describing a specific board state b and the
training value Vtrain (b) for b.
■ Each training example is an ordered pair of the form (b,
Vtrain(b)).
■ Function Approximation Procedure
– Derive training examples from the indirect training
experience available to the learner
– Adjust the weights wi to best fit these training examples
Adjusting the weights
■ Specify the learning algorithm for choosing the weights wi to
best fit the set of training examples {(b, Vtrain(b))}
■ The weights have to be chosen so as to minimize the squared error E
between the training values and the values predicted by the current hypothesis:
E = Σ (Vtrain(b) - V̂(b))², summed over the training examples (b, Vtrain(b))
■ LMS weight-update rule:
– For each training example (b, Vtrain(b))
– Use the current weights to calculate V̂(b)
– For each weight wi, update it as wi ← wi + η (Vtrain(b) - V̂(b)) xi,
where η is a small constant (the learning rate)
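A minimal Python sketch of this LMS update; the learning rate and the feature, prediction, and training values are assumed for illustration.

def lms_update(weights, features, v_train_b, v_hat_b, eta=0.1):
    """One LMS step: w_i <- w_i + eta * (V_train(b) - V_hat(b)) * x_i, with x_0 = 1."""
    error = v_train_b - v_hat_b
    xs = [1.0] + list(features)          # x0 = 1 pairs with the bias weight w0
    return [w + eta * error * x for w, x in zip(weights, xs)]

# Hypothetical values: six board features, current prediction 20, training value 100
print(lms_update([0.5, 1, -1, 2, -2, -0.5, 0.5], [12, 12, 0, 0, 1, 1], 100, 20))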
5. Final Design
■ The final design of the checkers learning system can be described
by four distinct program modules
5.Final Design…….
■ The Performance System is the module that must solve the
given performance task by using the learned target function(s).
■ The Critic takes as input the history or trace of the game and
produces as output a set of training examples of the target
function.
■ The Generalizer takes as input the training examples and
produces an output hypothesis that is its estimate of the target
function.
■ The Experiment Generator takes as input the current
hypothesis and outputs a new problem (i.e., initial board state)
for the Performance System to explore.
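The interaction of the four modules can be pictured as a simple loop. The following Python sketch uses hypothetical stub functions only to show the flow of information between the modules, not any real implementation.

def experiment_generator(hypothesis):
    """Propose a new problem, e.g. an initial checkers board to play from."""
    return "initial board state"

def performance_system(problem, hypothesis):
    """Solve the task (play the game) using the current learned hypothesis."""
    return ["game trace"]

def critic(trace):
    """Turn the game trace into training examples (b, V_train(b))."""
    return [("board state", 100)]

def generalizer(training_examples):
    """Fit a new hypothesis, e.g. new weights of the linear evaluation function."""
    return "updated hypothesis"

hypothesis = "initial hypothesis"
for _ in range(3):                       # a few learning iterations
    problem = experiment_generator(hypothesis)
    trace = performance_system(problem, hypothesis)
    examples = critic(trace)
    hypothesis = generalizer(examples)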
Introduction to Concept Learning
• Concept learning - a learning task in which a human or machine learner is trained
to classify objects by being shown a set of example objects along with their class
labels. The learner simplifies what has been observed by condensing it in the form
of an example.
• Concept learning - also known as category learning, concept attainment, and
concept formation.
• A Formal Definition for Concept Learning: Given a set of hypotheses, the learner
searches through the hypothesis space to identify the best hypothesis that
matches the target concept.
A Concept Learning Task –
EnjoySport Training Examples
■ A set of example days, each described by six attributes:
Example | Sky   | AirTemp | Humidity | Wind   | Water | Forecast | EnjoySport
1       | Sunny | Warm    | Normal   | Strong | Warm  | Same     | Yes
2       | Sunny | Warm    | High     | Strong | Warm  | Same     | Yes
3       | Rainy | Cold    | High     | Strong | Warm  | Change   | No
4       | Sunny | Warm    | High     | Strong | Cool  | Change   | Yes
■ The task is to learn to predict the value of EnjoySport for an arbitrary day,
based on the values of its attributes. This is the target concept.
Hypothesis Representation
■ Goal: To infer the “best” concept-description from the set of
all possible hypotheses.
■ Each hypothesis consists of a conjunction of constraints on the
instance attributes.
■ Each hypothesis will be a vector of six constraints, specifying
the values of the six attributes
(Sky, AirTemp, Humidity, Wind, Water, and Forecast)
Hypothesis Representation…..
■ Each attribute constraint will be one of:
■ ? - indicating any value is acceptable for the attribute (don't care)
■ single value - specifying a single required value, e.g. Warm (specific)
■ Ø - indicating no value is acceptable for the attribute (no value)
■ A hypothesis:
■ Sky AirTemp Humidity Wind Water Forecast
■ < Sunny, ?, ?, Strong , ?, Same >
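A minimal Python sketch of how such a conjunctive hypothesis classifies an instance; the attribute ordering follows the slide above, and the helper name satisfies is assumed for illustration ('?' matches anything, a specific value must match exactly, and 'Ø' matches nothing).

def satisfies(h, x):
    """True if hypothesis h classifies instance x as a positive example."""
    return all(hi == '?' or hi == xi for hi, xi in zip(h, x))

h = ['Sunny', '?', '?', 'Strong', '?', 'Same']
day = ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same']
print(satisfies(h, day))   # True: the day is classified as a positive example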
EnjoySport Concept Learning Task
■ Instances X: the set of all possible days, each described by the attributes
• Sky (Sunny, Cloudy, and Rainy)
• Temp (Warm and Cold)
• Humidity (Normal and High)
• Wind (Strong and Weak)
• Water (Warm and Cool)
• Forecast (Same and Change)
■ Target Concept (Function) c: EnjoySport : X →{0,1}
■ Hypotheses H : Each hypothesis is described by a conjunction of
constraints on the attributes.
■ Training Examples D : positive and negative examples of the target
function {x1,c(x1)}, {x2, c(x2)}, …….{xn, c(xn)}
EnjoySport Concept Learning Task….
■ Determine: a hypothesis h in H such that h(x) = c(x) for all x in D.
■ Members of the concept (instances for which c(x)=1) are called
positive examples.
■ Nonmembers of the concept (instances for which c(x)=0) are called
negative examples.
■ H represents the set of all possible hypotheses. H is determined by
the human designer’s choice of a hypothesis representation.
■ The goal of concept-learning is to find a hypothesis
h: X → {0, 1} such that h(x)=c(x) for all x in D.
Enjoy Sport - Hypothesis Space
■ Sky has 3 possible values, and the other 5 attributes have 2 possible values each.
■ There are 96 (= 3·2·2·2·2·2) distinct instances in X.
■ There are 5120 (= 5·4·4·4·4·4) syntactically distinct hypotheses in H, since each
attribute can take its possible values plus the two additional values ? and Ø.
■ Every hypothesis containing one or more Ø symbols represents the empty set of
instances, that is, it classifies every instance as negative.
■ Hence, there are 973 (= 1 + 4·3·3·3·3·3) semantically distinct hypotheses in H:
each attribute takes either a specific value or ?, plus one hypothesis
representing the empty set of instances.
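A short Python sketch that reproduces these counts; the list of attribute domain sizes is taken from the slide above.

domains = [3, 2, 2, 2, 2, 2]   # number of possible values per EnjoySport attribute

instances = 1
syntactic = 1
semantic = 1
for d in domains:
    instances *= d        # 3*2*2*2*2*2 = 96 distinct instances
    syntactic *= d + 2    # each slot: a value, '?', or 'Ø'  -> 5120
    semantic *= d + 1     # each slot: a value or '?'        -> 972
semantic += 1             # plus the single empty (all-Ø) hypothesis -> 973

print(instances, syntactic, semantic)   # 96 5120 973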
Example - 2
Sl. No | Horns | Tail  | Tusks | Paws | Fur | Color | Hooves | Size   | Elephant
1      | No    | Short | Yes   | No   | No  | Black | No     | Big    | Yes
2      | Yes   | Short | No    | No   | No  | Brown | Yes    | Medium | No
3      | No    | Short | Yes   | No   | No  | Black | No     | Medium | Yes
4      | No    | Long  | No    | Yes  | Yes | White | No     | Medium | No
5      | No    | Short | Yes   | Yes  | Yes | Black | No     | Big    | Yes
■ Distinct instances: 2×2×2×2×2×3×2×2 = 384
■ Syntactically distinct hypotheses: 4×4×4×4×4×5×4×4 = 81,920
■ Semantically distinct hypotheses (all hypotheses containing Ø collapse into a single empty hypothesis): 3×3×3×3×3×4×3×3 + 1 = 8,749
Concept Learning As Search: General-
to- Specific Ordering of Hypotheses
■ Definition: Let hj and hk be boolean valued functions defined over X.
Then hj is more-general-than-or-equal-to hk if and only if
for all x in X, [(hk(x) = 1) → (hj (x)=1)]
Example: h1 = <Sunny, ?, ?, Strong, ?, ?>
h2 = <Sunny, ?, ?, ?, ?, ?>
Every instance that is classified as positive by h1 will also be classified as
positive by h2 in our example data set. Therefore h2 is more general than h1.
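The definition quantifies over all instances, but for conjunctive hypotheses the ≥g test can be carried out attribute by attribute. A minimal Python sketch (the helper name is assumed for illustration):

def more_general_or_equal(hj, hk):
    """hj >=g hk: every constraint of hj is at least as permissive as hk's
    ('?' = any value, 'Ø' = no value)."""
    return all(a == '?' or a == b or b == 'Ø' for a, b in zip(hj, hk))

h1 = ['Sunny', '?', '?', 'Strong', '?', '?']
h2 = ['Sunny', '?', '?', '?',      '?', '?']
print(more_general_or_equal(h2, h1))   # True: h2 is more general than h1
print(more_general_or_equal(h1, h2))   # False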
Find-S Algorithm:
Finding a Maximally Specific Hypothesis
■ Find-S starts with the most specific hypothesis in H and, for each positive
training example, generalizes each attribute constraint just enough to cover
that example; negative examples are ignored.
Find-S algorithm example
Find the maximally specific hypothesis (using Find-S) for the following dataset
(a Python sketch follows the table):
CGPA | Interactiveness | Practical Knowledge | Communication Skills | Logical Thinking | Interest | Job Offer
>9   | Yes             | Excellent           | Good                 | Fast             | Yes      | Yes
>9   | Yes             | Good                | Good                 | Fast             | Yes      | Yes
>8   | No              | Good                | Good                 | Fast             | No       | No
>9   | Yes             | Good                | Good                 | Slow             | No       | Yes
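A minimal Python sketch of Find-S applied to this dataset; the attribute order and the string encodings of the values are assumed for illustration.

data = [
    (['>9', 'Yes', 'Excellent', 'Good', 'Fast', 'Yes'], 'Yes'),
    (['>9', 'Yes', 'Good',      'Good', 'Fast', 'Yes'], 'Yes'),
    (['>8', 'No',  'Good',      'Good', 'Fast', 'No'],  'No'),
    (['>9', 'Yes', 'Good',      'Good', 'Slow', 'No'],  'Yes'),
]

def find_s(examples):
    """Return the maximally specific hypothesis consistent with the positive examples."""
    h = None
    for x, label in examples:
        if label != 'Yes':          # Find-S ignores negative examples
            continue
        if h is None:
            h = list(x)             # initialise with the first positive example
        else:
            h = [hi if hi == xi else '?' for hi, xi in zip(h, x)]
    return h

print(find_s(data))   # ['>9', 'Yes', '?', 'Good', '?', '?']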
Version Spaces
■ Definition: A hypothesis h is consistent with a set of training
examples D iff h(x) = c(x) for each example {x, c(x) } in D.
■ The Candidate-Elimination algorithm represents the set of all
hypotheses consistent with the observed training examples.
■ This subset of all hypotheses is called the version space with
respect to the hypothesis space H and the training examples D,
because it contains all plausible versions of the target concept.
A Compact Representation for Version Space
■ Version space can be represented by its most specific and
most general boundaries.
■ Definition: The general boundary G, with respect to
hypothesis space H and training data D, is the set of maximally
general members of H consistent with D.
■ Definition: The specific boundary S, with respect to hypothesis
space H and training data D, is the set of minimally general
(i.e., maximally specific) members of H consistent with D.
List-Then-Eliminate Algorithm
Version space represented as a list of hypotheses:
1. VersionSpace ← a list containing every hypothesis in H
2. For each training example <x, c(x)>:
   Remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace
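A minimal Python sketch of List-Then-Eliminate for the EnjoySport hypothesis space; the attribute domains are assumed from the task definition. Enumerating H is only feasible because this space is tiny, which is exactly the algorithm's practical limitation.

from itertools import product

# Assumed attribute domains for the EnjoySport task
DOMAINS = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change')]

def satisfies(h, x):
    """True if hypothesis h classifies instance x as positive."""
    return all(hi == '?' or hi == xi for hi, xi in zip(h, x))

# Step 1: VersionSpace <- every non-empty semantically distinct hypothesis in H
version_space = list(product(*[d + ('?',) for d in DOMAINS]))

# Step 2: remove hypotheses that disagree with any training example
examples = [(('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), 1),
            (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), 0)]
for x, c in examples:
    version_space = [h for h in version_space if satisfies(h, x) == bool(c)]

# Step 3: output the surviving hypotheses
print(len(version_space))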
Candidate Elimination Algorithm
❖ For each training example d, do
✓ If d is a positive example
• Remove from G any hypothesis inconsistent with d
• For each hypothesis s in S that is not consistent with d
– Remove s from S
– Add to S all minimal generalizations h of s such that
o h is consistent with d, and some member of G is
more general than h
– Remove from S any hypothesis that is more general than
another hypothesis in S
✓ If d is a negative example
• Remove from S any hypothesis inconsistent with d
• For each hypothesis g in G that is not consistent with d
– Remove g from G
– Add to G all minimal specializations h of g such that
o h is consistent with d, and some member of S is
more specific than h
– Remove from G any hypothesis that is less general than
another hypothesis in G
■ S holds the minimally general (maximally specific) hypotheses in H, and G holds
the maximally general hypotheses in H; initially, any hypothesis is still possible.
➢ Initialize G to the set of maximally general hypotheses in H
➢ Initialize S to the set of maximally specific hypotheses in H
Initial Values
■ S0 = <Ø, Ø, Ø, Ø, Ø, Ø>
■ G0 = <?, ?, ?, ?, ?, ?>
Example:
after seeing <Sunny, Warm, Normal, Strong, Warm, Same>, + :
■ S0: <Ø, Ø, Ø, Ø, Ø, Ø>
■ S1: <Sunny, Warm, Normal, Strong, Warm, Same>
■ G0, G1: <?, ?, ?, ?, ?, ?>
(A second positive example, <Sunny, Warm, High, Strong, Warm, Same>, then
generalizes S1 to S2 = <Sunny, Warm, ?, Strong, Warm, Same>, leaving G2 = <?, ?, ?, ?, ?, ?>.)
Example:
after seeing <Rainy, Cold, High, Strong, Warm, Change>, − :
■ S2, S3: <Sunny, Warm, ?, Strong, Warm, Same>
■ G2: <?, ?, ?, ?, ?, ?>
■ G3: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}
Example:
after seeing <Sunny, Warm, High, Strong, Cool, Change>, + :
■ S3: <Sunny, Warm, ?, Strong, Warm, Same>
■ S4: <Sunny, Warm, ?, Strong, ?, ?>
■ G3: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}
■ G4: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
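A minimal Python sketch of Candidate-Elimination for conjunctive hypotheses; the attribute domains are assumed from the EnjoySport task, and running it on the four training examples reproduces the boundary sets S4 and G4 shown in the trace above.

# Assumed attribute domains for the EnjoySport task
DOMAINS = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change')]

def satisfies(h, x):
    """True if hypothesis h classifies instance x as positive."""
    return all(hi == '?' or hi == xi for hi, xi in zip(h, x))

def more_general(hj, hk):
    """hj >=g hk, checked attribute by attribute for conjunctive hypotheses."""
    return all(a == '?' or a == b or b == 'Ø' for a, b in zip(hj, hk))

def candidate_elimination(examples):
    S = [tuple('Ø' for _ in DOMAINS)]       # most specific boundary
    G = [tuple('?' for _ in DOMAINS)]       # most general boundary
    for x, positive in examples:
        if positive:
            G = [g for g in G if satisfies(g, x)]
            # minimally generalize each member of S to cover x,
            # keeping only hypotheses below some member of G
            S = [gen for s in S
                 for gen in [tuple(xi if si in ('Ø', xi) else '?'
                                   for si, xi in zip(s, x))]
                 if any(more_general(g, gen) for g in G)]
        else:
            S = [s for s in S if not satisfies(s, x)]
            new_G = []
            for g in G:
                if not satisfies(g, x):
                    new_G.append(g)
                    continue
                for i, domain in enumerate(DOMAINS):    # minimal specializations
                    if g[i] != '?':
                        continue
                    for value in domain:
                        if value == x[i]:
                            continue
                        spec = g[:i] + (value,) + g[i + 1:]
                        if any(more_general(spec, s) for s in S):
                            new_G.append(spec)
            # keep only the maximally general members of G
            G = [g for g in new_G
                 if not any(h != g and more_general(h, g) for h in new_G)]
    return S, G

examples = [(('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
            (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), True),
            (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), False),
            (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), True)]
S, G = candidate_elimination(examples)
print(S)   # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(G)   # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]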
✓ The S boundary of the version space forms a summary of the
previously encountered positive examples that can be used
to determine whether any given hypothesis is consistent with
these examples.
✓ The G boundary summarizes the information from previously
encountered negative examples. Any hypothesis more
specific than G is assured to be consistent with past negative
examples.
Learned Version Space
Remarks on C-E algorithm
■ The learned Version Space correctly describes the target
concept, provided:
– There are no errors in the training examples
– There is some hypothesis that correctly describes the
target concept
■ If S and G converge to a single identical hypothesis, then the
concept is exactly learned
■ In case of errors in the training examples, useful hypotheses are
discarded, and no recovery is possible
■ An empty version space means no hypothesis in H is
consistent with training examples
Learning the Concept of “Japanese
Economy Car” – Candidate Elimination
■ G0 = <?, ?, ?, ?, ?>
■ S0 = <Ø, Ø, Ø, Ø, Ø>
■ 1. Positive Example: <Japan, Honda, Blue, 1980, Economy>
■ G1 = <?, ?, ?, ?, ?>
■ S1 = <Japan, Honda, Blue, 1980, Economy>
■ 2. Negative Example: <Japan, Toyota, Green, 1970, Sports>
■ G2 = {<?, Honda, ?, ?, ?>, <?, ?, Blue, ?, ?>, <?, ?, ?, 1980, ?>, <?, ?, ?, ?, Economy>}
■ S2 = <Japan, Honda, Blue, 1980, Economy>
■ 3. Positive Example: <Japan, Toyota, Blue, 1990, Economy>
■ G3 = {<?, ?, Blue, ?, ?>, <?, ?, ?, ?, Economy>}
■ S3 = <Japan, ?, Blue, ?, Economy>
■ 4. Negative Example: <USA, Chrysler, Red, 1980, Economy>
■ G4 = {<?, ?, Blue, ?, ?>, <Japan, ?, ?, ?, Economy>}
■ S4 = <Japan, ?, Blue, ?, Economy>
■ 5. Positive Example: <Japan, Honda, White, 1980, Economy>
■ G5 = {<Japan, ?, ?, ?, Economy>}
■ S5 = <Japan, ?, ?, ?, Economy>
■ 6. Positive Example: <Japan, Toyota, Green, 1980, Economy>
■ G6 = {<Japan, ?, ?, ?, Economy>}
■ S6 = <Japan, ?, ?, ?, Economy>
■ 7. Negative Example: <Japan, Honda, Red, 1990, Economy>
■ This example is inconsistent with the version space (the single remaining
hypothesis classifies it as positive), so the version space collapses: no
conjunctive hypothesis is consistent with all the training examples.
Modelling in Machine Learning
■ Modelling is the process of training a machine learning algorithm on training
data, tuning it to improve its performance, validating it, and using it to make
predictions on new, unseen data.
■ The goal of a machine learning algorithm is to learn model parameters and to
choose good hyperparameters.
– Model parameters: learned from the training data (for example, the weights of a linear model)
– Hyperparameters: high-level settings fixed before training (for example, the learning rate or the number of folds) and tuned rather than learned
■ Models are evaluated with metrics such as mean squared error (for regression) or accuracy (for classification)
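A minimal scikit-learn sketch (library availability assumed) of the distinction: alpha is a hyperparameter set before training, while coef_ and intercept_ are model parameters learned from the training data.

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

model = Ridge(alpha=1.0)        # hyperparameter: chosen before training
model.fit(X, y)                 # parameters: estimated from the training data
print(model.coef_, model.intercept_)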
Model Selection and Model Evaluation
■ Assess model performance and model complexity to select the best model
■ Approaches to selecting a machine learning model:
■ Use resampling methods
■ Measure accuracy
■ Use a probabilistic framework and quantification of performance
Model Selection
■ Resampling Methods
– Resampling involves repeatedly drawing samples from the training dataset and refitting the
model on these samples to obtain additional information about the fitted model.
■ K-fold cross-validation (see the sketch after this list)
– K-fold cross-validation splits the data more effectively: the data is divided into k equal sets,
where one set is used as the test set while the remaining sets are used to train the model.
– The process continues until each set has acted as the test set and all the sets have gone
through the training phase.
■ Holdout
– The holdout method is a basic cross-validation technique used in machine learning to
evaluate and select models. It involves splitting the data into two parts: a training set and a
test set.
– The training set is used to prepare and improve the model.
– The test set is used to evaluate how well the model works on new data.
■ Stratified k fold cross validation
– This cross-validation object is a variation of KFold that returns stratified folds. The folds are
made by preserving the percentage of samples for each class.
■ Leave-one-out cross validation (LOOCV)
– is a special case of k-fold cross-validation. In LOOCV, the value of k is set to the number
of examples in the dataset. This means that the function approximator is trained on all
the data except for one point and a prediction is made for that point
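A minimal scikit-learn sketch (library availability assumed) comparing a holdout estimate with a k-fold cross-validation estimate on a synthetic dataset.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)

# Holdout: a single train/test split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
holdout_acc = model.fit(X_tr, y_tr).score(X_te, y_te)

# K-fold cross-validation: every fold serves once as the test set
kfold_acc = cross_val_score(model, X, y,
                            cv=KFold(n_splits=5, shuffle=True, random_state=0))
print(holdout_acc, kfold_acc.mean())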
Model Performance
Contingency Table or Confusion Matrix:
                | Predicted Positive  | Predicted Negative
Actual Positive | TP (True Positive)  | FN (False Negative)
Actual Negative | FP (False Positive) | TN (True Negative)
Model Performance
Sensitivity is also called Recall: Sensitivity = Recall = TP / (TP + FN)
Precision is the positive predictive value: Precision = TP / (TP + FP)
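A minimal sketch with hypothetical confusion-matrix counts, showing the formulas behind these metrics.

TP, FP, FN, TN = 40, 10, 5, 45          # hypothetical counts

sensitivity = TP / (TP + FN)            # recall / true positive rate
specificity = TN / (TN + FP)            # true negative rate
precision   = TP / (TP + FP)            # positive predictive value
accuracy    = (TP + TN) / (TP + FP + FN + TN)

print(sensitivity, specificity, precision, accuracy)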
Model Performance
Visual Classifier Performance: Receiver Operating Characteristic (ROC) curve and Area under the Curve (AUC)
Y-axis = TPR = Sensitivity
X-axis = FPR = 1 − Specificity
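A minimal scikit-learn sketch (library availability assumed) that computes the ROC curve points and the AUC from predicted class probabilities.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
fpr, tpr, thresholds = roc_curve(y_te, scores)    # x-axis: FPR, y-axis: TPR
print(roc_auc_score(y_te, scores))                # area under the ROC curve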