Module 1
i) VAPNIK-CHERVONENKIS (VC) DIMENSION
ii) PROBABLY APPROXIMATELY CORRECT (PAC) LEARNING
iii) MODEL SELECTION AND GENERALIZATION
Vapnik-Chervonenkis dimension (1971)
Measures the complexity of the hypothesis space H.
It does not count the number of hypotheses in H.
It counts the maximum number of distinct instances of X that can be completely
discriminated by H (shattering).
VC DIMENSION
Dichotomy: a labeling of the instances in a 2-class problem
X: each element belongs to class 0 or class 1
H: the set of all straight lines in 2D
Linear classifier with two data points: every one of the 2^2 = 4 possible labelings can be realized by some line.
Linear classifier with three data points (not collinear): every one of the 2^3 = 8 labelings can be realized, so a line shatters three points.
Shattering: a hypothesis class H shatters a set of N instances if, for each of the 2^N possible labelings (dichotomies), some h ∈ H classifies the instances accordingly.
Linear classifier with four data points: no line can realize the XOR-style labeling (opposite corners in the same class), so no set of four points can be shattered. Hence the VC dimension of straight lines in 2D is 3.
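The figures for these slides do not survive in text form, so here is a brute-force check of the two facts above (a minimal sketch, assuming NumPy and SciPy are available; "some line realizes this labeling" is cast as a linear-programming feasibility problem):

```python
from itertools import product
import numpy as np
from scipy.optimize import linprog

def line_realizes(points, labels):
    """Is there a line w.x + b = 0 separating this labeling?
    Separable iff some (w, b) satisfies y_i * (w.x_i + b) >= 1."""
    y = np.where(np.asarray(labels) == 1, 1.0, -1.0)
    X = np.asarray(points, dtype=float)
    # Feasibility LP over variables (w1, w2, b): -y_i*(w.x_i + b) <= -1
    A = -y[:, None] * np.hstack([X, np.ones((len(X), 1))])
    res = linprog(c=[0.0, 0.0, 0.0], A_ub=A, b_ub=-np.ones(len(X)),
                  bounds=[(None, None)] * 3, method="highs")
    return res.success

def line_shatters(points):
    """True if every one of the 2^N dichotomies is realizable by a line."""
    return all(line_realizes(points, labs)
               for labs in product([0, 1], repeat=len(points)))

print(line_shatters([(0, 0), (1, 0), (0, 1)]))          # True: 3 points shattered
print(line_shatters([(0, 0), (1, 1), (1, 0), (0, 1)]))  # False: XOR labeling fails
```

The LP asks for a strict separator; rescaling (w, b) lets us demand margin 1 without loss of generality.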
Rectangle Classifier
Vapnik-Chervonenkis (VC) Dimension
An axis-aligned rectangle can shatter 4 points (for example, four points in a diamond arrangement), but not 5: the fifth point inside the bounding box of the other four cannot be labeled 0 while they are labeled 1. Hence the VC dimension of axis-aligned rectangles in 2D is 4.
Illustration - Vapnik-Chervonenkis dimension.
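The same brute-force idea verifies this, and for rectangles the realizability test is elementary: a labeling is achievable iff the bounding box of the positive points contains no negative point (a minimal sketch in plain Python; the diamond example is illustrative):

```python
from itertools import product

def rectangle_realizes(points, labels):
    """An axis-aligned rectangle realizes a labeling iff the bounding
    box of the positive points contains no negative point."""
    pos = [p for p, lab in zip(points, labels) if lab == 1]
    if not pos:
        return True  # a rectangle away from all points labels everything 0
    xmin = min(x for x, _ in pos); xmax = max(x for x, _ in pos)
    ymin = min(y for _, y in pos); ymax = max(y for _, y in pos)
    return not any(xmin <= x <= xmax and ymin <= y <= ymax
                   for (x, y), lab in zip(points, labels) if lab == 0)

def rectangle_shatters(points):
    """True if rectangles realize all 2^N labelings of the points."""
    return all(rectangle_realizes(points, labs)
               for labs in product([0, 1], repeat=len(points)))

diamond = [(0, 1), (0, -1), (1, 0), (-1, 0)]
print(rectangle_shatters(diamond))             # True  -> VC dim >= 4
print(rectangle_shatters(diamond + [(0, 0)]))  # False (5th point trapped inside)
```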
Probably Approximately Correct Learning (PAC)
In computational learning theory, probably approximately correct learning
(PAC learning) is a framework for the mathematical analysis of machine
learning algorithms.
It was proposed in 1984 by Leslie Valiant.
The goal is to guarantee that a learning algorithm will probably
(with high probability) find an approximately correct
hypothesis from a limited amount of training data.
PAC-learnability
To fully define the Probably Approximately Correct (PAC) learning framework,
several parameters are crucial.
These parameters provide the necessary constraints and criteria for the learning
algorithm to ensure its performance guarantees.
The key parameters include:
1) Hypothesis class (H): The hypothesis class defines the set of possible hypotheses
that the learning algorithm can select as the output. It represents the space of
functions from which the learning algorithm can choose the best approximation to
the target function.
2) Sample complexity (m): The sample complexity refers to the minimum number of training
examples required for the learning algorithm to find an approximately correct hypothesis
with the required confidence.
3) Error tolerance (ϵ): The error tolerance parameter specifies the acceptable level of error in
the output hypothesis. It quantifies how closely the learned hypothesis must approximate
the true target function.
4) Confidence parameter (δ): δ is the probability that the learning algorithm fails to find an
approximately correct hypothesis; the performance guarantees must hold with probability at
least 1 − δ. It is typically set to a small value.
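For a finite hypothesis class H and a learner that outputs a hypothesis consistent with the training sample, these parameters are tied together by the standard textbook bound m ≥ (1/ϵ)(ln |H| + ln(1/δ)), not derived in these notes. A minimal sketch (the example numbers are illustrative):

```python
import math

def pac_sample_bound(H_size, epsilon, delta):
    """m >= (1/eps) * (ln|H| + ln(1/delta)): enough i.i.d. examples for a
    consistent learner to be epsilon-accurate with probability 1 - delta."""
    return math.ceil((math.log(H_size) + math.log(1.0 / delta)) / epsilon)

# |H| = 2**10 hypotheses, 5% error tolerance, 99% confidence (delta = 0.01)
print(pac_sample_bound(2**10, epsilon=0.05, delta=0.01))  # 231 examples
```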
PAC-learnability
Terminology and notation required to define PAC-learnability
Let X be a set called the instance space which may be finite or infinite. For example, X may be
the set of all points in a plane.
A concept class C for X is a family of functions c ∶ X → {0, 1}. A member of C is called a
concept. A concept can also be thought of as a subset of X: a subset c ⊆ X defines a
unique function µc ∶ X → {0, 1} with µc(x) = 1 if x ∈ c and µc(x) = 0 otherwise.
A hypothesis h is also a function h ∶ X → {0, 1}. So, as in the case of concepts, a hypothesis can
also be thought of as a subset of X. H will denote a set of hypotheses.
We assume that F is an arbitrary, but fixed, probability distribution over X.
Training examples are obtained by taking random samples from X. We assume that the samples
are randomly generated from X according to the probability distribution F.
Definition
Let X be an instance space, C a concept class for X, h a hypothesis in C and F an
arbitrary, but fixed, probability distribution. The concept class C is said to be
PAC-learnable if there is an algorithm A which, for samples drawn with any
probability distribution F and any concept c ∈ C, will with probability at least
1 − δ produce a hypothesis h ∈ C whose error is at most ϵ, for any given
ϵ, δ ∈ (0, 1).
PAC Learning
[Figure slides: false negative and false positive regions; the error region between the target concept and the hypothesis; what "approximately correct" and "probably approximately correct" mean pictorially.]
PAC Learning for Axis-Aligned Rectangles
The classic worked example: the target concept is an axis-aligned rectangle, and the learner outputs the tightest rectangle enclosing the positive training examples. The error region is then the strip between the target rectangle and the learned one, and PAC learning asks for enough samples that this region has probability mass at most ϵ with probability at least 1 − δ.
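A minimal sketch of that learner (the target rectangle, the uniform distribution, and the sample size are illustrative assumptions; the sketch assumes at least one positive example is drawn):

```python
import random

def tightest_rectangle(examples):
    """Hypothesis: the smallest axis-aligned rectangle containing every
    positive example; returns (xmin, xmax, ymin, ymax)."""
    pos = [p for p, label in examples if label == 1]
    xs = [x for x, _ in pos]
    ys = [y for _, y in pos]
    return min(xs), max(xs), min(ys), max(ys)

# Target concept c: the rectangle [2, 6] x [1, 5]; F is uniform on [0, 10]^2.
def c(point):
    x, y = point
    return 1 if 2 <= x <= 6 and 1 <= y <= 5 else 0

random.seed(0)
points = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(500)]
h = tightest_rectangle([(p, c(p)) for p in points])
print(h)  # close to (2, 6, 1, 5); h errs only on thin strips just inside c
```

In the standard analysis of this learner, m ≥ (4/ϵ) ln(4/δ) samples suffice: each of the four strips between the target and the learned rectangle then has mass at most ϵ/4 with probability at least 1 − δ.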
[Worked example slides: Example problems 1 and 2.]
Model Selection and Generalization
What is a Model in Machine Learning?
Constructing a Model
In regression, assuming a linear function is an inductive bias.
Among all lines, choosing the one that minimizes squared error is another inductive bias.
Each hypothesis class has a certain capacity and can learn only certain functions.
Underfitting: the hypothesis class is less complex than the target function (e.g., fitting a line when the data come from a higher-order curve).
Overfitting: the hypothesis class is more complex than needed, so the model also fits the noise in the training data.
Model Selection
A model is a mathematical or logical representation of a solution space.
In order to formulate a hypothesis for a problem, we have to choose some model.
Model selection may also refer to the process of choosing one particular approach from among several alternatives:
possible algorithms
possible sets of features
initial values for certain parameters
Inductive bias
The set of assumptions we make to make learning possible is called the
inductive bias of the learning algorithm.
One way we introduce inductive bias is by assuming a hypothesis class.
Examples
• In learning the class of family car, assuming the shape of a rectangle is
an inductive bias (used to keep the model simple).
• In regression, assuming a linear function is an inductive bias.
Advantages of a simple model
Easy to use
Easy to train (fewer parameters)
Easy to explain
Easy to arrive at good generalization
A simple model would generalize better than a complex model. This principle is
known as Occam's razor, which states that simpler explanations are more
plausible and any unnecessary complexity should be shaved off.
Generalization
How well a model trained on the training set predicts the right
output for new instances is called generalization.
The model with the best generalization should be selected.
Main causes for poor performance of learning algorithms
Underfitting
Overfitting
Testing Generalization: Cross-Validation
Generalization can be tested only if we have data outside the training set.
We simulate this by dividing the dataset into two parts: a training set and a
validation (testing) set.
The hypothesis that is the most accurate on the validation set is the
best one (the one that has the best inductive bias). This process is
called cross-validation.
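A minimal sketch of this procedure (synthetic data; polynomial degree stands in for the choice of hypothesis class, and the names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 60)
y = np.sin(3 * x) + rng.normal(0.0, 0.1, 60)  # unknown target plus noise

# Simulate unseen data: 40 points to train on, 20 held out for validation.
x_tr, x_va = x[:40], x[40:]
y_tr, y_va = y[:40], y[40:]

best_degree, best_err = None, float("inf")
for degree in range(1, 10):                # candidate hypothesis classes
    coef = np.polyfit(x_tr, y_tr, degree)  # fit on the training set only
    val_err = np.mean((np.polyval(coef, x_va) - y_va) ** 2)
    if val_err < best_err:
        best_degree, best_err = degree, val_err

print("degree chosen by validation error:", best_degree)
```

Too low a degree underfits (high training and validation error); too high a degree overfits (low training error, high validation error); the validation set picks the one in between.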
Underfitting And Overfitting
The Triple Trade-off: in all learning algorithms there is a trade-off between the complexity of the hypothesis class, the amount of training data, and the generalization error on new examples.
Training set and validation set: the training set is used to fit candidate models; the validation set is used to choose among them (model selection).
Test set: held out and used only once, to report the generalization error of the final model; it must not influence any choice made during training.