Module 1 Part3

The document discusses key concepts in machine learning, including the Vapnik-Chervonenkis (VC) dimension, which measures the complexity of hypothesis spaces, and Probably Approximately Correct (PAC) learning, a framework for analyzing learning algorithms. It emphasizes the importance of model selection, generalization, and inductive bias in creating effective machine learning models, highlighting the trade-offs between underfitting and overfitting. Additionally, it introduces parameters critical for PAC learning, such as hypothesis class, sample complexity, error tolerance, and confidence level.


Module 1

i) Vapnik-Chervonenkis (VC) Dimension
ii) Probably Approximately Correct (PAC) Learning
iii) Model Selection and Generalization
Vapnik-Chervonenkis (VC) dimension (1971)

 Measures the complexity (capacity) of the hypothesis space H.
 It does not count the number of hypotheses in H.
 It counts the maximum number of distinct instances of X that can be
completely discriminated by H (shattering).
VC DIMENSION

 Dichotomy: a 2-class problem.
 X: instances, each belonging to class 0 or class 1.
 H: the set of all straight lines in 2D.
Linear Classifier with two data points
Linear Classifier with three data points
Shattering
Linear Classifier with four data points
Rectangle Classifier
Vapnik-Chervonenkis (VC) Dimension
An axis-aligned rectangle can shatter 4 points
Illustration - Vapnik-Chervonenkis dimension.
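The shattering claim for axis-aligned rectangles can be checked mechanically: a labeling is realizable by a rectangle exactly when the bounding box of the positive points contains no negative point. A minimal sketch (the point coordinates below are illustrative; function names are ours):

```python
from itertools import product

def rectangle_realizes(points, labels):
    """An axis-aligned rectangle realizes a labeling iff the bounding
    box of the positively labeled points contains no negative point."""
    pos = [p for p, y in zip(points, labels) if y == 1]
    neg = [p for p, y in zip(points, labels) if y == 0]
    if not pos:                       # a degenerate rectangle labels all 0
        return True
    xs = [p[0] for p in pos]
    ys = [p[1] for p in pos]
    lo_x, hi_x, lo_y, hi_y = min(xs), max(xs), min(ys), max(ys)
    return not any(lo_x <= x <= hi_x and lo_y <= y <= hi_y
                   for x, y in neg)

def shattered(points):
    """True iff every one of the 2^n labelings is realizable."""
    return all(rectangle_realizes(points, labels)
               for labels in product([0, 1], repeat=len(points)))

# Four points with one extreme in each direction ("diamond") are shattered...
diamond = [(0, 1), (0, -1), (1, 0), (-1, 0)]
# ...but adding a fifth point inside the bounding box of the other four
# breaks shattering, consistent with VC-dim(rectangles) = 4.
five = diamond + [(0, 0)]
```

Running `shattered(diamond)` returns True while `shattered(five)` returns False, matching the illustration above.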
Probably Approximately Correct Learning (PAC)

 In computational learning theory, probably approximately correct
learning (PAC learning) is a framework for the mathematical analysis of
machine learning algorithms.
 It was proposed in 1984 by Leslie Valiant.
 The goal is to ensure that a learning algorithm will probably (with
high probability) find an approximately correct hypothesis based on a
limited amount of training data.
PAC-learnability

 To fully define the Probably Approximately Correct (PAC) learning
framework, several parameters are crucial.
 These parameters provide the necessary constraints and criteria for
the learning algorithm to ensure its performance guarantees.
 The key parameters include:
1. Hypothesis class (H): the set of possible hypotheses the learning
algorithm can select as its output. It represents the space of
functions from which the algorithm chooses the best approximation to
the target function.
2. Sample complexity (m): the minimum number of training examples
required for the learning algorithm to find an approximately correct
hypothesis.
3. Error tolerance (ϵ): the acceptable level of error in the output
hypothesis. It quantifies how closely the learned hypothesis must
approximate the true target function.
4. Confidence level (δ): the probability that the algorithm fails to
find an approximately correct hypothesis; the performance guarantee
holds with probability at least 1 − δ. It is typically set to a small
value.
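For a finite hypothesis class with a consistent learner, these parameters combine into the standard sufficient sample size m ≥ (1/ϵ)(ln|H| + ln(1/δ)). A small helper to evaluate that bound (the function name is ours):

```python
import math

def pac_sample_complexity(h_size, eps, delta):
    """Sufficient sample size for a consistent learner over a finite
    hypothesis class H:  m >= (1/eps) * (ln|H| + ln(1/delta)).
    With this many samples, any hypothesis consistent with the data has
    true error <= eps with probability at least 1 - delta."""
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / eps)

# e.g. |H| = 2**20 hypotheses, 5% error tolerance, 99% confidence:
m = pac_sample_complexity(2**20, eps=0.05, delta=0.01)   # m = 370
```

Note how m grows only logarithmically in |H| and 1/δ, but linearly in 1/ϵ: tightening the error tolerance is far more expensive than raising the confidence.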
PAC-learnability

Terminologies and notations required to define PAC-learnability

 Let X be a set called the instance space, which may be finite or
infinite. For example, X may be the set of all points in a plane.
 A concept class C for X is a family of functions c ∶ X → {0, 1}. A
member of C is called a concept. A concept can also be thought of as a
subset of X. If c is a subset of X, it defines a unique function
µc ∶ X → {0, 1} as follows: µc(x) = 1 if x ∈ c, and µc(x) = 0 otherwise.
 A hypothesis h is also a function h ∶ X → {0, 1}. So, as in the case
of concepts, a hypothesis can also be thought of as a subset of X. H
will denote a set of hypotheses.
 We assume that F is an arbitrary, but fixed, probability distribution
over X.
 Training examples are obtained by taking random samples from X. We
assume that the samples are randomly generated from X according to the
probability distribution F.
Definition

 Let X be an instance space, C a concept class for X, h a hypothesis
in C, and F an arbitrary, but fixed, probability distribution. The
concept class C is said to be PAC-learnable if there is an algorithm A
which, for samples drawn with any probability distribution F and any
concept c ∈ C, will with high probability (at least 1 − δ) produce a
hypothesis h ∈ C whose error is small (at most ϵ).
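The verbal definition above is commonly written in the following standard form (a sketch: S denotes the sample of m training examples and hS the hypothesis the algorithm outputs from it):

```latex
\forall\, \epsilon, \delta \in (0,1),\ \forall\, c \in C,\ \forall\, F:
\quad
\Pr_{S \sim F^m}\!\big[\, \mathrm{error}_F(h_S) \le \epsilon \,\big]
\;\ge\; 1 - \delta,
\qquad \text{where } \mathrm{error}_F(h) = \Pr_{x \sim F}\big[\, h(x) \ne c(x) \,\big],
```

with the sample size m required to be polynomial in 1/ϵ and 1/δ ("probably" is the 1 − δ, "approximately correct" is the ϵ).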
PAC Learning
False Negative And False Positive
Error region
Approximately Correct
Probably Approximately Correct
PAC Learning for Axis Aligned Rectangle
Approximately Correct
Error Region
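The axis-aligned rectangle example above is usually taught via the "tightest fit" learner: output the bounding box of the positive examples, so all errors fall in four thin strips inside the true rectangle. A minimal sketch (the target rectangle, uniform distribution, and sample sizes are illustrative assumptions):

```python
import random

def tightest_fit(samples):
    """Learn an axis-aligned rectangle as the bounding box of the
    positive examples: the classic consistent learner for this class."""
    pos = [p for p, y in samples if y == 1]
    if not pos:
        return None                   # hypothesis that labels everything 0
    xs, ys = zip(*pos)
    return (min(xs), max(xs), min(ys), max(ys))

def inside(rect, p):
    """Label 1 iff point p falls inside rect = (x1, x2, y1, y2)."""
    if rect is None:
        return 0
    x1, x2, y1, y2 = rect
    return int(x1 <= p[0] <= x2 and y1 <= p[1] <= y2)

random.seed(0)
true_rect = (0.2, 0.7, 0.3, 0.8)      # unknown target concept c
draw = lambda: (random.random(), random.random())   # F = uniform on unit square
train = [(p, inside(true_rect, p)) for p in (draw() for _ in range(500))]
h = tightest_fit(train)

# The learned rectangle lies inside the true one, so every mistake is a
# false negative in one of the four error strips, which shrink as m grows.
test = [draw() for _ in range(2000)]
err = sum(inside(h, p) != inside(true_rect, p) for p in test) / len(test)
```

With 500 training points the estimated error is well below 5%, matching the PAC guarantee that the strip area shrinks as the sample grows.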
Example problem 1
Example problem 2
Model Selection and Generalization
What is a Model in Machine Learning?
Construct a Model
Construct a Model….
 In regression, assuming a linear function is an inductive bias.
 Among all lines, choosing the one that minimizes squared error is
another inductive bias.
 Each hypothesis class has a certain capacity and can learn only
certain functions.
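The two biases in the bullets above can be made concrete: first restrict H to lines y = wx + b, then pick the one line minimizing squared error via the closed-form least-squares solution (the data below is made up so the fit is exact):

```python
def fit_line(xs, ys):
    """Inductive bias 1: restrict H to lines y = w*x + b.
    Inductive bias 2: among all lines, pick the one minimizing squared
    error (closed-form simple linear regression)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    b = my - w * mx
    return w, b

xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]            # exactly y = 2x + 1
w, b = fit_line(xs, ys)         # recovers w = 2.0, b = 1.0
```

A different hypothesis class (say, quadratics) would embody a different inductive bias and could fit curves this one cannot.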
Underfitting
Overfitting
Model Selection

 A model is a mathematical or logical representation of a solution
space.
 In order to formulate a hypothesis for a problem, we have to choose
some model.
 Model selection may also indicate the process of choosing one
particular approach from among several different ones:
  possible algorithms
  possible sets of features
  initial values for certain parameters
Model Selection
Inductive bias

 The set of assumptions we make to make learning possible is called
the inductive bias of the learning algorithm.
 One way we introduce inductive bias is when we assume a hypothesis
class.
Examples
• In learning the class of family car, assuming the shape of a
rectangle is an inductive bias (used to keep the model simple).
• In regression, assuming a linear function is an inductive bias.
Advantages of a simple model

 Easy to use.
 Easy to train (fewer parameters).
 Easy to explain.
 Easy to arrive at generalization.
 A simple model tends to generalize better than a complex model. This
principle is known as Occam's razor, which states that simpler
explanations are more plausible and any unnecessary complexity should
be shaved off.
Generalisation

 How well a model trained on the training set predicts the right
output for new instances is called generalization.
 The model with the best generalization should be selected.
 The main causes of poor performance of learning algorithms are:
  Underfitting
  Overfitting
Testing generalisation: Cross-validation

 Generalization can only be tested on data outside the training set.
 To simulate this, divide the dataset into two parts: a training set
and a validation (or testing) set.
 The hypothesis that is the most accurate on the validation set is the
best one (the one with the best inductive bias). This process is called
cross-validation.
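The holdout procedure above can be sketched by fitting two candidate hypothesis classes (a constant and a line) on the training part only and selecting by validation error. The data, noise level, and split sizes below are made-up illustrations:

```python
import random

def fit_const(data):
    """Hypothesis class 1: constant predictors (underfits linear data)."""
    ys = [y for _, y in data]
    c = sum(ys) / len(ys)
    return lambda x: c

def fit_linear(data):
    """Hypothesis class 2: lines y = w*x + b, least-squares fit."""
    n = len(data)
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    mx, my = sum(xs) / n, sum(ys) / n
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    b = my - w * mx
    return lambda x: w * x + b

def mse(model, data):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

random.seed(1)
# Noisy samples of an underlying linear function y = 3x - 1.
data = [(x, 3 * x - 1 + random.gauss(0, 0.1))
        for x in (random.random() for _ in range(40))]
train, valid = data[:30], data[30:]        # holdout split

# Fit each candidate class on the training set only, then keep the one
# that is most accurate on the held-out validation set.
candidates = {"constant": fit_const(train), "linear": fit_linear(train)}
best = min(candidates, key=lambda name: mse(candidates[name], valid))
```

Here `best` comes out as "linear": the constant model underfits, and the validation error exposes that even though both were fit on the same training data.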
Underfitting And Overfitting
The Triple Trade-off
Training Set And Validation Set
Test Set
