PAC LEARNING
PAC learning framework
• The learner receives a sample S = (x1, …, xm) whose points are drawn
independently and identically distributed (i.i.d.) according to some
fixed but unknown distribution D, as well as the labels
(c(x1), …, c(xm)), which are based on a specific target concept c ∈ C
to learn.
• The task is then to use the labeled sample S to select a hypothesis
h ∈ H that has a small generalization error with respect to the
concept c.
Generalization error
• Given a hypothesis h ∈ H, a target concept c ∈ C, and an underlying
distribution D, the generalization error or risk of h is defined by
R(h) = P_{x~D}[h(x) ≠ c(x)] = E_{x~D}[1_{h(x) ≠ c(x)}]
• The generalization error of a hypothesis is not directly
accessible to the learner since both the distribution D and
the target concept c are unknown.
Empirical error
• Given a hypothesis h ∈ H, a target concept c ∈ C, and a sample
S = (x1, …, xm), the empirical error or empirical risk of h is defined by
R̂_S(h) = (1/m) Σ_{i=1}^{m} 1_{h(xi) ≠ c(xi)}
• Thus, the empirical error of h ∈ H is its average error over the
sample S, while the generalization error is its expected error based
on the distribution D.
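To make the distinction concrete, here is a minimal Python sketch that
computes the empirical error of a hypothesis on a sample (the specific
hypothesis, target concept, and sample below are illustrative
assumptions, not taken from the slides):

def empirical_error(h, c, sample):
    """Fraction of sample points on which the hypothesis h disagrees with the target c."""
    return sum(1 for x in sample if h(x) != c(x)) / len(sample)

# Illustrative assumption: X is the real line, the target concept c is the
# indicator of [0, 1], and the hypothesis h is the indicator of [0, 2].
c = lambda x: 1 if 0.0 <= x <= 1.0 else 0
h = lambda x: 1 if 0.0 <= x <= 2.0 else 0

S = [-0.5, 0.2, 0.7, 1.5, 3.0]       # a fixed sample, used here only for illustration
print(empirical_error(h, c, S))      # h errs only on x = 1.5 -> 0.2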
PAC learning
• PAC stands for Probably Approximately Correct.
• A concept class C is PAC-learnable if there exists an algorithm that,
for any ε > 0 and δ > 0, any distribution D, and any target concept
c ∈ C, returns with probability at least 1 − δ a hypothesis with error
at most ε after observing a number of samples polynomial in 1/ε, 1/δ,
n, and size(c) (see the formal statement below), where n is a number
such that the computational cost of representing any element x ∈ X is
at most O(n), and size(c) denotes the maximal cost of the
computational representation of c ∈ C.
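For reference, the usual formal statement of PAC-learnability (a
standard textbook formulation, written here in LaTeX):

\[
\exists\, \mathcal{A},\ \exists\, \mathrm{poly}(\cdot,\cdot,\cdot,\cdot)\ \text{such that}\
\forall \varepsilon > 0,\ \forall \delta > 0,\ \forall D,\ \forall c \in C:
\]
\[
m \ge \mathrm{poly}\!\left(\tfrac{1}{\varepsilon}, \tfrac{1}{\delta}, n, \mathrm{size}(c)\right)
\;\Longrightarrow\;
\Pr_{S \sim D^m}\!\left[\, R(h_S) \le \varepsilon \,\right] \ge 1 - \delta .
\]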
PAC learning
• A concept class C is thus PAC-learnable if the hypothesis returned by
the algorithm after observing a number of points polynomial in 1/ε and
1/δ is approximately correct (error at most ε) with high probability
(at least 1 − δ).
• ε is the upper bound on the error of the returned hypothesis, i.e.,
the hypothesis has error at most ε
• Therefore, accuracy is 1 − ε
• δ gives the probability of failure in achieving error at most ε,
i.e., the hypothesis generated is approximately correct with
probability at least 1 − δ
• Therefore, confidence is 1 − δ
PAC example
• Learning axis-aligned rectangle
• R represents a target axis-aligned rectangle and R’ a
hypothesis.
• Error regions
• the error regions are formed by the area within the rectangle R but
outside the rectangle R’ (false negatives)
• and the area within R’ but outside the rectangle R (false positives)
PAC example
• Learning tightest axis-aligned rectangle
• Given a labeled sample S, the algorithm consists of returning the
tightest axis-aligned rectangle R’ = R_S containing the points labeled
with 1, as in the sketch below.
• R_S does not produce any false positives, since its points must be
included in the target concept R. Thus, the error region of R_S is
included in R; it is the set difference R − R_S.
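A minimal sketch of this tightest-rectangle learner in Python (the data
layout, with examples given as ((x, y), label) pairs, is an
illustrative assumption):

def tightest_rectangle(sample):
    """Return (xmin, xmax, ymin, ymax) of the tightest axis-aligned rectangle
    enclosing all positively labeled points, or None if there are no positives."""
    positives = [p for p, label in sample if label == 1]
    if not positives:
        return None  # conventionally, predict everything negative
    xs = [x for x, _ in positives]
    ys = [y for _, y in positives]
    return (min(xs), max(xs), min(ys), max(ys))

def predict(rect, point):
    """Label 1 iff the point falls inside the learned rectangle R_S."""
    if rect is None:
        return 0
    xmin, xmax, ymin, ymax = rect
    x, y = point
    return 1 if (xmin <= x <= xmax and ymin <= y <= ymax) else 0

S = [((1.0, 1.0), 1), ((2.0, 3.0), 1), ((4.0, 0.5), 0), ((1.5, 2.0), 1)]
R_S = tightest_rectangle(S)          # (1.0, 2.0, 1.0, 3.0)
print(R_S, predict(R_S, (1.2, 2.5))) # the test point lies inside R_S -> 1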
Error Region
• The error region R − R_S is contained in the union of four
rectangular strips along the sides of R; if each strip has probability
mass at most ε/4, the total error is at most ε
• So each strip is constructed to have probability mass ε/4 under D
• Probability that a randomly drawn sample point falls in any one given
strip (error region) = ε/4
• Probability that a randomly drawn sample point misses a given
strip = 1 − ε/4
• P(m instances miss a given strip) = (1 − ε/4)^m
• P(m instances miss at least one of the four strips) ≤ 4(1 − ε/4)^m
(union bound)
• 4(1 − ε/4)^m ≤ 4 exp(−mε/4),
using the general inequality 1 − x ≤ e^(−x)
Error Region
This yields that with probability at least 1 − δ, the error of the
algorithm is bounded as follows: R(R_S) ≤ (4/m) ln(4/δ), obtained by
setting the failure probability 4 exp(−mε/4) to δ and solving for ε.
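Spelling out the algebra behind that bound (a short derivation
consistent with the inequalities above):

\[
4\, e^{-m\varepsilon/4} \le \delta
\;\Longleftrightarrow\;
m \ge \frac{4}{\varepsilon} \ln \frac{4}{\delta}
\;\Longleftrightarrow\;
\varepsilon \ge \frac{4}{m} \ln \frac{4}{\delta},
\]
\[
\text{so with probability at least } 1 - \delta:\qquad
R(R_S) \le \frac{4}{m} \ln \frac{4}{\delta}.
\]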
Learning bound — finite H, consistent case
Proof
• Assume that H contains some k “bad” hypotheses
• H_bad = {h1, h2, …, hk} with R(hi) ≥ ε for each i
• Let us consider one such hypothesis hi
• Prob. that hi is consistent with the first training example is ≤ 1 − ε
• Prob. that hi is consistent with the first m training examples is
≤ (1 − ε)^m
• Prob. that at least one hi ∈ H_bad is consistent with the first m
training examples is ≤ k(1 − ε)^m ≤ |H|(1 − ε)^m (union bound)
• Calculate the value of m so that |H|(1 − ε)^m ≤ δ
• Using the general inequality 1 − x ≤ e^(−x), it suffices that
|H|e^(−mε) ≤ δ
• Setting this to δ and solving for m, we get
m ≥ (1/ε)(ln|H| + ln(1/δ))
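A small Python helper that evaluates this sample-size bound (the
function name and the example numbers are illustrative assumptions, not
from the original slides):

import math

# Minimal sketch: sample-size bound for the consistent, finite-H case,
#   m >= (1/eps) * (ln|H| + ln(1/delta)).
# Function name and example numbers are illustrative assumptions.

def consistent_sample_bound(h_size, eps, delta):
    """Smallest integer m with m >= (1/eps) * (ln(h_size) + ln(1/delta))."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / eps)

# Example: a finite class with |H| = 1000 hypotheses, eps = 0.1, delta = 0.05.
print(consistent_sample_bound(1000, 0.1, 0.05))   # -> 100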
Example: Conjunction of Boolean literals
• Consider learning the concept class Cn of conjunctions of at most n
Boolean literals x1, …, xn.
• A Boolean literal is either a variable xi, i ∈ [n], or its
negation ¬xi.
• For n = 4, an example is the conjunction x1 ∧ ¬x2 ∧ x4.
• (1, 0, 0, 1) is a positive example for this concept while
(1, 0, 0, 0) is a negative example.
• Since each literal can be included positively, included with
negation, or not included at all, we have |Cn| = 3^n.
Example: Conjunction of Boolean literals
• For n = 6, the figure shows an example training sample and a
consistent hypothesis (the learning algorithm itself is sketched below).
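A minimal sketch of the standard consistent learner used here: start
from the conjunction of all 2n literals and delete every literal
contradicted by a positive example (the bit-vector encoding of examples
below is an illustrative assumption):

def learn_conjunction(sample, n):
    """sample: list of (bits, label) with bits a tuple of n values in {0, 1}.
    Returns the set of surviving literals as pairs (index, is_positive)."""
    literals = {(i, True) for i in range(n)} | {(i, False) for i in range(n)}
    for bits, label in sample:
        if label == 1:
            # A positive example rules out every literal it falsifies.
            literals -= {(i, bits[i] == 0) for i in range(n)}
    return literals

def evaluate(literals, bits):
    """1 iff every surviving literal is satisfied by the assignment."""
    return int(all((bits[i] == 1) == positive for i, positive in literals))

S = [((1, 0, 0, 1), 1), ((1, 1, 0, 1), 1), ((1, 0, 0, 0), 0)]
h = learn_conjunction(S, 4)              # surviving literals: x1, not-x3, x4
print(sorted(h), evaluate(h, (1, 0, 0, 1)))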
Example: Conjunction of Boolean literals
• Plugging this into the sample complexity bound for consistent
hypotheses yields the following sample complexity bound for any ε > 0
and δ > 0: m ≥ (1/ε)((ln 3) n + ln(1/δ))
• For δ = 0.02, ε = 0.1 and n = 10, the bound becomes m ≥ 149
(see the numerical check below).
• Thus, for a labeled sample of at least 149 examples, the
bound guarantees 90% accuracy with a confidence of at
least 98%.
• Here, the number of training samples required grows only linearly in
n, the cost of the representation of a point in X. Thus, the theorem
guarantees that the class of conjunctions of at most n Boolean literals
is PAC-learnable.
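A quick, self-contained numerical check of the m ≥ 149 figure quoted
above (plain Python, using only the bound itself):

import math

# Check of the bound m >= (1/eps) * (n * ln 3 + ln(1/delta))
# with eps = 0.1, delta = 0.02, n = 10.
eps, delta, n = 0.1, 0.02, 10
m = math.ceil((n * math.log(3) + math.log(1.0 / delta)) / eps)
print(m)   # prints 149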
Learning bound — finite H, inconsistent case
• In the inconsistent case, Hoeffding’s inequality bounds, for any one
hypothesis h ∈ H, the probability that its generalization error exceeds
its empirical error by more than ε: P[R(h) − R̂_S(h) > ε] ≤ exp(−2mε^2)
• So the probability that any one hypothesis in H has such a large gap,
combined over all of H by the union bound, is at most |H|exp(−2mε^2)
• Setting this to δ and solving for m gives the bound for the number of
samples: m ≥ (1/(2ε^2)) [ln|H| + ln(1/δ)]
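A companion helper for the inconsistent-case bound just stated (again,
the function name and example values are illustrative assumptions):

import math

# Minimal sketch: sample-size bound for the inconsistent, finite-H case,
#   m >= (1 / (2 * eps^2)) * (ln|H| + ln(1/delta)),
# matching the bound above. Names and numbers are illustrative assumptions.

def inconsistent_sample_bound(h_size, eps, delta):
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / (2.0 * eps ** 2))

# Same |H|, eps, delta as in the consistent-case example; the 1/eps^2
# dependence makes the required sample size much larger.
print(inconsistent_sample_bound(1000, 0.1, 0.05))   # -> 496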
Stochastic scenario
• This more general scenario, in which the distribution D is defined
over X × Y and the label of an input point need not be a deterministic
function of the input, is referred to as the stochastic scenario.
• It captures many real-world problems where the label of an input
point is not unique.
• For example, if we seek to predict gender based on input pairs formed
by the height and weight of a person, then the label will typically not
be unique.
Deterministic Scenario
• When the label of a point can be uniquely determined by some
measurable function f : X -> Y (with probability one), the scenario is
deterministic.
• In that case, consider a distribution D over the input space. The
training sample is obtained by drawing x1, …, xm according to D, and
the labels are obtained via f: yi = f(xi) for all i ∈ [m].
VC Dimension
• VC dimension - Vapnik-Chervonenkis dimension
• Provides a measure of hypothesis-space complexity in the case where
the hypothesis space is infinite
• The VC dimension measures the complexity of the
hypothesis space H, not by the number of distinct
hypotheses |H|, but instead by the number of distinct
instances from X that can be completely discriminated
using H.
Shattering
• Consider a hypothesis for the 2-class problem.
• Let a subset of instances be S ⊆ X and let N = 2; then the possible
labelings of S are (+,+), (+,−), (−,+), and (−,−).
• Each hypothesis h from H imposes some dichotomy on S ⊆ X; that is,
h partitions S into the two subsets
• {x ∊ S | h(x) = 1} and
• {x ∊ S | h(x) = 0}.
Shattering
• A set of N instances can be labeled as + or − in 2^N ways
• We say that H shatters S if every possible dichotomy of S can be
represented by some hypothesis from H.
• Definition: A set of instances S is shattered by hypothesis space H
if and only if for every dichotomy of S there exists some hypothesis in
H consistent with this dichotomy.
• Consider 2 instances described using a single real-valued feature:
they can be shattered by the hypothesis class of single intervals
• But 3 instances cannot be shattered by a single interval
VC dimension
• Definition: The Vapnik-Chervonenkis dimension, VC(H),
of hypothesis space H defined over instance space X is
the size of the largest finite subset of X shattered by H.
• If arbitrarily large finite sets of X can be shattered by H,
then VC(H) =∞.
• For a single interval on the real line, any set of 2 instances can be
shattered, but no set of 3 instances can. Hence, VC(H) = 2.
• The definition implies that if we find any set of instances of size d
that can be shattered, then VC(H) ≥ d.
VC dimension
• An unbiased hypothesis space shatters the entire instance space X.
• What if H cannot shatter X, but can shatter some large subset S of X?
Intuitively, it seems reasonable to say that the larger the subset of X
that can be shattered, the more expressive H is.
• The VC dimension of the set of oriented lines in 2-d is 3.
• Since there are 2^m possible dichotomies of m instances, in order for
H to shatter m instances we need |H| ≥ 2^m.
• Hence VC(H) ≤ log2(|H|) for any finite H.
Illustrative Example
• Suppose the instance space X is the set of real numbers, X = R
(e.g., describing the height of people), and H is the set of intervals
on the real number line.
• In other words, H is the set of hypotheses of the form a < x
< b, where a and b may be any real constants. What is
VC(H)?
• S = {3.1,5.7}. Can S be shattered by H? Yes.
• For example, the four hypotheses (1 < x < 2), (1 < x < 4), (4 < x < 7),
and (1 < x < 7) will do.
• They represent each of the four dichotomies over S, covering neither
instance, either one of the instances, and both of the instances,
respectively.
• Since we have found a set of size two that can be shattered
by H, we know the VC dimension of H is at least two.
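As a sanity check of this example, a small Python sketch that
enumerates all dichotomies of a point set and tests whether some open
interval (a, b) realizes each one (the brute-force grid of candidate
endpoints is an illustrative assumption):

from itertools import product

def interval_label(a, b, x):
    """Hypothesis of the form a < x < b."""
    return 1 if a < x < b else 0

def shattered_by_intervals(points):
    """True iff every dichotomy of the points is realized by some interval,
    searching endpoints over a finite grid around the points (an assumption
    that suffices for this finite check)."""
    candidates = sorted(points) + [min(points) - 1, max(points) + 1]
    endpoints = [c + d for c in candidates for d in (-0.5, 0.5)]
    for target in product([0, 1], repeat=len(points)):      # every dichotomy
        if not any(all(interval_label(a, b, x) == t for x, t in zip(points, target))
                   for a in endpoints for b in endpoints):
            return False                                     # some dichotomy unrealizable
    return True

print(shattered_by_intervals([3.1, 5.7]))        # True: VC dimension is at least 2
print(shattered_by_intervals([1.0, 2.0, 3.0]))   # False: labeling (1, 0, 1) is impossible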