
PAC LEARNING

PAC learning framework


• The learner receives a sample S = (x1, …, xm) drawn independently and
identically distributed (i.i.d.) according to some fixed but unknown
distribution D, as well as the labels (c(x1), …, c(xm)), which are based on
a specific target concept c 𝝐 C to learn.

• The task is then to use the labeled sample S to select a hypothesis h 𝝐 H
that has a small generalization error with respect to the concept c.


Generalization error
• Given a hypothesis h 𝝐 H, a target concept c 𝝐 C, and an
underlying distribution D, the generalization error or risk of
h is defined by
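
The defining formula is not reproduced in this text version of the slide; in
the standard notation it reads:

    R(h) \;=\; \Pr_{x \sim D}\big[h(x) \neq c(x)\big]
         \;=\; \mathbb{E}_{x \sim D}\big[\mathbf{1}_{h(x) \neq c(x)}\big]

where \mathbf{1}_{\omega} denotes the indicator function of the event \omega.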

• The generalization error of a hypothesis is not directly accessible to the
learner since both the distribution D and the target concept c are unknown.
Empirical error
• Given a hypothesis h 𝝐 H, a target concept c 𝝐 C, and a
sample S = (x1,…, xm), the empirical error or empirical
risk of h is defined by
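
Again the formula is missing from the text version; the standard definition
over the sample S is:

    \widehat{R}_S(h) \;=\; \frac{1}{m}\sum_{i=1}^{m} \mathbf{1}_{h(x_i) \neq c(x_i)}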

• Thus, the empirical error of h 𝝐 H is its average error over the sample S,
while the generalization error is its expected error based on the
distribution D.
PAC learning
• PAC stands for Probably Approximately Correct.

• In the formal definition (sketched below), n is a number such that the
computational cost of representing any element x 𝝐 X is at most O(n), and
size(c) denotes the maximal cost of the computational representation of
c 𝝐 C.
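
A sketch of the standard formal definition, which the slide alludes to but
does not reproduce: a concept class C is PAC-learnable if there exist an
algorithm A and a polynomial function poly(·,·,·,·) such that for any 𝝐 > 0
and 𝝳 > 0, for all distributions D on X and for any target concept c 𝝐 C,
the following holds whenever the sample size satisfies
m >= poly(1/𝝐, 1/𝝳, n, size(c)):

    \Pr_{S \sim D^m}\big[R(h_S) \leq \epsilon\big] \;\geq\; 1 - \delta

where h_S denotes the hypothesis returned by A on the sample S.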


PAC learning
• A concept class C is thus PAC-learnable if the hypothesis returned by the
algorithm after observing a number of points polynomial in 1/𝝐 and 1/𝝳 is
approximately correct (error at most 𝝐) with high probability (at least
1 - 𝝳).

• 𝝐 is the upper bound on the error, i.e., the algorithm must return a
hypothesis with error less than 𝝐.
• Therefore, the accuracy is 1 - 𝝐.

• 𝝳 gives the probability of failure in achieving accuracy 1 - 𝝐, i.e., the
hypothesis generated is approximately correct with probability at least
1 - 𝝳.
• Therefore, the confidence is 1 - 𝝳.
PAC example
• Learning axis-aligned rectangle

• R represents a target axis-aligned rectangle and R’ a hypothesis.
• Error regions (where the hypothesis R’ disagrees with the target R):
• the area within the rectangle R but outside the rectangle R’ – false
negatives
• the area within R’ but outside the rectangle R – false positives
PAC example
• Learning tightest axis-aligned rectangle

• Given a labeled sample S, the algorithm consists of returning the tightest
axis-aligned rectangle R’ = RS containing the points labeled with 1 (a code
sketch follows below).
• RS does not produce any false positives, since its points must be included
in the target concept R. Thus, the error region of RS is included in R,
namely R - RS.
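
A minimal Python sketch of this learner (names and the 2-d point
representation are illustrative, not from the slides):

    # Tightest axis-aligned rectangle containing the positively labeled points.
    def tightest_rectangle(sample):
        """sample: list of ((x, y), label) pairs with label in {0, 1}.
        Returns (xmin, xmax, ymin, ymax), or None if there are no positives."""
        positives = [pt for pt, label in sample if label == 1]
        if not positives:
            return None  # empty hypothesis: predict 0 everywhere
        xs = [x for x, _ in positives]
        ys = [y for _, y in positives]
        return (min(xs), max(xs), min(ys), max(ys))

    def predict(rect, point):
        """Label a point 1 iff it falls inside the learned rectangle R_S."""
        if rect is None:
            return 0
        xmin, xmax, ymin, ymax = rect
        x, y = point
        return int(xmin <= x <= xmax and ymin <= y <= ymax)

    # Toy usage: R_S = (1.0, 2.0, 1.0, 3.0) for this sample.
    S = [((1.0, 1.0), 1), ((2.0, 3.0), 1), ((5.0, 5.0), 0), ((0.5, 4.0), 0)]
    R_S = tightest_rectangle(S)
    print(R_S, predict(R_S, (1.5, 2.0)))   # -> (1.0, 2.0, 1.0, 3.0) 1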
Error Region
• Construct four rectangular strips along the sides of the target R, each of
probability mass 𝝐/4, so that their union has mass at most 𝝐.
• If RS meets all four strips, the error region R - RS is contained in that
union and the error of RS is at most 𝝐; an error larger than 𝝐 therefore
requires RS to miss at least one strip.
• Probability that a randomly drawn positive sample point falls in a given
strip (error region) = 𝝐/4
• Probability that a randomly drawn positive sample point misses a given
strip = 1 - 𝝐/4
• P(all m instances miss a given strip) = (1 - 𝝐/4)^m
• P(all m instances miss at least one of the four strips) <= 4(1 - 𝝐/4)^m
• Using the general inequality 1 - x <= e^-x: 4(1 - 𝝐/4)^m <= 4 exp(-m𝝐/4)
Error Region

• This yields that with probability at least 1 - 𝝳, the error of the
algorithm is bounded as follows:
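
The bound itself is not shown in this text version; setting
4 exp(-m𝝐/4) <= 𝝳 and solving, as in the derivation above, gives the
standard form:

    4\exp(-m\epsilon/4) \;\leq\; \delta
    \;\Longleftrightarrow\;
    m \;\geq\; \frac{4}{\epsilon}\ln\frac{4}{\delta},
    \qquad\text{equivalently}\qquad
    R(R_S) \;\leq\; \frac{4}{m}\ln\frac{4}{\delta}.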
Learning bound — finite H, consistent case
Proof
• Assume that H contains k bad hypotheses Hbad = {h1, h2, …, hk}, each with
R(hi) >= 𝝐.
• Let us consider one such hypothesis hi:
• Prob. that hi is consistent with the first training example is <= 1 - 𝝐
• Prob. that hi is consistent with the first m training examples is
<= (1 - 𝝐)^m
• Prob. that at least one hi is consistent with the first m training
examples is <= k (1 - 𝝐)^m <= |H| (1 - 𝝐)^m
• Calculate the value of m so that |H| (1 - 𝝐)^m <= 𝝳
• Using the general inequality 1 - x <= e^-x: |H| e^(-m𝝐) <= 𝝳
• Equating |H| e^(-m𝝐) to 𝝳 and solving for m, we get the bound shown below.
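
The resulting sample complexity is not reproduced in the text; solving
|H| e^(-m𝝐) <= 𝝳 for m gives the standard consistent-case bound:

    m \;\geq\; \frac{1}{\epsilon}\Big(\ln|H| + \ln\frac{1}{\delta}\Big)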


Example: Conjunction of Boolean literals
• Consider learning the concept class Cn of conjunctions of at most n
Boolean literals x1, …, xn.
• A Boolean literal is either a variable xi, i 𝝐 [n], or its negation ¬xi.

• For n = 4, an example is a conjunction such as x1 ∧ ¬x2 ∧ x4.

• (1,0,0,1) is a positive example for this concept while (1,0,0,0) is a
negative example.
• Since each literal can be included positively, included with negation, or
not included at all, we have |Cn| = 3^n.
Example: Conjunction of Boolean literals
• For n = 6, the figure shows an example training sample and a consistent
hypothesis (a sketch of one way to find such a hypothesis follows below).
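
The slides do not spell out how the consistent conjunction is obtained; a
common approach (sketched below in Python, with illustrative names) starts
from all 2n literals and drops every literal contradicted by a positive
example:

    def learn_conjunction(sample, n):
        """sample: list of (x, label), x a tuple of n bits, label in {0, 1}.
        Returns the index sets of the surviving positive and negated literals."""
        pos_literals = set(range(n))   # candidate literals x_i
        neg_literals = set(range(n))   # candidate literals NOT x_i
        for x, label in sample:
            if label == 1:             # only positive examples eliminate literals
                for i in range(n):
                    if x[i] == 1:
                        neg_literals.discard(i)   # NOT x_i is falsified
                    else:
                        pos_literals.discard(i)   # x_i is falsified
        return pos_literals, neg_literals

    # Toy usage for n = 4 with the examples from the previous slide:
    S = [((1, 0, 0, 1), 1), ((1, 0, 0, 0), 0)]
    print(learn_conjunction(S, 4))   # -> ({0, 3}, {1, 2}), i.e. x1 AND NOT x2 AND NOT x3 AND x4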


Example: Conjunction of Boolean literals
• Plugging |Cn| = 3^n into the sample complexity bound for consistent
hypotheses yields the following sample complexity bound for any 𝝐 > 0 and
𝝳 > 0:
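
The bound is not reproduced in the text; substituting
ln|H| = ln|Cn| = n ln 3 into the consistent-case bound above gives:

    m \;\geq\; \frac{1}{\epsilon}\Big((\ln 3)\,n + \ln\frac{1}{\delta}\Big)

For 𝝐 = 0.1, 𝝳 = 0.02 and n = 10 this evaluates to
m >= 10 (10 ln 3 + ln 50) ≈ 149, matching the number quoted next.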

• For 𝝳 = 0.02, 𝝐 = 0.1 and n = 10, the bound becomes m >= 149.
• Thus, for a labeled sample of at least 149 examples, the bound guarantees
90% accuracy with a confidence of at least 98%.
• Here, the number of training samples required grows only linearly in n,
the cost of the representation of a point in X, and polynomially in 1/𝝐 and
1/𝝳. Thus, PAC-learning of Cn is guaranteed by the theorem.
Learning bound — finite H, inconsistent case
• By Hoeffding’s inequality and the union bound, the probability that some
hypothesis in H has an empirical error deviating from its generalization
error by more than 𝝐 is at most |H| exp(-2m𝝐^2).
• Setting |H| exp(-2m𝝐^2) <= 𝝳 and solving for m gives the bound for the
number of samples: m >= 1/(2𝝐^2) [ln|H| + ln(1/𝝳)].
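
Equivalently, and using the same constants as the slide, with probability at
least 1 - 𝝳, for all h 𝝐 H:

    R(h) \;\leq\; \widehat{R}_S(h) + \sqrt{\frac{\ln|H| + \ln(1/\delta)}{2m}}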
Stochastic scenario

• This more general scenario is referred to as the stochastic scenario (its
formal setup is sketched below).
• It captures many real-world problems where the label of an input point is
not unique.
• For example, if we seek to predict gender based on input pairs formed
by the height and weight of a person, then the label will typically not be
unique.
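
The formal setup is not reproduced on the slide; in the standard formulation
D is a distribution over X × Y, the sample S = ((x1, y1), …, (xm, ym)) is
drawn i.i.d. from D, and the generalization error of a hypothesis h becomes:

    R(h) \;=\; \Pr_{(x,y) \sim D}\big[h(x) \neq y\big]
         \;=\; \mathbb{E}_{(x,y) \sim D}\big[\mathbf{1}_{h(x) \neq y}\big]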
Deterministic Scenario
• When the label of a point can be uniquely determined by some measurable
function f : X -> Y (with probability one), we are in the deterministic
scenario.
• In that case, consider a distribution D over the input space X. The
training sample is obtained by drawing x1, …, xm according to D, and the
labels are obtained via f: yi = f(xi) for all i 𝝐 [m].
VC Dimension
• VC dimension - Vapnik-Chervonenkis dimension
• Provides a measure of hypothesis-space complexity in the case where the
hypothesis space is infinite

• The VC dimension measures the complexity of the


hypothesis space H, not by the number of distinct
hypotheses |H|, but instead by the number of distinct
instances from X that can be completely discriminated
using H.
Shattering

• Consider a hypothesis for the 2-class problem.
• Let a subset of instances be S ⊆ X and let N = |S| = 2; then the possible
labelings are (0, 0), (0, 1), (1, 0) and (1, 1).
• Each hypothesis h from H imposes some dichotomy on S ⊆ X; that is, h
partitions S into the two subsets
• {x ∊ S | h(x) = 1} and
• {x ∊ S | h(x) = 0}.
Shattering
• A set of N instances can be labeled as + or – in 2^N ways.
• We say that H shatters S if every possible dichotomy of S can be
represented by some hypothesis from H.

• Definition: A set of instances S is shattered by hypothesis space H if and
only if for every dichotomy of S there exists some hypothesis in H
consistent with this dichotomy.
• Consider 2 instances described using a single real-valued feature: they
can be shattered by a single interval.

• But 3 instances cannot be shattered by a single interval: no interval can
contain the two outer instances while excluding the middle one.


VC dimension
• Definition: The Vapnik-Chervonenkis dimension, VC(H),
of hypothesis space H defined over instance space X is
the size of the largest finite subset of X shattered by H.

• If arbitrarily large finite sets of X can be shattered by H,


then VC(H) =∞.
• For a single interval on the real line, any set of 2 instances can be
shattered, whereas no set of 3 instances can. Hence, VC(H) = 2.

• VC dimension indicates that if we find any set of instances


of size d that can be shattered, then VC(H) >=d.
VC dimension
• An unbiased hypothesis space shatters the entire instance space.

• What if H cannot shatter X, but can shatter some large subset S of X?
Intuitively, it seems reasonable to say that the larger the subset of X that
can be shattered, the more expressive H is.

• The VC dimension of the set of oriented lines in 2-d is 3.

• Since there are 2^m partitions (dichotomies) of m instances, in order for
H to shatter a set of m instances we need |H| >= 2^m.

• Hence VC(H) <= log2(|H|).
Illustrative Example
• suppose the instance space X is the set of real numbers X
= R (e.g., describing the height of people), and H the set of
intervals on the real number line.
• In other words, H is the set of hypotheses of the form a < x
< b, where a and b may be any real constants. What is
VC(H)?
• S = {3.1,5.7}. Can S be shattered by H? Yes.
• For example, the four hypotheses (1 < x < 2), (1 < x < 4), (4 < x < 7),
and (1 < x < 7) will do.
• They represent each of the four dichotomies over S, covering neither
instance, either one of the instances, and both of the instances,
respectively.
• Since we have found a set of size two that can be shattered
by H, we know the VC dimension of H is at least two.
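
A brute-force Python check of these claims (a sketch with illustrative
names; candidate intervals are taken from a small grid of endpoints, which
is enough for this example):

    from itertools import product

    def interval_shatters(points, candidates):
        """True iff every labeling of `points` is realized by some (a, b) in
        `candidates`, where (a, b) labels x positive iff a < x < b."""
        for labeling in product([0, 1], repeat=len(points)):
            realized = any(
                all(int(a < x < b) == y for x, y in zip(points, labeling))
                for a, b in candidates
            )
            if not realized:
                return False
        return True

    grid = range(9)   # candidate interval endpoints 0, 1, ..., 8
    candidates = [(a, b) for a in grid for b in grid if a < b]

    print(interval_shatters([3.1, 5.7], candidates))        # True:  VC(H) >= 2
    print(interval_shatters([3.1, 5.7, 6.2], candidates))   # False: labeling (1, 0, 1) is impossible

In fact, no interval can mark the two outer points positive while excluding
the middle one, so no 3-point set is shattered and VC(H) = 2 exactly, as
stated earlier.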
