Plug-in Methods & under/over-fitting
Vianney Perchet
February 5th 2024
Lecture 2/12
Last Lecture Take Home Message
• “Attributes/Features” space X ⊂ R^d & “label” space Y ⊂ R
• Training data-set: D_n = {(X_1, Y_1), …, (X_n, Y_n)}
• Risk w.r.t. a loss ℓ : Y × Y → R+
  Risk: R(f) = E_{(X,Y)∼P}[ℓ(f(X), Y)]
• Optimal risk and Bayes predictor
  f∗ = arg min_f R(f) and R∗ = R(f∗)
• Binary Classification
  • 0/1-loss: ℓ(y, y′) = 1{y ≠ y′}
  • Bayes classifier: f∗(x) = 1{η(x) ≥ 1/2} (numerical check after this slide)
• Linear Regression
  • quad-loss: ℓ(y, y′) = ∥y − y′∥²
  • Bayes regressor: f∗(x) = η(x)
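As a quick sanity check on the recap above (not from the original slides), here is a short Monte Carlo experiment: on a toy distribution of my own choosing (uniform X, logistic η), the Bayes classifier f∗(x) = 1{η(x) ≥ 1/2} has a smaller estimated 0/1 risk than the same rule with a different threshold.

```python
# Monte Carlo comparison of 0/1 risks; the distribution below (uniform X,
# logistic eta) is an illustrative assumption, not taken from the lecture.
import numpy as np

rng = np.random.default_rng(0)

def eta(x):
    """Regression function eta(x) = P(Y = 1 | X = x) of the toy model."""
    return 1.0 / (1.0 + np.exp(-3.0 * x))

# Simulate (X, Y) ~ P with X uniform on [-1, 1] and Y | X ~ Bernoulli(eta(X)).
n = 200_000
X = rng.uniform(-1.0, 1.0, size=n)
Y = (rng.uniform(size=n) < eta(X)).astype(int)

def risk_01(f):
    """Monte Carlo estimate of R(f) = E[1{f(X) != Y}]."""
    return np.mean(f(X) != Y)

bayes = lambda x: (eta(x) >= 0.5).astype(int)    # f*(x) = 1{eta(x) >= 1/2}
shifted = lambda x: (eta(x) >= 0.8).astype(int)  # same eta, wrong threshold

print("estimated R(f*)          :", risk_01(bayes))
print("estimated R(threshold .8):", risk_01(shifted))  # strictly larger
```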
Focus on Binary Classification
f∗(x) = 1{η(x) ≥ 1/2}
• Simple strategy (sketched in code after this slide)
  • Estimate η(x) by η̂(x) (using D_n)
  • Plug it into the formula, i.e., f̂(x) = 1{η̂(x) ≥ 1/2}
  • Pray for the best
• ✓ It works!
  • Simple methods, already implemented, intuitive and interpretable
  • Many variants (k-NN, regressograms, kernels)
  • ✗ but only if the correct parameters are chosen!
• ✗ Convergence can be slow
  • Even arbitrarily slow: “No-Free-Lunch Theorem”
  • E[R(f̂)] − R∗ ≥ 1/(log log log(n)), which is “constant”
• ✓ But these are pathological counter-examples!
  • In practice, data are “regular” (Lipschitz, Hölder, …)
  • Explicit rates can then be computed
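The three-step strategy above fits in a few lines. A minimal sketch (names such as `plug_in_classifier` are mine, not from the lecture): given any estimator η̂ of η, threshold it at 1/2.

```python
# Plug-in classification: f_hat(x) = 1{eta_hat(x) >= 1/2} for any estimator
# eta_hat of eta(x) = P(Y = 1 | X = x).
import numpy as np

def plug_in_classifier(eta_hat):
    """Turn an estimated regression function into a binary classifier."""
    def f_hat(x):
        return (eta_hat(x) >= 0.5).astype(int)
    return f_hat

# Usage with a (hypothetical) constant estimator eta_hat(x) = 0.3:
f_hat = plug_in_classifier(lambda x: np.full(len(x), 0.3))
print(f_hat(np.zeros((5, 2))))   # predicts label 0 everywhere
```

Any of the estimators of the next slides (regressogram, k-NN, Nadaraya-Watson) can be passed as `eta_hat`.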
Regressograms. The model
• Partition of X = R^d into “bins” (hypercubes) of size h_n
  • Volume of one bin: h_n^d
• Independently on each bin B (sketch below)
  • η̂(x) = ♯{i : X_i ∈ B and Y_i = 1} / ♯{i : X_i ∈ B}   [piece-wise constant]
Th. If h_n → 0 and n·h_n^d → ∞, “consistency”, i.e., R(f̂_n) → R∗ in probability
• Proof ideas:
  1. Lemma: R(f̂) − R∗ ≤ 2 E[ |η̂(X) − η(X)| | D_n ]
  2. Approximation error: h_n → 0
  3. Estimation error: n·h_n^d → ∞
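One way to implement the regressogram estimate η̂, as a rough sketch under assumptions of my own (data in [0, 1)^d, bins indexed by ⌊x/h⌋, empty bins predicting 0); this is not the lecture's code.

```python
# Regressogram: eta_hat is constant on each hypercube bin of side h and equals
# the fraction of labels 1 among the training points falling in that bin.
import numpy as np

def regressogram(X_train, Y_train, h):
    bins_train = np.floor(X_train / h).astype(int)

    def eta_hat(X_query):
        bins_query = np.floor(X_query / h).astype(int)
        out = np.zeros(len(X_query))
        for q, b in enumerate(bins_query):
            in_bin = np.all(bins_train == b, axis=1)   # points in the same bin
            if in_bin.any():                           # empty bins keep 0.0
                out[q] = Y_train[in_bin].mean()
        return out

    return eta_hat

# Usage in d = 2 with h = 0.25 (a 4 x 4 grid of bins on [0, 1)^2):
rng = np.random.default_rng(1)
X_train = rng.uniform(size=(500, 2))
Y_train = (X_train[:, 0] + rng.normal(0, 0.1, size=500) > 0.5).astype(int)
eta_hat = regressogram(X_train, Y_train, h=0.25)
f_hat = lambda x: (eta_hat(x) >= 0.5).astype(int)      # plug-in classifier
print(f_hat(np.array([[0.1, 0.5], [0.9, 0.5]])))
```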
Regressograms. Pros/cons
✓ Pros
• Simple, intuitive & interpretable
• Low computational complexity
✗ Cons
• Finding the correct value of h
• Partition not data-dependent (why bins?)
• Lots of empty bins
• Space complexity: the number of bins is huge
K-Nearest Neighbors. The model
• Adaptive partition of X = R^d
• One parameter: k_n
• Neighborhood N_{k_n}(x) = {the k_n closest X_j to x}
• η̂(x) = ♯{i : X_i ∈ N_{k_n}(x) and Y_i = 1} / ♯{i : X_i ∈ N_{k_n}(x)}   (sketch below)
• Piece-wise (polytopial) constant
Th. If k_n/n → 0 and k_n → ∞, “consistency”
• Proof ideas:
  1. Approximation error: k_n/n → 0
  2. Estimation error: k_n → ∞
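A numpy sketch of the k-NN estimate η̂ (Euclidean distances, ties broken arbitrarily; the helper names are mine). The plug-in classifier is obtained, as before, by thresholding at 1/2.

```python
# k-NN estimate: eta_hat(x) is the fraction of labels 1 among the k nearest
# training points of x.
import numpy as np

def knn_eta_hat(X_train, Y_train, k):
    def eta_hat(X_query):
        # Pairwise distances, shape (n_query, n_train).
        d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
        # Indices of the k nearest neighbours of each query point.
        nn = np.argpartition(d, kth=k - 1, axis=1)[:, :k]
        return Y_train[nn].mean(axis=1)
    return eta_hat

# Usage: plug-in classifier with k = 15 on toy data.
rng = np.random.default_rng(2)
X_train = rng.uniform(size=(500, 2))
Y_train = (X_train[:, 0] > 0.5).astype(int)
eta_hat = knn_eta_hat(X_train, Y_train, k=15)
print((eta_hat(np.array([[0.2, 0.7], [0.8, 0.3]])) >= 0.5).astype(int))
```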
K-Nearest Neighbors. Pros/cons
✓ Pros
• Intuitive & (somewhat) interpretable
• Data-dependent partition
• No empty bins (& no arbitrary choice)
• Low space complexity
✗ Cons
• Finding the correct value of k
• Weirdly shaped partition
• Computational complexity: finding the partition is costly
Kernel-Methods (Nadaraya-Watson)
• Adaptive partition of X = R^d
• 2 parameters: a kernel K_n(·) : X → R+ and a window h ∈ R+
• η̂(x) = Σ_i K_n((x − X_i)/h) Y_i / Σ_j K_n((x − X_j)/h)   (sketch below)
Th. If h_n → 0 and n·h_n^d → ∞, “consistency”
• Proof ideas:
  1. Approximation error: h_n → 0
  2. Estimation error: n·h_n^d → ∞
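A sketch of the Nadaraya-Watson estimate; the Gaussian kernel and the window h = 0.2 used in the example are illustrative assumptions, and any non-negative kernel can be passed instead.

```python
# Nadaraya-Watson: eta_hat(x) = sum_i K((x - X_i)/h) Y_i / sum_j K((x - X_j)/h).
import numpy as np

def nadaraya_watson(X_train, Y_train, h, K=None):
    if K is None:
        # Unnormalised Gaussian kernel; the constant cancels in the ratio.
        K = lambda u: np.exp(-0.5 * np.sum(u ** 2, axis=-1))

    def eta_hat(X_query):
        # Kernel weights K((x - X_i)/h), shape (n_query, n_train).
        W = K((X_query[:, None, :] - X_train[None, :, :]) / h)
        return (W @ Y_train) / W.sum(axis=1)
    return eta_hat

# Usage with window h = 0.2 on toy data:
rng = np.random.default_rng(3)
X_train = rng.uniform(size=(500, 2))
Y_train = (X_train[:, 0] > 0.5).astype(int)
eta_hat = nadaraya_watson(X_train, Y_train, h=0.2)
print((eta_hat(np.array([[0.2, 0.7], [0.8, 0.3]])) >= 0.5).astype(int))
```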
Typical Kernel
• Usual properties
  • Normalized: ∫_X K(u) du = 1
  • Symmetric: K(−u) = K(u)
  • Bounded variance: ∫_X ∥u∥² K(u) du < ∞ & ∫_X K²(u) du < ∞
• Typical kernels (written out in code below)
  • uniform: K(x) = (1/2) · 1{x ∈ [−1, 1]}
  • triangular: K(x) = (1 − |x|) · 1{x ∈ [−1, 1]}
  • Gaussian: K(x) = (1/√(2π)) · exp(−x²/2)
  • sigmoid: K(x) = (2/π) · 1/(eˣ + e⁻ˣ)
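The kernels above written out as numpy one-liners, with a crude check of the normalization ∫ K(u) du = 1 on a grid; the grid range is an arbitrary choice of mine, and the 2/π constant in the sigmoid kernel follows the reconstruction on this slide.

```python
# The four univariate kernels of the slide, plus a Riemann-sum check that each
# integrates to (approximately) 1 over the real line.
import numpy as np

kernels = {
    "uniform":    lambda x: 0.5 * (np.abs(x) <= 1),
    "triangular": lambda x: np.maximum(1 - np.abs(x), 0.0),
    "gaussian":   lambda x: np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi),
    "sigmoid":    lambda x: (2 / np.pi) / (np.exp(x) + np.exp(-x)),
}

u = np.linspace(-10, 10, 200_001)   # grid wide enough for the tails
du = u[1] - u[0]
for name, K in kernels.items():
    print(f"{name:10s} integral ≈ {np.sum(K(u)) * du:.4f}")   # close to 1.0
```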
Kernels. Pros/cons
✓ Pros
• Intuitive & (somewhat) interpretable
• Uses all/many points to estimate
• Data-dependent
• No empty bins (& no arbitrary choice)
• Smooth/regular approximation
✗ Cons
• Finding the correct kernel K(·) and window h
Over/Under-fitting
All learning algorithms have data-fitting parameter(s)
• Choose it too small = under-fitting
  ✗ Big empirical error on the training set: (1/n) Σ_{i=1}^n ℓ(f(X_i), Y_i)
  ✗ Medium (generalization) error: E[ℓ(f(X), Y)]
• Choose it too big = over-fitting
  ✓ Small empirical error (even 0): (1/n) Σ_{i=1}^n ℓ(f(X_i), Y_i)
  ✗ Huge (generalization) error: E[ℓ(f(X), Y)]
• How to choose it? (numerical illustration below)
  • Do not focus too much on the empirical error (around 1/√n?)
  • Find several candidates & pick the smallest one (Occam’s razor)
  • Cross-validate (following lecture!)
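A small numerical illustration of the trade-off, using a k-NN plug-in classifier on one-dimensional toy data of my own choosing: k = 1 fits the training set perfectly but generalizes worse, k = n predicts a single class everywhere, and an intermediate k does best on held-out data.

```python
# Train vs held-out 0/1 error of the k-NN plug-in classifier for three values
# of k, illustrating over-fitting (k = 1) and under-fitting (k = n).
import numpy as np

rng = np.random.default_rng(4)

def sample(n):
    X = rng.uniform(-1, 1, size=(n, 1))
    Y = (rng.uniform(size=n) < 1 / (1 + np.exp(-4 * X[:, 0]))).astype(int)
    return X, Y

def knn_predict(X_train, Y_train, X_query, k):
    d = np.abs(X_query[:, None, 0] - X_train[None, :, 0])
    nn = np.argpartition(d, kth=k - 1, axis=1)[:, :k]
    return (Y_train[nn].mean(axis=1) >= 0.5).astype(int)

X_tr, Y_tr = sample(300)
X_te, Y_te = sample(10_000)
for k in (1, 15, 300):            # over-fit / in-between / under-fit
    train_err = np.mean(knn_predict(X_tr, Y_tr, X_tr, k) != Y_tr)
    test_err = np.mean(knn_predict(X_tr, Y_tr, X_te, k) != Y_te)
    print(f"k = {k:3d}   train error = {train_err:.3f}   test error = {test_err:.3f}")
```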
Take home message - Local/Plug-in Methods
Lemma: R(f̂) − R∗ ≤ 2 E[ |η̂(X) − η(X)| | D_n ]
• Estimate η(·) by η̂(·).
• Plug it into the formula f∗(x) = 1{η(x) ≥ 1/2}, i.e., f̂(x) = 1{η̂(x) ≥ 1/2}
• Local methods (general form sketched in code below)
  • General form: η̂(x) = Σ_{i=1}^n ω(x, X_i; (X_1, X_2, …, X_n)) Y_i
    with convex weights for all x (in [0, 1] and summing to 1)
• Typical examples
  • Regressogram
  • k-Nearest neighbors
  • Kernel methods
• Avoid under/over-fitting
  • Many points around x with positive weight → avoids over-fitting
  • Points far from x with small/zero weight → avoids under-fitting
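To make the general local form concrete, here is a sketch (names and constants are mine) in which k-NN and Nadaraya-Watson weights both appear as convex weight functions ω(x, X_i; X_1, …, X_n): each row of weights lies in [0, 1] and sums to 1.

```python
# General local form eta_hat(x) = sum_i w(x, X_i; X_1..X_n) Y_i with convex
# weights; two weight functions recovering k-NN and Nadaraya-Watson.
import numpy as np

def knn_weights(X_query, X_train, k=15):
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nn = np.argpartition(d, kth=k - 1, axis=1)[:, :k]
    W = np.zeros_like(d)
    np.put_along_axis(W, nn, 1.0 / k, axis=1)    # weight 1/k on each neighbour
    return W

def kernel_weights(X_query, X_train, h=0.2):
    d2 = np.sum((X_query[:, None, :] - X_train[None, :, :]) ** 2, axis=2)
    W = np.exp(-0.5 * d2 / h ** 2)               # Gaussian kernel weights
    return W / W.sum(axis=1, keepdims=True)      # normalise rows to sum to 1

def local_eta_hat(weights, X_train, Y_train):
    return lambda X_query: weights(X_query, X_train) @ Y_train

# Both estimators share the same convex-weight structure:
rng = np.random.default_rng(5)
X_tr = rng.uniform(size=(200, 2))
Y_tr = (X_tr[:, 0] > 0.5).astype(int)
x0 = np.array([[0.7, 0.4]])
print(local_eta_hat(knn_weights, X_tr, Y_tr)(x0))
print(local_eta_hat(kernel_weights, X_tr, Y_tr)(x0))
```

The regressogram fits the same template, with ω(x, X_i) = 1{X_i ∈ bin(x)} / ♯{j : X_j ∈ bin(x)} whenever the bin of x is non-empty.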