Deep Learning Basics
Lecture 4: Regularization II
Princeton University COS 495
Instructor: Yingyu Liang
Review
Regularization as hard constraint
• Constrained optimization
$$\min_{\theta} \; L(\theta) = \frac{1}{n}\sum_{i=1}^{n} l(\theta, x_i, y_i)$$
subject to: $R(\theta) \le r$
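One common way to enforce the hard constraint in practice is projected gradient descent: take a gradient step on the unregularized loss, then project back onto the feasible set $\{R(\theta) \le r\}$. A minimal NumPy sketch for the $\ell_2$-ball constraint (the projection approach and the toy least-squares setup are illustrative assumptions, not from the slide):

import numpy as np

def project_l2_ball(theta, r):
    # Project theta onto the l2 ball of radius r: rescale if it lies outside.
    norm = np.linalg.norm(theta)
    return theta if norm <= r else theta * (r / norm)

def projected_gd(grad_L, theta0, r, lr=0.1, steps=200):
    # Projected gradient descent for: min L(theta) subject to ||theta||_2 <= r.
    theta = project_l2_ball(theta0, r)
    for _ in range(steps):
        theta = project_l2_ball(theta - lr * grad_L(theta), r)
    return theta

# Toy usage: least-squares loss L(theta) = (1/2n) * ||X theta - y||^2
rng = np.random.default_rng(0)
X, y = rng.normal(size=(50, 5)), rng.normal(size=50)
grad = lambda th: X.T @ (X @ th - y) / len(y)
theta_hat = projected_gd(grad, np.zeros(5), r=1.0)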
Regularization as soft constraint
• Unconstrained optimization
$$\min_{\theta} \; L_R(\theta) = \frac{1}{n}\sum_{i=1}^{n} l(\theta, x_i, y_i) + \lambda R(\theta)$$
for some regularization parameter $\lambda > 0$
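A minimal NumPy sketch of the soft-constraint objective with an $\ell_2$ penalty (the squared loss and the data here are illustrative assumptions, not part of the slide):

import numpy as np

def regularized_loss(theta, X, y, lam):
    # Average per-example loss plus lambda * R(theta), with R(theta) = ||theta||_2^2.
    residuals = X @ theta - y
    data_term = np.mean(residuals ** 2)
    penalty = lam * np.sum(theta ** 2)
    return data_term + penalty

# Example: evaluate the regularized objective at a random theta.
rng = np.random.default_rng(0)
X, y, theta = rng.normal(size=(50, 5)), rng.normal(size=50), rng.normal(size=5)
value = regularized_loss(theta, X, y, lam=0.1)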
Regularization as Bayesian prior
• Bayes' rule:
$$p(\theta \mid \{x_i, y_i\}) = \frac{p(\theta)\, p(\{x_i, y_i\} \mid \theta)}{p(\{x_i, y_i\})}$$
• Maximum A Posteriori (MAP): the evidence $p(\{x_i, y_i\})$ does not depend on $\theta$, so
$$\max_{\theta} \log p(\theta \mid \{x_i, y_i\}) = \max_{\theta} \big[ \log p(\theta) + \log p(\{x_i, y_i\} \mid \theta) \big]$$
where $\log p(\theta)$ corresponds to the regularization and $\log p(\{x_i, y_i\} \mid \theta)$ corresponds to the MLE loss
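As a standard worked example (not spelled out on this slide): a Gaussian prior on the weights recovers the $\ell_2$ penalty. If $\theta \sim N(0, \sigma^2 I)$, then $\log p(\theta) = -\frac{\|\theta\|_2^2}{2\sigma^2} + \text{const}$, so MAP is equivalent to
$$\min_{\theta} \; -\log p(\{x_i, y_i\} \mid \theta) + \lambda \|\theta\|_2^2, \qquad \lambda = \frac{1}{2\sigma^2},$$
i.e., $\ell_2$ regularization; a Laplace prior similarly yields $\ell_1$ regularization.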
Classical regularizations
• Norm penalty
• 𝑙2 regularization
• 𝑙1 regularization
More examples
Other types of regularizations
• Robustness to noise
• Noise to the input
• Noise to the weights
• Noise to the output
• Data augmentation
• Early stopping
• Dropout
Multiple optimal solutions?
[Figure: linearly separable data, Class +1 vs. Class -1, with three separating hyperplanes $w_1$, $w_2$, $w_3$; prefer $w_2$ (higher confidence)]
Add noise to the input
[Figure: the same data with noise added to the inputs; $w_2$ still separates Class +1 from Class -1; prefer $w_2$ (higher confidence)]
Caution: not too much noise
• Too much noise makes data points cross the decision boundary
[Figure: Class +1 vs. Class -1 with heavily perturbed points crossing the boundary of $w_2$; prefer $w_2$ (higher confidence)]
Equivalence to weight decay
• Suppose the hypothesis is $f(x) = w^T x$ and the noise is $\epsilon \sim N(0, \lambda I)$
• After adding noise to the input, the loss is
$$L(f) = \mathbb{E}_{x,y,\epsilon}\big[f(x+\epsilon) - y\big]^2 = \mathbb{E}_{x,y,\epsilon}\big[f(x) + w^T\epsilon - y\big]^2$$
$$L(f) = \mathbb{E}_{x,y,\epsilon}\big[f(x) - y\big]^2 + 2\,\mathbb{E}_{x,y,\epsilon}\big[w^T\epsilon\,(f(x) - y)\big] + \mathbb{E}_{x,y,\epsilon}\big[(w^T\epsilon)^2\big]$$
$$L(f) = \mathbb{E}_{x,y,\epsilon}\big[f(x) - y\big]^2 + \lambda\,\|w\|^2$$
• The cross term vanishes because $\epsilon$ has zero mean and is independent of $(x, y)$, and $\mathbb{E}[(w^T\epsilon)^2] = \lambda\|w\|^2$; so adding input noise is equivalent to $\ell_2$ regularization (weight decay). A numerical sketch follows below.
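A minimal NumPy check of this equivalence (illustrative, not from the slides): the Monte Carlo estimate of the noisy squared loss should be close to the clean squared loss plus $\lambda\|w\|^2$.

import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 500, 5, 0.1
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
w = rng.normal(size=d)                         # an arbitrary candidate weight vector

# Monte Carlo estimate of the noisy loss E_{x,y,eps}[(w^T(x + eps) - y)^2], eps ~ N(0, lam*I)
noisy = np.mean([
    np.mean(((X + rng.normal(scale=np.sqrt(lam), size=X.shape)) @ w - y) ** 2)
    for _ in range(2000)
])

# Clean loss plus the weight-decay term lambda * ||w||^2
clean_plus_penalty = np.mean((X @ w - y) ** 2) + lam * np.sum(w ** 2)

print(noisy, clean_plus_penalty)               # the two values should nearly match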
Add noise to the weights
• For the loss on each data point, add a noise term to the weights
before computing the prediction
$$\epsilon \sim N(0, \eta I), \qquad w' = w + \epsilon$$
• Prediction: $f_{w'}(x)$ instead of $f_w(x)$
• Loss becomes
$$L(f) = \mathbb{E}_{x,y,\epsilon}\big[f_{w+\epsilon}(x) - y\big]^2$$
Add noise to the weights
• Loss becomes
$$L(f) = \mathbb{E}_{x,y,\epsilon}\big[f_{w+\epsilon}(x) - y\big]^2$$
• To simplify, use a Taylor expansion (derivatives taken with respect to $w$):
$$f_{w+\epsilon}(x) \approx f_w(x) + \epsilon^T \nabla f_w(x) + \frac{\epsilon^T \nabla^2 f_w(x)\, \epsilon}{2}$$
• Plug in:
$$L(f) \approx \mathbb{E}\big[f_w(x) - y\big]^2 + \eta\,\mathbb{E}\big[(f_w(x) - y)\,\nabla^2 f_w(x)\big] + \eta\,\mathbb{E}\,\|\nabla f_w(x)\|^2$$
• The middle term is small and can be ignored; the last term $\eta\,\mathbb{E}\,\|\nabla f_w(x)\|^2$ is the regularization term (a weight-noise training step is sketched below)
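A minimal PyTorch-style sketch of one training step with weight noise (assuming torch is available; the model, loss function, and noise scale eta are illustrative assumptions):

import torch

def weight_noise_step(model, loss_fn, x, y, optimizer, eta=1e-2):
    # Perturb every parameter with Gaussian noise eps ~ N(0, eta * I) before the forward pass.
    perturbations = []
    with torch.no_grad():
        for p in model.parameters():
            eps = torch.randn_like(p) * eta ** 0.5
            p.add_(eps)
            perturbations.append(eps)

    loss = loss_fn(model(x), y)        # prediction uses the noisy weights w + eps
    optimizer.zero_grad()
    loss.backward()

    # Restore the original weights, then apply the gradient computed at w + eps.
    with torch.no_grad():
        for p, eps in zip(model.parameters(), perturbations):
            p.sub_(eps)
    optimizer.step()
    return loss.item()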
Data augmentation
Figure from Image Classification with Pyramid Representation
and Rotated Data Augmentation on Torch 7, by Keven Wang
Data augmentation
• Adding noise to the input: a special kind of augmentation
• Be careful which transformations are applied; some change the label (a label-preserving pipeline is sketched below):
• Example: classifying ‘b’ and ‘d’ (a horizontal flip turns one into the other)
• Example: classifying ‘6’ and ‘9’ (a 180° rotation turns one into the other)
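A minimal sketch of a label-preserving augmentation pipeline using torchvision (assumed available; the transforms and parameters are illustrative, and flips and large rotations are deliberately left out because of the examples above):

from torchvision import transforms

# Small rotations, crops, and brightness jitter usually preserve the label;
# horizontal flips and 180-degree rotations are excluded since they can turn
# 'b' into 'd' or '6' into '9'.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=10),
    transforms.RandomCrop(size=28, padding=2),
    transforms.ColorJitter(brightness=0.2),
    transforms.ToTensor(),
])

# Usage: augmented = augment(pil_image), re-sampled independently every epoch.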
Early stopping
• Idea: do not train the network all the way to the smallest possible training error
• Recall overfitting: the larger the hypothesis class, the easier it is to find a hypothesis that fits the difference between the training set and the true data distribution
• Prevent overfitting: do not push the hypothesis too hard toward the training data; use the validation error to decide when to stop
Early stopping
Figure from Deep Learning,
Goodfellow, Bengio and Courville
Early stopping
• While training, also compute the validation error
• Every time the validation error improves, store a copy of the weights
• When the validation error has not improved for some time, stop
• Return the stored copy of the weights (a minimal loop is sketched below)
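A minimal Python sketch of this procedure (the helpers train_one_epoch and validation_error, and the patience value, are assumptions for illustration):

import copy

def train_with_early_stopping(model, train_one_epoch, validation_error,
                              patience=10, max_epochs=1000):
    # train_one_epoch(model) runs one epoch of training in place (assumed helper).
    # validation_error(model) returns the current validation error (assumed helper).
    best_error = float("inf")
    best_model = copy.deepcopy(model)
    epochs_since_improvement = 0

    for epoch in range(max_epochs):
        train_one_epoch(model)
        error = validation_error(model)
        if error < best_error:
            best_error = error
            best_model = copy.deepcopy(model)     # store a copy of the best weights
            epochs_since_improvement = 0
        else:
            epochs_since_improvement += 1
        if epochs_since_improvement >= patience:  # no improvement for some time: stop
            break

    return best_model, best_error, epoch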
Early stopping
• Hyperparameter selection: the number of training steps is the hyperparameter
• Advantages
• Efficient: runs alongside training; only needs to store an extra copy of the weights
• Simple: no change to the model or algorithm
• Disadvantage: needs held-out validation data
Early stopping
• Strategies to remove the disadvantage
• After early stopping in the first run, train a second run that also uses the validation data
• Two ways to reuse the validation data (sketched below):
1. Start fresh and train on both the training data and the validation data for the number of epochs found in the first run
2. Start from the weights of the first run and continue training on both the training data and the validation data until the validation loss falls below the training loss recorded at the early stopping point
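A rough Python sketch of the two strategies (make_model, train_epochs, loss_on, and the values carried over from the first run are assumed helpers, not part of the slides):

def retrain_strategy_1(make_model, train_epochs, train_data, val_data, best_epoch):
    # Strategy 1: start from scratch and train on training + validation data
    # for the number of epochs chosen by early stopping in the first run.
    model = make_model()
    train_epochs(model, train_data + val_data, num_epochs=best_epoch)
    return model

def retrain_strategy_2(model, train_epochs, loss_on, train_data, val_data,
                       train_loss_at_stop, max_extra_epochs=100):
    # Strategy 2: continue from the first-run weights on training + validation data
    # until the validation loss drops below the training loss at the stopping point.
    for _ in range(max_extra_epochs):
        train_epochs(model, train_data + val_data, num_epochs=1)
        if loss_on(model, val_data) < train_loss_at_stop:
            break
    return model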
Early stopping as a regularizer
Figure from Deep Learning,
Goodfellow, Bengio and Courville
Dropout
• Randomly select weights to update
• More precisely, in each update step:
• Randomly sample a different binary mask over all the input and hidden units
• Multiply the mask bits with the units and do the update as usual
• Typical dropout probability: 0.2 for input units and 0.5 for hidden units (a sketch follows below)
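A minimal NumPy sketch of the masking step for one layer during training (the inverted-dropout rescaling shown here is one common convention and an assumption, as are the sizes and probabilities):

import numpy as np

def dropout_forward(h, drop_prob, rng, training=True):
    # h: activations of a layer (input or hidden units), shape (batch, units).
    if not training or drop_prob == 0.0:
        return h
    keep_prob = 1.0 - drop_prob
    mask = rng.random(h.shape) < keep_prob       # fresh binary mask at every update step
    return h * mask / keep_prob                  # "inverted dropout": rescale so the expectation matches h

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
x_dropped = dropout_forward(x, drop_prob=0.2, rng=rng)   # 0.2 for inputs, 0.5 for hidden units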
Dropout
Figure from Deep Learning,
Goodfellow, Bengio and Courville
What regularizations are frequently used?
• 𝑙2 regularization
• Early stopping
• Dropout
• Data augmentation, if the transformations are known and easy to implement