ResNet

Some slides were adapted/taken from various sources, including Andrew Ng's Coursera lectures, the CS231n: Convolutional Neural Networks for Visual Recognition lectures (Stanford University), lectures from the University of Waterloo, Canada, Aykut Erdem et al.'s tutorial on Deep Learning in Computer Vision, Ismini Lourentzou's lecture slides on "Introduction to Deep Learning", Ramprasaath's lecture slides, and many more. We thankfully acknowledge them. Students are requested to use this material for their study only and NOT to distribute it.
In this Lecture

• Introducing a breakthrough neural network architecture introduced in 2015.
• Why go deep?
• What is the problem with learning deep networks?
• ResNet and how it allows us to gain more performance via deeper networks.
• Some results, improvements and further work.

[Figure: ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners by year.]
Deep vs Shallow Networks
What happens when we continue stacking deeper layers on a “plain” convolutional
neural network?

[Figure: training error and test error vs. iterations for 20-layer and 56-layer plain networks.]

The 56-layer model performs worse on both training and test error.
-> The deeper model performs worse, but this is not caused by overfitting!
Deeper models are harder to optimize

• The deeper model should be able to perform at least as well as the shallower model.
• A solution by construction: copy the learned layers from the shallower model and set the additional layers to identity mappings.

Challenges

• Deeper neural networks start to degrade in performance.
• Vanishing/exploding gradients – may require extremely careful parameter initialization to make training work, and can still appear even with the best initialization.
• Long training times – due to the very large number of training parameters.

ResNet

• A specialized network introduced by Microsoft.
• Connects the inputs of layers to later parts of the network, creating "shortcuts".
• Simple idea – great improvements in both performance and training time.

Plain Network

Residual Blocks
x -> Big NN -> a[l]   (plain network)
x -> Big NN -> a[l] -> two layers -> a[l+2], with a skip connection carrying a[l] forward   (residual block)

a[l+2] = g(z[l+2] + a[l])
       = g(W[l+2] a[l+1] + b[l+2] + a[l])
       = g(a[l])   if W[l+2] = 0 and b[l+2] = 0

The identity function is easy to learn for a residual block (a numerical sketch follows below).
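A minimal numerical sketch of this computation (assuming ReLU for g(); the array sizes and variable names are illustrative, not from the slides):

import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

rng = np.random.default_rng(0)
a_l = relu(rng.normal(size=4))            # a[l], already a post-ReLU activation
W1, b1 = rng.normal(size=(4, 4)), np.zeros(4)
W2, b2 = np.zeros((4, 4)), np.zeros(4)    # set W[l+2] = 0 and b[l+2] = 0

a_l1 = relu(W1 @ a_l + b1)                # a[l+1]
z_l2 = W2 @ a_l1 + b2                     # z[l+2]
a_l2 = relu(z_l2 + a_l)                   # a[l+2] = g(z[l+2] + a[l])

print(np.allclose(a_l2, a_l))             # True: the block collapses to the identity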
Skip Connections “shortcuts”

• Such connections are referred to as skip connections or shortcuts. In general, similar models can skip over several layers.
• The residual part of the network is treated as a unit with an input and an output.
• The input of the residual part is added to its output – the dimensions are usually the same (see the sketch below).
• Another option is to use a projection to the output space.
• With an identity shortcut, no additional training parameters are used (a projection adds only a small number).
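A minimal PyTorch sketch of both shortcut options (this assumes the torch library; the class and dimension choices are illustrative, not the authors' code):

import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    """Fully connected residual unit: output = relu(F(x) + shortcut(x))."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, out_dim)
        self.fc2 = nn.Linear(out_dim, out_dim)
        # Identity shortcut when the dimensions match, otherwise a projection.
        self.shortcut = nn.Identity() if in_dim == out_dim else nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x):
        residual = self.fc2(torch.relu(self.fc1(x)))
        return torch.relu(residual + self.shortcut(x))   # skip connection adds the input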

ResNet
He et al., 2015

With x = a[l] and g() being ReLU, the residual function is F(x) = W[l+2] a[l+1] + b[l+2], so the mapping H(x) of the plain layers is replaced by the new H(x) = W[l+2] a[l+1] + b[l+2] + a[l].
Slide credit: Fei-Fei Li et al.

ResNet
He et al., 2015

Referring to the original desired mapping as H(x):
The residual part now fits a new function F(x) = H(x) - x.
The original mapping is recast as F(x) + x.
It is easier to learn the residual F(x) than the original H(x).
Slide credit: Fei-Fei Li et al.
ResNet as a ConvNet
• Until now we have talked about fully connected layers.
• The ResNet idea can easily be extended to convolutional models (a sketch follows below).
• Other adaptations of this idea can easily be introduced to almost any kind of deep layered network.
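For illustration, a hedged PyTorch sketch of the same residual idea with convolutions (assuming the torch library; the channel count and the use of batch normalization here are assumptions for a self-contained example):

import torch.nn as nn
import torch.nn.functional as F

class ConvResidualBlock(nn.Module):
    """Residual block built from two 3x3 convolutions and an identity shortcut."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)   # the skip connection adds the input feature maps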

ResNet Architecture

Full ResNet architecture:
- Stack residual blocks.
- Every residual block has two 3x3 conv layers.
- Periodically, double the number of filters and downsample spatially using stride 2 (/2 in each spatial dimension).
- Additional conv layer at the beginning (7x7 conv, 64, /2).
- No FC layers at the end: a global average pooling layer after the last conv layer, and only an FC 1000 to the output classes.
- Total depths of 34, 50, 101, or 152 layers for ImageNet (see the depth arithmetic below).

[Figure: a residual block (3x3 conv -> relu -> 3x3 conv, with the identity x added to F(x) before the final relu) and the full stack: Input -> 7x7 conv, 64, /2 -> Pool -> 3x3 conv, 64 blocks -> 3x3 conv, 128, /2 and further 128 blocks -> ... -> 3x3 conv, 512 blocks -> Pool -> FC 1000 -> Softmax.]
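A quick sketch of the depth arithmetic (the per-model block counts below are the ones commonly quoted for these architectures and are stated here as an assumption, not taken from the slides):

# depth = convs per block * number of blocks + the initial 7x7 conv + the final FC.
configs = {
    "ResNet-34":  (2, [3, 4, 6, 3]),    # basic blocks: two 3x3 convs each
    "ResNet-50":  (3, [3, 4, 6, 3]),    # bottleneck blocks: 1x1, 3x3, 1x1
    "ResNet-101": (3, [3, 4, 23, 3]),
    "ResNet-152": (3, [3, 8, 36, 3]),
}
for name, (convs_per_block, blocks) in configs.items():
    depth = convs_per_block * sum(blocks) + 2
    print(name, depth)                   # prints 34, 50, 101, 152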
ResNet Architecture

For deeper networks (ResNet-50+), use a "bottleneck" layer to improve efficiency (similar to GoogLeNet).

[Figure: bottleneck block on a 28x28x256 input. A 1x1 conv with 64 filters projects down to 28x28x64, the 3x3 conv then operates over only 64 feature maps, and a final 1x1 conv with 256 filters projects back to 256 feature maps, giving a 28x28x256 output.]
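A hedged PyTorch sketch of such a bottleneck block (assuming the torch library; channel counts follow the 256 -> 64 -> 256 example above, and the batch-norm placement is an assumption):

import torch.nn as nn
import torch.nn.functional as F

class BottleneckBlock(nn.Module):
    """1x1 reduce -> 3x3 -> 1x1 expand, with an identity skip connection."""
    def __init__(self, channels=256, bottleneck=64):
        super().__init__()
        self.reduce = nn.Conv2d(channels, bottleneck, kernel_size=1, bias=False)   # 256 -> 64
        self.conv3x3 = nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1, bias=False)
        self.expand = nn.Conv2d(bottleneck, channels, kernel_size=1, bias=False)    # 64 -> 256
        self.bn1 = nn.BatchNorm2d(bottleneck)
        self.bn2 = nn.BatchNorm2d(bottleneck)
        self.bn3 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.reduce(x)))
        out = F.relu(self.bn2(self.conv3x3(out)))
        out = self.bn3(self.expand(out))
        return F.relu(out + x)   # identity shortcut: input and output are both 28x28x256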
Residual Blocks (skip connections)
Deeper Bottleneck Architecture

Deeper Bottleneck Architecture (Cont.)
• Addresses the high training time of very deep networks.
• Keeps the time complexity roughly the same as a block of two 3x3 convolutions (see the comparison below).
• Allows us to increase the number of layers.
• Allows the model to converge much faster.
• The 152-layer ResNet has 11.3 billion FLOPs, while the VGG-16/19 nets have 15.3/19.6 billion FLOPs.
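A back-of-the-envelope weight count at 256 channels (biases and batch norm ignored; this is an illustrative sketch, not a figure from the paper):

# Two plain 3x3 convs on 256 channels vs. the 1x1 -> 3x3 -> 1x1 bottleneck.
plain = 2 * (3 * 3 * 256 * 256)
bottleneck = (1 * 1 * 256 * 64) + (3 * 3 * 64 * 64) + (1 * 1 * 64 * 256)
print(plain, bottleneck)   # 1179648 vs. 69632 weights: the bottleneck is far cheaper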

Why Do ResNets Work Well?

Why Do ResNets Work Well? (Cont.)
• In theory, a ResNet can represent the same functions as the corresponding plain network, but in practice, due to the above, convergence is much faster.
• No additional training parameters are introduced.
• No additional complexity is introduced.

Training ResNet in practice
• Batch Normalization after every CONV layer.
• Xavier/2 initialization from He et al.
• SGD + Momentum (0.9).
• Learning rate: 0.1, divided by 10 when the validation error plateaus.
• Mini-batch size 256.
• Weight decay of 1e-5.
• No dropout used (a configuration sketch follows below).
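A hedged PyTorch sketch of this recipe (assuming the torch library; `model`, `train_loader`, `val_error()` and the epoch count are hypothetical placeholders, while the hyperparameters come from the bullets above):

import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-5)
# Divide the learning rate by 10 when the validation error plateaus.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.1)
loss_fn = torch.nn.CrossEntropyLoss()      # softmax + cross-entropy loss

for epoch in range(90):                    # epoch budget is an assumption
    for images, labels in train_loader:    # mini-batches of size 256
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step(val_error())            # hypothetical validation-error function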
Loss Function
• To measure the loss of the model, a combination of cross-entropy and softmax was used.
• The network outputs are normalized with the softmax function, and the cross-entropy loss is computed on the resulting probabilities.
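As a small sketch (assuming the torch library), the explicit softmax-then-cross-entropy computation matches the combined, numerically stable loss:

import torch
import torch.nn.functional as F

logits = torch.randn(4, 1000)                  # network outputs for a batch of 4
labels = torch.randint(0, 1000, (4,))

probs = F.softmax(logits, dim=1)                              # normalize to probabilities
manual = -torch.log(probs[torch.arange(4), labels]).mean()    # cross-entropy on probabilities
combined = F.cross_entropy(logits, labels)                    # same quantity, computed stably

print(torch.allclose(manual, combined))        # True (up to numerical precision)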

Results
Experimental results:
- Able to train very deep networks without degrading (152 layers on ImageNet, 1202 on CIFAR).
- Deeper networks now achieve lower training error, as expected.
- Swept 1st place in all ILSVRC and COCO 2015 competitions.
ILSVRC 2015 classification winner (3.6% top-5 error) – better than "human performance"! (Russakovsky 2014)
Comparing Plain to ResNet (18/34 Layers)

Comparing Plain to Deeper ResNet
[Figure: test error and training error curves.]
ResNet on More than 1000 Layers
• To further improve the learning of extremely deep ResNets, "Identity Mappings in Deep Residual Networks" (Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, 2016) suggests passing the input directly to the final residual layer, allowing the network to easily learn to pass the input as an identity mapping in both the forward and backward passes.
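Sketching the key identities from that formulation (assuming identity skip connections and an identity after-addition mapping), the signal and the gradient propagate as

\[
x_L = x_l + \sum_{i=l}^{L-1} F(x_i, W_i),
\qquad
\frac{\partial \mathcal{E}}{\partial x_l}
  = \frac{\partial \mathcal{E}}{\partial x_L}
    \left( 1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} F(x_i, W_i) \right),
\]

so any shallower unit x_l receives a direct additive signal in the forward pass and a direct gradient term from the loss E in the backward pass.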

Identity Mappings in Deep Residual Networks

Identity Mappings in Deep Residual Networks: Improvement on CIFAR-10

• Another important improvement – using Batch Normalization as pre-activation improves regularization (a pre-activation block is sketched below).
• This improvement leads to better performance for smaller networks as well.
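A hedged PyTorch sketch of a pre-activation residual block (assuming the torch library), where batch normalization and ReLU come before each convolution and the shortcut stays a pure identity:

import torch.nn as nn
import torch.nn.functional as F

class PreActBlock(nn.Module):
    """Pre-activation residual block: (BN -> ReLU -> conv) twice, plus identity."""
    def __init__(self, channels):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(F.relu(self.bn1(x)))   # BN and ReLU act as pre-activation
        out = self.conv2(F.relu(self.bn2(out)))
        return out + x                          # clean identity path, no final ReLU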

Reduce Learning Time with Random Layer Drops
• Drop layers during training, and use the full network at test time.
• Residual blocks are used as the network's building blocks.
• During training, the input flows through both the shortcut and the weight layers.
• Training: each layer has a "survival probability" and is randomly dropped.
• Testing: all blocks are kept active, and each block's output is re-calibrated according to its survival probability during training (see the sketch below).
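A hedged PyTorch sketch of this stochastic-depth idea (assuming the torch library; the survival probability value and class names are illustrative):

import torch
import torch.nn as nn

class StochasticDepthBlock(nn.Module):
    """Residual block that is randomly dropped during training."""
    def __init__(self, block, survival_prob=0.8):
        super().__init__()
        self.block = block                     # any residual function F(x)
        self.p = survival_prob

    def forward(self, x):
        if self.training:
            if torch.rand(1).item() < self.p:
                return x + self.block(x)       # block survives this forward pass
            return x                           # block dropped: identity only
        return x + self.p * self.block(x)      # test: re-calibrate by survival probability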

