Lecture 19
Tom Rainforth
Hilary 2022
[email protected]
Complicated Architectures from Simple Building Blocks
Ops, Modules, and Factories

A Simple Example
Linear Factories

One of the simplest and most important kinds of modules are those
generated by a linear factory:

module ∼ Linear(m, n)
module(x) = W x + b
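The Linear(m, n) factory above is library-agnostic; as a minimal illustrative sketch (assuming m is the input width and n the output width, and using an arbitrary Gaussian initialisation, neither of which is specified on the slide), a numpy version might look like:

import numpy as np

def Linear(m, n, seed=0):
    """Factory returning a module that computes W x + b (W has shape (n, m))."""
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((n, m))   # illustrative initialisation only
    b = np.zeros(n)
    def module(x):
        return W @ x + b
    return module

module = Linear(3, 2)        # module ~ Linear(m, n)
print(module(np.ones(3)))    # module(x) = W x + b, a length-2 output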
Nonlinearity Modules
x′_{i′,j′,k′} = max_{i=1,…,d1} max_{j=1,…,d2}  x_{i+d1(i′−1), j+d2(j′−1), k′}
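Read as code, this is non-overlapping max pooling with a d1 × d2 window over the spatial dimensions, applied independently to each channel k′. A minimal numpy sketch (the zero-based indexing and the function name are mine):

import numpy as np

def max_pool(x, d1, d2):
    """Non-overlapping max pooling: x has shape (H, W, K), with H % d1 == 0 and W % d2 == 0."""
    H, W, K = x.shape
    out = np.empty((H // d1, W // d2, K))
    for ip in range(H // d1):          # i' (zero-based here)
        for jp in range(W // d2):      # j'
            patch = x[ip * d1:(ip + 1) * d1, jp * d2:(jp + 1) * d2, :]
            out[ip, jp, :] = patch.max(axis=(0, 1))   # max over the d1 x d2 window, per channel
    return out

x = np.arange(4 * 4 * 2).reshape(4, 4, 2)
print(max_pool(x, 2, 2).shape)   # (2, 2, 2)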
Convolutional Neural Networks

¹Tractable here is used in quite a loose sense: some modern incarnations have 50M+ parameters. Nonetheless, this is dwarfed by the current record for transformer-based networks of 1.6 trillion parameters.
Convolutions (Technically Cross Correlations)

[Figure: a 3x3 filter is slid across a 6x6 binary image; at each position the dot product between the filter and the image patch beneath it is computed, producing a 4x4 convolution output.]

Credit: https://cs.uwaterloo.ca/~mli/Deep-Learning-2017-Lecture5CNN.ppt
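To make the picture concrete, here is a hedged numpy sketch of the operation in the figure: slide a 3x3 filter over a 6x6 image, take the dot product at every position, and collect the results into a 4x4 output. The image and filter values below are illustrative, not the exact ones from the slide.

import numpy as np

def cross_correlate2d(image, kernel):
    """'Valid' cross-correlation: dot product of the kernel with every image patch."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.empty((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

image = (np.eye(6) + np.fliplr(np.eye(6)) > 0).astype(float)   # illustrative 6x6 binary image
kernel = np.array([[ 1, -1, -1],
                   [-1,  1, -1],
                   [-1, -1,  1]])                              # an illustrative 3x3 "diagonal" detector
print(cross_correlate2d(image, kernel).shape)                  # (4, 4)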
Convolutions for Image Processing

[Figure: image convolution examples, including a Gaussian blur, an emboss filter, and a sharpen applied to the blue channel, each defined by a small 3x3 kernel.]

http://setosa.io/ev/image-kernels
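For example, the Gaussian blur above can be reproduced by convolving an image with a small smoothing kernel; a sketch using scipy (the 3x3 kernel values are a standard Gaussian approximation, not necessarily those used in the figure):

import numpy as np
from scipy.signal import convolve2d

# A common 3x3 Gaussian-blur approximation (illustrative values).
gaussian = np.array([[1, 2, 1],
                     [2, 4, 2],
                     [1, 2, 1]], dtype=float) / 16.0

image = np.random.default_rng(0).random((64, 64))   # stand-in for a greyscale image
blurred = convolve2d(image, gaussian, mode="same", boundary="symm")
print(blurred.shape)   # (64, 64)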
Convolutional Layers

[Figure: a 3D convolution: a bank of learned filters is applied to the input layer to produce hidden layer 1 (before applying activations).]
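In a convolutional layer the filters are learned: each filter spans all input channels (hence the "3D convolution") and yields one channel of the next layer. A hedged PyTorch sketch, with the channel counts and kernel size chosen purely for illustration:

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)

x = torch.randn(1, 3, 32, 32)        # a batch of one RGB image
pre_activations = conv(x)            # hidden layer 1, before applying activations
hidden = torch.relu(pre_activations) # then apply the nonlinearity
print(hidden.shape)                  # torch.Size([1, 8, 32, 32])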
Convolutions as Sparse Connections

So how come this is a neural net? We can think of a convolution as a
sparse matrix multiplication with shared parameters.

Fully connected layer:

\begin{bmatrix}
w_{11} & w_{12} & w_{13} & w_{14} & w_{15} \\
w_{21} & w_{22} & w_{23} & w_{24} & w_{25} \\
w_{31} & w_{32} & w_{33} & w_{34} & w_{35} \\
w_{41} & w_{42} & w_{43} & w_{44} & w_{45} \\
w_{51} & w_{52} & w_{53} & w_{54} & w_{55}
\end{bmatrix}

Convolutional layer:

\begin{bmatrix}
w_2 & w_3 & 0 & 0 & 0 \\
w_1 & w_2 & w_3 & 0 & 0 \\
0 & w_1 & w_2 & w_3 & 0 \\
0 & 0 & w_1 & w_2 & w_3 \\
0 & 0 & 0 & w_1 & w_2
\end{bmatrix}
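The equivalence is easy to check numerically in one dimension: multiplying by the banded, weight-sharing matrix above gives the same result as sliding the length-3 filter [w1, w2, w3] over the zero-padded input. A numpy sketch:

import numpy as np

w1, w2, w3 = 0.5, -1.0, 2.0
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Sparse, weight-sharing matrix from the slide.
M = np.array([[w2, w3, 0,  0,  0 ],
              [w1, w2, w3, 0,  0 ],
              [0,  w1, w2, w3, 0 ],
              [0,  0,  w1, w2, w3],
              [0,  0,  0,  w1, w2]])

# Direct cross-correlation with the filter [w1, w2, w3] and zero padding.
xp = np.pad(x, 1)
direct = np.array([w1 * xp[i] + w2 * xp[i + 1] + w3 * xp[i + 2] for i in range(len(x))])

assert np.allclose(M @ x, direct)   # same linear map, far fewer free parameters
print(M @ x)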
Convolutional Modules
Convolutional Networks (Convnets)

The traditional CNN setup has a mixture of convolutional and max
pooling layers to learn features, before finishing with one or more
fully connected layers to do the final prediction.²

²Some modern large CNNs forgo the pooling layers.

[Figure: individual filters act as local pattern detectors, e.g. an "upper-left beak" detector and a "middle beak" detector.]
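A minimal PyTorch sketch of that traditional layout: alternating convolution and max-pooling layers to learn features, then a fully connected layer for the final prediction (the sizes and the number of classes are illustrative choices, not a specific architecture from the lecture):

import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                       # 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                       # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),             # final fully connected prediction layer
)

x = torch.randn(8, 1, 28, 28)              # e.g. a batch of MNIST-sized images
print(cnn(x).shape)                        # torch.Size([8, 10])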
Example Architecture: GoogLeNet

Going Deeper With Convolutions. Szegedy et al., CVPR 2015.
Getting Too Deep: ResNets and DenseNets
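The slide title alludes to the standard remedy for very deep stacks: residual (skip) connections, where a block outputs x + F(x) so that gradients can flow through the identity path. A generic PyTorch sketch of such a block (the textbook pattern, not necessarily the exact blocks shown on the slide):

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = x + F(x): the skip connection lets very deep stacks still train."""
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
    def forward(self, x):
        return torch.relu(x + self.f(x))

block = ResidualBlock(16)
print(block(torch.randn(1, 16, 8, 8)).shape)   # torch.Size([1, 16, 8, 8])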
Recurrent Neural Networks (RNNs)
Basic RNN Framework

fe ∼ MLP
fd ∼ MLP

h1 = fe(0, x1)
ht = fe(ht−1, xt)   ∀t ∈ {2, . . . , τ}
ŷt = fd(ht)   ∀t ∈ {1, . . . , τ}
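Translating the framework directly into code, with a small MLP for the encoder fe and another for the decoder fd (the layer sizes and the tanh nonlinearity below are illustrative assumptions):

import torch
import torch.nn as nn

x_dim, h_dim, y_dim, tau = 4, 16, 2, 10

f_e = nn.Sequential(nn.Linear(h_dim + x_dim, h_dim), nn.Tanh())                    # f_e ~ MLP
f_d = nn.Sequential(nn.Linear(h_dim, h_dim), nn.Tanh(), nn.Linear(h_dim, y_dim))   # f_d ~ MLP

xs = torch.randn(tau, x_dim)          # an input sequence x_1, ..., x_tau
h = torch.zeros(h_dim)                # h_0 = 0
ys = []
for t in range(tau):
    h = f_e(torch.cat([h, xs[t]]))    # h_t = f_e(h_{t-1}, x_t), same f_e at every step
    ys.append(f_d(h))                 # y_hat_t = f_d(h_t)
print(torch.stack(ys).shape)          # torch.Size([10, 2])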
Bidirectional RNN
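The standard bidirectional variant runs one recursion forwards and a second one backwards over the sequence, then combines the two hidden states at each position; a hedged sketch in the same style as before (concatenating the two states is one common choice, not necessarily the one in the figure):

import torch
import torch.nn as nn

x_dim, h_dim, tau = 4, 16, 10
f_fwd = nn.Sequential(nn.Linear(h_dim + x_dim, h_dim), nn.Tanh())  # forward-direction update
f_bwd = nn.Sequential(nn.Linear(h_dim + x_dim, h_dim), nn.Tanh())  # backward-direction update

xs = torch.randn(tau, x_dim)
h_f, h_b = torch.zeros(h_dim), torch.zeros(h_dim)
fwd, bwd = [], []
for t in range(tau):
    h_f = f_fwd(torch.cat([h_f, xs[t]]))            # runs over t = 1, ..., tau
    fwd.append(h_f)
    h_b = f_bwd(torch.cat([h_b, xs[tau - 1 - t]]))  # runs over t = tau, ..., 1
    bwd.append(h_b)
states = [torch.cat([f, b]) for f, b in zip(fwd, reversed(bwd))]   # combined state per position
print(torch.stack(states).shape)                    # torch.Size([10, 32])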
Dealing with Varying Input and Output Lengths

Credit: https://r2rt.com/written-memories-understanding-deriving-and-extending-the-lstm.html
Further Reading