AD3501 DEEP LEARNING
UNIT II CONVOLUTIONAL NEURAL NETWORKS
Convolution Operation -- Sparse Interactions -- Parameter Sharing --
Equivariance -- Pooling -- Convolution Variants: Strided -- Tiled -- Transposed
and dilated convolutions; CNN Learning: Nonlinearity Functions -- Loss
Functions -- Regularization -- Optimizers -- Gradient Computation.
CONVOLUTIONAL NEURAL NETWORK
[Figure: CNN pipeline: Input → Convolution → Pooling → Convolution → Pooling → Fully Connected → Fully Connected → Output predictions]
Convolutional networks belong to a class of neural networks that take an image as input, subject it to combinations of weights and biases, extract features, and output the results. They tend to reduce the dimensions of the input image with the use of a kernel, which makes it easier to extract features compared to a generic dense neural network. Convolutional networks trace their foundation to convolution operations on matrices.
1. CONVOLUTION OPERATION
The name "Convolutional neural network" indicates that the network employs a mathematical operation called convolution. Convolution is a specialized kind of linear operation. Convnets are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers.
Convolution between two functions in mathematics produces a third function expressing how the shape of one function is modified by the other.
Convolution Kernels
A kernel is a small 2D matrix whose contents depend on the operation to be performed. A kernel slides over the input image; at each position, the overlapping values are multiplied elementwise and summed. The output obtained is of lower dimensions and therefore easier to work with.
Original (identity):    Gaussian Blur (x 1/16):    Sharpen:            Edge Detection:
 0 0 0                   1 2 1                       0 -1  0            -1 -1 -1
 0 1 0                   2 4 2                      -1  5 -1            -1  8 -1
 0 0 0                   1 2 1                       0 -1  0            -1 -1 -1
Figure 1: Kernel Types
Above are example kernels for applying a Gaussian blur (to smooth the image before processing), sharpening (to enhance the depth of edges), and edge detection.
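As a minimal sketch (assuming NumPy and SciPy, which the notes do not name; the random image is a stand-in), the kernels of Figure 1 can be written and applied like this:

```python
# A sketch using NumPy/SciPy (an assumption; the notes name no library).
import numpy as np
from scipy.signal import convolve2d

identity = np.array([[0, 0, 0],
                     [0, 1, 0],
                     [0, 0, 0]])
gaussian_blur = np.array([[1, 2, 1],
                          [2, 4, 2],
                          [1, 2, 1]]) / 16.0
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]])
edge_detect = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])

image = np.random.rand(32, 32)                            # stand-in grayscale image
blurred = convolve2d(image, gaussian_blur, mode='same')   # smooth the image
edges = convolve2d(image, edge_detect, mode='same')       # highlight edges
```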
The shape of a kernel depends heavily on the input shape of the image and the architecture of the entire network; mostly the size of a kernel is (M x M), i.e., a square matrix. The movement of a kernel is always from left to right and top to bottom.
Figure 2 Kernel Movement
Stride defines the step by which the kernel moves: for example, a stride of 1 slides the kernel by one row/column at a time, while a stride of 2 moves the kernel by 2 rows/columns.
Figure 3: Multiple Kernels with Stride = 1
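The output size follows directly from the input size, kernel size, and stride. A tiny sketch (the helper name conv_output_size is mine, for illustration):

```python
# Hypothetical helper illustrating the output-size rule for a valid
# (no-padding) convolution: output = (input - kernel) // stride + 1.
def conv_output_size(input_size: int, kernel_size: int, stride: int = 1) -> int:
    return (input_size - kernel_size) // stride + 1

print(conv_output_size(32, 5, stride=1))  # 28: kernel slides one step at a time
print(conv_output_size(32, 5, stride=2))  # 14: stride 2 roughly halves the sweep
```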
For input images with 3 or more channels, such as RGB, a filter is applied. Filters are one dimension higher than kernels and can be seen as multiple kernels stacked on top of each other, where every kernel is for a particular channel. Therefore, for an RGB image of (32x32) we have a filter of shape, say, (5x5x3).
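For instance (a hedged illustration using PyTorch, which the notes do not mandate), a convolutional layer over 3-channel input stores exactly one (5x5) kernel per channel in each filter:

```python
# Illustration in PyTorch (an assumption; any framework would do): each of the
# 8 filters holds one 5x5 kernel per input channel of the RGB image.
import torch.nn as nn

layer = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=5)
print(layer.weight.shape)  # torch.Size([8, 3, 5, 5])
```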
Now let's see how a kernel operates on a sample matrix.
Figure 4: Operation of Kernel
Here the input matrix has shape 4x4x1 and the kernel is of size 3x3. Since the shape of the input is larger than the kernel, we can implement a sliding-window protocol and apply the kernel over the entire input. The first entry in the convolved result is calculated as:
45*0 + 12*(-1) + 5*0 + 22*(-1) + 10*5 + 35*(-1) + 88*0 + 26*(-1) + 51*0 = -45
Sliding window protocol:
1. The kernel gets into position at the top-left corner of the input matrix.
2. Then it starts moving left to right, calculating the dot product and saving it to a new matrix, until it has reached the last column.
3. Next, the kernel resets its position at the first column but slides one row down, thus following the left-to-right, top-to-bottom fashion.
4. Steps 2 and 3 are repeated till the entire input has been processed.
For a 3D input matrix, the movement of the kernel will be from front to back, left to right, and top to bottom.
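A minimal NumPy sketch of this sliding-window protocol (assuming stride 1 and no padding; the top-left 3x3 patch of x reproduces the worked example above, while the remaining entries of the 4x4 input are hypothetical, since the notes do not show them):

```python
# A minimal NumPy sketch of the sliding-window protocol (stride 1, no padding).
import numpy as np

def conv2d_valid(x, k, stride=1):
    kh, kw = k.shape
    oh = (x.shape[0] - kh) // stride + 1      # output rows
    ow = (x.shape[1] - kw) // stride + 1      # output columns
    out = np.zeros((oh, ow))
    for i in range(oh):                       # top to bottom
        for j in range(ow):                   # left to right
            patch = x[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * k)     # elementwise multiply and sum
    return out

x = np.array([[45, 12,  5,  7],
              [22, 10, 35,  9],
              [88, 26, 51, 13],
              [ 4,  6,  2,  8]])              # 4th row/column made up here
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]])
print(conv2d_valid(x, sharpen)[0, 0])         # -45.0, as computed above
```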
2. Motivation behind Convolution
Convolution leverages three important ideas that motivated computer
vision researchers:
sparse interaction,
parameter sharing, and
equivariant representations.
Let's describe each one of them in detail.
Trivial neural network layers use matrix multiplication by a matrix of parameters describing the interaction between the input and output units. This means that every output unit interacts with every input unit. However, convolutional neural networks have sparse interactions. This is achieved by making the kernel smaller than the input: e.g., an image can have thousands or millions of pixels, but while processing it with a kernel we can detect meaningful information that spans only tens or hundreds of pixels. This means that we need to store fewer parameters, which not only reduces the memory requirements of the model but also improves its statistical efficiency.
If computing one feature at a spatial point (x1, y1) is useful, then it should also be useful at
some other spatial point, say (x2, y2). It means that for a single two-dimensional slice, i.e., for creating one activation map, neurons are constrained to use the same set of weights. In a traditional neural network, each element of the weight matrix is used once and then never revisited, while a convolutional network has shared parameters, i.e., for getting the output, the weights applied to one input are the same as the weights applied elsewhere.
Due to parameter sharing, the layers of a convolutional neural network have a property of equivariance to translation. It says that if we change the input in a certain way, the output changes in the same way: for example, if the input image is shifted by a few pixels, the resulting feature map is shifted by the same amount.
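A small sketch of this property (NumPy/SciPy assumed; circular boundaries are used so the shift commutes exactly with the convolution):

```python
# Translation equivariance: shift-then-convolve equals convolve-then-shift.
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
image = rng.random((8, 8))
kernel = rng.random((3, 3))

# Shift the input, then convolve ...
a = convolve2d(np.roll(image, 2, axis=1), kernel, mode='same', boundary='wrap')
# ... versus convolve, then shift the output.
b = np.roll(convolve2d(image, kernel, mode='same', boundary='wrap'), 2, axis=1)
print(np.allclose(a, b))  # True: the output shifts exactly like the input
```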
Sparse connectivity due to Image Convolution
+ An input image may have millions of pixels, but we can detect edges with kernels of only hundreds of pixels.
+ If we limit the number of connections for each output to k, we need k x n parameters and O(k x n) runtime.
+ It is possible to get good performance with k << m.
[Figure: outputs formed by a small kernel placed over the input]
+ Convolutional networks have sparse interactions, accomplished by making the kernel smaller than the input.
Drawbacks of Sparse Connectivity
Sparse Connectivity, viewed from below
+ Highlight one input x3 and the output units s affected by it.
+ Top: when s is formed by convolution with a kernel of width 3, only three outputs are affected by x3.
+ Bottom: when s is formed by matrix multiplication, connectivity is no longer sparse, so all outputs are affected by x3.
Sparse Connectivity, viewed from above
+ Highlight one output s3 and the inputs x that affect this unit.
+ These units are known as the receptive field of s3.
+ When s3 is formed by convolution with a kernel of width 3, only three inputs affect it.
+ When s3 is formed by matrix multiplication, every input affects it.
Keeping up performance with reduced connections
+ It is possible to obtain good performance while keeping k several orders of magnitude lower than m.
+ In a deep neural network, units in deeper layers may indirectly interact with a larger portion of the input.
+ The receptive field in deeper layers is larger than the receptive field of units in shallow layers.
+ This allows the network to efficiently describe complicated interactions between many variables from simple building blocks that only describe sparse interactions.
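To make this receptive-field growth concrete, here is a tiny sketch (the formula assumes stacked convolutions of width k with stride 1 and no pooling; the helper name is mine):

```python
# Hypothetical helper: receptive field of a unit after stacking `num_layers`
# convolutions of width `kernel_width`, each with stride 1 and no pooling.
def receptive_field(num_layers: int, kernel_width: int = 3) -> int:
    return num_layers * (kernel_width - 1) + 1

for depth in (1, 2, 5, 10):
    print(depth, receptive_field(depth))  # 3, 5, 11, 21: grows with depth
```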
Parameter Sharing
+ Parameter sharing refers to using the same parameter for more than one function in a model.
+ In a traditional neural net, each element of the weight matrix is used exactly once when computing the output of a layer:
+ it is multiplied by one element of the input and never revisited.
+ Parameter sharing is synonymous with tied weights:
+ the value of the weight applied to one input is tied to a weight applied elsewhere.
+ In a convolutional net, each member of the kernel is used at every position of the input (except perhaps at the boundary, subject to design decisions).
Efficiency of Parameter Sharing
+ Parameter sharing by the convolution operation means that rather than learning a separate set of parameters for every location, we learn only one set.
+ This does not affect the runtime of forward propagation, which is still O(k x n),
+ but it further reduces the storage requirements to k parameters.
+ k is orders of magnitude smaller than m.
+ Since m and n are roughly the same size, k is much smaller than m x n.
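A back-of-the-envelope sketch of these savings (the 320x280 image size is an illustrative assumption, not from the notes):

```python
# Rough parameter counts for m inputs, n outputs, and a kernel of width k.
m = n = 320 * 280      # inputs and outputs, roughly the same size (assumed)
k = 3                  # kernel width

dense_params = m * n   # fully connected: every output sees every input
sparse_params = k * n  # sparse connectivity: k connections per output
shared_params = k      # convolution with parameter sharing: one small kernel
print(f"{dense_params:,} vs {sparse_params:,} vs {shared_params}")
# 8,028,160,000 vs 268,800 vs 3
```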
How parameter sharing works
+ Black arrows: connections that use a particular parameter.
1. Convolutional model: black arrows indicate uses of the central element of a 3-element kernel.
2. Fully connected model: a single black arrow indicates the use of the central element of the weight matrix.
+ This model has no parameter sharing, so the parameter is used only once.
[Figure: uses of a single parameter in a convolutional model vs. a fully connected model]