AD3501 DEEP LEARNING

UNIT II -- CONVOLUTIONAL NEURAL NETWORKS

Convolution Operation -- Sparse Interactions -- Parameter Sharing -- Equivariance -- Pooling -- Convolution Variants: Strided -- Tiled -- Transposed and Dilated Convolutions; CNN Learning: Nonlinearity Functions -- Loss Functions -- Regularization -- Optimizers -- Gradient Computation.

CONVOLUTIONAL NEURAL NETWORK

[Figure: A typical convolutional network pipeline -- Convolution -> Pooling -> Convolution -> Pooling -> Fully Connected -> Fully Connected -> Output predictions]

Convolutional networks belong to a class of neural networks that take an image as input, subject it to combinations of weights and biases, extract features, and output the results. They reduce the dimensions of the input image with the use of a kernel, which makes it easier to extract features compared to a generic dense neural network. Convolutional networks trace their foundation to convolution operations on matrices.

1. CONVOLUTION OPERATION

The name "convolutional neural network" indicates that the network employs a mathematical operation called convolution. Convolution is a specialized kind of linear operation. Convnets are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers. In mathematics, the convolution of two functions produces a third function that expresses how the shape of one function is modified by the other.

Convolution Kernels

A kernel is a small 2D matrix whose contents depend on the operation to be performed. A kernel maps onto the input image by simple element-wise multiplication and addition; the output obtained is of lower dimensions and is therefore easier to work with.

Figure 1: Kernel types -- Original (identity) [0 0 0; 0 1 0; 0 0 0], Gaussian blur (1/16)[1 2 1; 2 4 2; 1 2 1], Sharpen [0 -1 0; -1 5 -1; 0 -1 0], Edge detection [-1 -1 -1; -1 8 -1; -1 -1 -1]

Figure 1 shows example kernels for applying a Gaussian blur (to smoothen the image before processing), sharpening (to enhance the depth of edges), and edge detection. The shape of a kernel depends heavily on the input shape of the image and on the architecture of the entire network; most kernels are square matrices of size M x M. The movement of a kernel is always from left to right and from top to bottom.

Figure 2: Kernel movement

Stride defines the step by which the kernel moves: a stride of 1 makes the kernel slide by one row/column at a time, while a stride of 2 moves the kernel by two rows/columns.

Figure 3: Multiple kernels with stride = 1

For input images with 3 or more channels, such as RGB, a filter is applied. Filters are one dimension higher than kernels and can be seen as multiple kernels stacked on each other, where every kernel corresponds to a particular channel. Therefore, for an RGB image of size 32x32 we might have a filter of shape, say, 5x5x3.

Now let's see how a kernel operates on a sample matrix.

Figure 4: Operation of a kernel

Here the input matrix has shape 4x4x1 and the kernel is of size 3x3. Since the input is larger than the kernel, we can implement a sliding-window protocol and apply the kernel over the entire input. With the sharpen kernel [0 -1 0; -1 5 -1; 0 -1 0], the first entry in the convolved result is calculated as:

45*0 + 12*(-1) + 5*0 + 22*(-1) + 10*5 + 35*(-1) + 88*0 + 26*(-1) + 51*0 = -45

Sliding window protocol:
1. The kernel gets into position at the top-left corner of the input matrix.
2. It then moves left to right, calculating the dot product and saving it to a new matrix, until it has reached the last column.
3. Next, the kernel resets its position to the first column but slides one row down, thus following a left-to-right, top-to-bottom order.
4. Steps 2 and 3 are repeated until the entire input has been processed.

For a 3D input matrix, the movement of the kernel is from front to back, left to right, and top to bottom.
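The sliding-window protocol above is easy to express directly in code. Below is a minimal NumPy sketch; the helper name conv2d_valid and the input values outside the worked 3x3 window are illustrative assumptions, not taken from the notes. It applies a 3x3 kernel with a configurable stride and reproduces the -45 entry computed by hand.

```python
# A minimal sketch of the sliding-window (valid, no padding) convolution described above.
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1          # number of output rows
    ow = (iw - kw) // stride + 1          # number of output columns
    out = np.zeros((oh, ow))
    for i in range(oh):                   # top to bottom
        for j in range(ow):               # left to right
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)   # element-wise multiply and add
    return out

# 4x4 input whose top-left 3x3 window matches the worked example;
# the remaining entries are placeholders for illustration only.
image = np.array([[45, 12,  5,  7],
                  [22, 10, 35,  9],
                  [88, 26, 51,  3],
                  [14, 60,  2, 11]], dtype=float)

sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]], dtype=float)

result = conv2d_valid(image, sharpen, stride=1)
print(result[0, 0])   # -45.0, matching the hand computation
print(result.shape)   # (2, 2): a 3x3 kernel over a 4x4 input with stride 1
```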
2. Motivation behind Convolution

Convolution leverages three important ideas that motivated computer vision researchers: sparse interactions, parameter sharing, and equivariant representations. Let's describe each one of them in detail.

Traditional neural network layers use multiplication by a matrix of parameters in which a separate parameter describes the interaction between each input unit and each output unit; this means that every output unit interacts with every input unit. Convolutional neural networks, however, have sparse interactions. This is achieved by making the kernel smaller than the input: an image may have thousands or millions of pixels, but while processing it with a kernel we can detect meaningful features, such as edges, that occupy only tens or hundreds of pixels. As a result we need to store fewer parameters, which not only reduces the memory requirement of the model but also improves its statistical efficiency.

If computing a feature at one spatial point (x1, y1) is useful, then it should also be useful at some other spatial point, say (x2, y2). This means that for a single two-dimensional slice, i.e., for creating one activation map, neurons are constrained to use the same set of weights. In a traditional neural network, each element of the weight matrix is used once and then never revisited, whereas a convolutional network has shared parameters: the weights applied to one part of the input are the same as the weights applied elsewhere.

Due to parameter sharing, the layers of a convolutional neural network have the property of equivariance to translation: if we shift the input in a certain way, the output shifts in the same way.
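To make the equivariance claim concrete, here is a small check, assuming NumPy and SciPy are available (scipy.signal.correlate2d performs the same valid-mode sliding-window operation sketched in Section 1): translating the input by one column translates the convolution output by one column.

```python
# A small numerical check of equivariance to translation.
import numpy as np
from scipy.signal import correlate2d   # CNN "convolution" is cross-correlation

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6))        # random input "image"
k = rng.standard_normal((3, 3))        # random 3x3 kernel

y = correlate2d(x, k, mode='valid')                  # output for the original input
x_shifted = np.roll(x, shift=1, axis=1)              # translate the input one column right
y_shifted = correlate2d(x_shifted, k, mode='valid')  # output for the translated input

# Away from the wrapped border column, translating then convolving
# gives the same result as convolving then translating.
print(np.allclose(y_shifted[:, 1:], y[:, :-1]))      # True
```

Note that convolution is equivariant to translation only; it is not naturally equivariant to other transformations such as rotation or changes in scale.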
Sparse Connectivity due to Image Convolution
- An input image may have millions of pixels, but we can detect edges with kernels of only hundreds of pixels.
- Convolutional networks therefore have sparse interactions, accomplished by making the kernel smaller than the input.
- If we limit the number of connections each output has to k, we need only k x n parameters and O(k x n) runtime, instead of m x n parameters and O(m x n) runtime for a fully connected layer.
- It is possible to get good performance with k several orders of magnitude smaller than m.

Sparse Connectivity, viewed from below
- Highlight one input x3 and the output units s affected by it.
- Top: when s is formed by convolution with a kernel of width 3, only three outputs are affected by x3.
- Bottom: when s is formed by matrix multiplication, connectivity is no longer sparse, so all outputs are affected by x3.

Sparse Connectivity, viewed from above
- Highlight one output s3 and the inputs x that affect this unit; these units are known as the receptive field of s3.
- When s3 is formed by convolution with a kernel of width 3, only three inputs affect s3.
- When s3 is formed by matrix multiplication, all inputs affect s3.

Keeping up performance with reduced connections
- It is possible to obtain good performance while keeping k several orders of magnitude smaller than m.
- In a deep neural network, units in deeper layers may indirectly interact with a larger portion of the input: the receptive field of units in deeper layers is larger than the receptive field of units in shallow layers.
- This allows the network to efficiently describe complicated interactions between many variables by composing simple building blocks that each describe only sparse interactions.

Parameter Sharing
- Parameter sharing refers to using the same parameter for more than one function in a model.
- In a traditional neural net, each element of the weight matrix is used exactly once when computing the output of a layer: it is multiplied by one element of the input and never revisited.
- Parameter sharing is synonymous with tied weights: the value of the weight applied to one input is tied to the value of a weight applied elsewhere.
- In a convolutional net, each member of the kernel is used at every position of the input (except possibly at the boundary, subject to design decisions).

Efficiency of Parameter Sharing (see the calculation sketched below)
- Parameter sharing in the convolution operation means that rather than learning a separate set of parameters for every location, we learn only one set.
- This does not affect the runtime of forward propagation, which is still O(k x n), but it further reduces the storage requirement to k parameters.
- k is orders of magnitude smaller than m, and since m and n are roughly the same size, k is much smaller than m x n.

How parameter sharing works
- Black arrows indicate connections that use a particular parameter.
1. Convolutional model: the black arrows mark every use of the central element of the 3-element kernel, which is shared across all input positions.
2. Fully connected model: a single black arrow marks the single use of the central element of the weight matrix; the model has no parameter sharing, so the parameter is used only once.
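The storage argument under "Efficiency of Parameter Sharing" can be made concrete with a quick back-of-the-envelope calculation; the image size below is a hypothetical example chosen for illustration, not taken from the notes.

```python
# Rough parameter counts for one layer mapping m input pixels to n output units.
m = 320 * 280          # number of input pixels (hypothetical image size)
n = 320 * 280          # number of output units (same spatial size)
k = 3 * 3              # kernel size, i.e. connections per output unit

fully_connected_params = m * n   # every output connected to every input
sparse_params          = k * n   # each output connected to only k inputs, no sharing
shared_params          = k       # one kernel reused at every position

print(f"fully connected:            {fully_connected_params:,} parameters")
print(f"sparse, no sharing:         {sparse_params:,} parameters")
print(f"convolution, shared kernel: {shared_params:,} parameters")
```

Even for this modest image size, the fully connected layer would need roughly 8 billion weights, the sparsely connected layer about 800 thousand, and the shared 3x3 kernel only nine.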
