CNN 01
Convolutional Neural Networks (CNNs), introduced by Yann LeCun and Yoshua Bengio in 1995, are specialized multi-layer neural networks designed to recognize visual patterns directly from pixel images with minimal preprocessing. They utilize a convolution layer as a core building block, where filters slide over the input image to produce activation maps, allowing for efficient feature extraction. The architecture of CNNs involves multiple layers, including convolution, activation, and pooling layers, which work together to reduce dimensionality and improve classification performance.

Uploaded by

vhoratanvir1610
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
0% found this document useful (0 votes)
7 views79 pages

CNN 01

Convolutional Neural Networks (CNNs), introduced by Yann LeCun and Yoshua Bengio in 1995, are specialized multi-layer neural networks designed to recognize visual patterns directly from pixel images with minimal preprocessing. They utilize a convolution layer as a core building block, where filters slide over the input image to produce activation maps, allowing for efficient feature extraction. The architecture of CNNs involves multiple layers, including convolution, activation, and pooling layers, which work together to reduce dimensionality and improve classification performance.

Uploaded by

vhoratanvir1610
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
You are on page 1/ 79
UNIT-3: Convolutional Neural Networks

History: Convolutional Neural Networks

In 1995, Yann LeCun and Yoshua Bengio introduced the concept of convolutional neural networks. Convolutional Neural Networks (CNNs) are a special kind of multi-layer neural network. A CNN is a feed-forward network that can extract topological properties from an image. CNNs are designed to recognize visual patterns directly from pixel images with minimal preprocessing, and they can recognize patterns with extreme variability (such as handwritten characters). They are popular because they achieve state-of-the-art results on difficult computer vision and natural language processing tasks.

Image Size

The digitization process requires decisions about the values of M and N (the image dimensions) and the number L of discrete gray levels allowed for each pixel, where M and N are positive integers. Due to processing, storage, and sampling hardware considerations, the number of gray levels is typically an integer power of 2:

    L = 2^k

where k is the number of bits required to represent a gray value.

> The discrete levels should be equally spaced integers in the interval [0, L-1].
> The number b of bits required to store a digitized image is b = M * N * k.

Number of storage bits for various values of N and k (square N x N images, so b = N^2 * k):

    N     k=1 (L=2)   k=2 (L=4)    k=3 (L=8)    k=4 (L=16)   k=5 (L=32)   k=6 (L=64)   k=7 (L=128)  k=8 (L=256)
    32    1,024       2,048        3,072        4,096        5,120        6,144        7,168        8,192
    64    4,096       8,192        12,288       16,384       20,480       24,576       28,672       32,768
    128   16,384      32,768       49,152       65,536       81,920       98,304       114,688      131,072
    256   65,536      131,072      196,608      262,144      327,680      393,216      458,752      524,288
    512   262,144     524,288      786,432      1,048,576    1,310,720    1,572,864    1,835,008    2,097,152
    1024  1,048,576   2,097,152    3,145,728    4,194,304    5,242,880    6,291,456    7,340,032    8,388,608
    2048  4,194,304   8,388,608    12,582,912   16,777,216   20,971,520   25,165,824   29,360,128   33,554,432
    4096  16,777,216  33,554,432   50,331,648   67,108,864   83,886,080   100,663,296  117,440,512  134,217,728
    8192  67,108,864  134,217,728  201,326,592  268,435,456  335,544,320  402,653,184  469,762,048  536,870,912

[Figure: a grayscale image stored as a single 2-D pixel array, and an RGB image stored as three stacked 2-D arrays, one per color channel.]

CNN - Convolutional Neural Network

Images are stored in a large multidimensional array of pixels, whose dimensions are (width x height) for grayscale images and (width x height x 3) for RGB images. Consider a small 256x256 RGB image: this gives 256*256*3 = 196,608 pixels. Each pixel is an input feature, which means that even if we only have 128 units in the first layer, the number of parameters in our weight matrix is 128 * 196,608 = 25,165,824, i.e. about 25 million weights! Given that most images are much higher resolution than 256x256, the number of input features increases dramatically: for a 1024x1024 RGB image, the number of pixels grows by a factor of 16. This poses a scalability issue: our model has far too many weights, which requires a huge amount of memory and computational power to train. How, then, do we train a neural network on an image?

CNN - General Architecture

[Figure: a CNN pipeline with alternating convolution and pooling layers performing feature extraction, followed by fully connected layers performing classification.]

Convolution

Sliding a single 2x2 filter with weights w1..w4 over an input whose first row is a, b, c, d and whose second row is e, f, g, h gives feature-map entries

    h1 = f(a*w1 + b*w2 + e*w3 + f*w4)
    h2 = f(b*w1 + c*w2 + f*w3 + g*w4)

Because the same weights are reused at every position, the number of parameters for one feature map is 4, and for 100 feature maps it is 4*100 = 400. A code sketch of this sliding computation follows below.

[Figure: a convolution kernel applied to a source pixel and its neighborhood, producing one value of the destination feature map. Image source: https://medium.com/@behuma/6-basic-things-o-know-about-convolutiondaefSetbed11]

Convolution Layer

[Figure: a convolutional layer mapping an input image through learned filter weights to a feature map.]
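To make weight sharing concrete, here is a minimal NumPy sketch, assuming a tanh activation for f and the 2x4 input grid implied by the h1/h2 expressions above; the function name slide_filter and all numeric values are illustrative, not from the slides:

    import numpy as np

    def slide_filter(x, w, b=0.0, f=np.tanh):
        """Slide one small filter w over the 2-D input x (stride 1, no padding).

        Weight sharing: the same filter weights (plus one bias) are reused at
        every spatial position, so one feature map needs only w.size + 1
        parameters no matter how large the input image is.
        """
        fh, fw = w.shape
        out = np.zeros((x.shape[0] - fh + 1, x.shape[1] - fw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                # dot product between the filter and one fh x fw chunk of x
                out[i, j] = f(np.sum(x[i:i + fh, j:j + fw] * w) + b)
        return out

    # Inputs a, b, c, d (first row) and e, f, g, h (second row):
    x = np.array([[1.0, 2.0, 3.0, 4.0],
                  [5.0, 6.0, 7.0, 8.0]])
    w = np.array([[0.1, 0.2],   # w1, w2
                  [0.3, 0.4]])  # w3, w4
    print(slide_filter(x, w))   # [[h1, h2, h3]]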
Filters

Step 1: Move the window to the first location where we want to compute the average value, and select only the pixels inside the window.
Step 2: Compute the average value of the sub-image p inside the window: y = (1/(m*n)) * sum of p(r, c) over the m x n pixels of the window.
Step 3: Place the result at the corresponding pixel in the output image.
Step 4: Move the window to the next location and go to Step 2.

Filters

[Figure: classical hand-designed edge-detection kernels, e.g. the Roberts and Prewitt operators.]

Convolution Layer

The convolution layer is the core building block of a CNN. Its parameters consist of a set of learnable filters. Every filter is small spatially (in width and height) but extends through the full depth of the input volume, e.g. 5x5x3. During the forward pass, we slide (convolve) each filter across the width and height of the input volume and compute dot products between the entries of the filter and the input at every position. This produces a 2-dimensional activation map that gives the responses of that filter at every spatial position. Intuitively, the network will learn filters that activate when they see some type of visual feature.

Each CONV layer has a set of filters:
* Each filter produces a separate 2-dimensional activation map.
* We stack these activation maps along the depth dimension to produce the output volume.

Convolutions: more detail

Take a 32x32x3 image and a 5x5x3 filter w. Each output number is the result of taking a dot product between the filter and a small 5x5x3 chunk x of the image (i.e., a 5*5*3 = 75-dimensional dot product plus a bias): w^T x + b. Convolving (sliding) the filter over all spatial locations produces a 28x28 activation map.

For example, if we had six 5x5 filters, we would get 6 separate activation maps. We stack these up to get a "new image" of size 28x28x6!

Preview: a ConvNet is a sequence of convolutional layers, interspersed with activation functions. E.g., a 32x32x3 input passes through CONV + ReLU with six 5x5x3 filters to give a 28x28x6 volume, then through CONV + ReLU with ten 5x5x6 filters to give a 24x24x10 volume, and so on.

CONV: convolutional kernel layer
RELU: activation function
POOL: dimension-reduction layer
FC: fully connected layer

Convolutions: more detail

A closer look at spatial dimensions:
* 7x7 input (spatially), 3x3 filter, stride 1 => 5x5 output.
* 7x7 input, 3x3 filter applied with stride 2 => 3x3 output!
* 7x7 input, 3x3 filter applied with stride 3? It doesn't fit! We cannot apply a 3x3 filter to a 7x7 input with stride 3. How do we resolve this issue?

Output size: (N - F) / stride + 1

e.g. N = 7, F = 3:
* stride 1 => (7 - 3)/1 + 1 = 5
* stride 2 => (7 - 3)/2 + 1 = 3
* stride 3 => (7 - 3)/3 + 1 = 2.33, which is not an integer, so stride 3 does not fit.
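As a quick check of the output-size rule, here is a small Python helper (the name conv_output_size is mine, not from the slides) that reproduces the three stride cases above:

    def conv_output_size(n, f, stride):
        """Spatial output size of a convolution: (N - F) / stride + 1.

        Returns None when the filter does not fit, i.e. when
        (N - F) is not evenly divisible by the stride.
        """
        if (n - f) % stride != 0:
            return None  # e.g. a 3x3 filter on a 7x7 input with stride 3
        return (n - f) // stride + 1

    for s in (1, 2, 3):
        print("N=7, F=3, stride", s, "->", conv_output_size(7, 3, s))
    # stride 1 -> 5, stride 2 -> 3, stride 3 -> None (doesn't fit)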
Padding

* Padding refers to adding an extra border of zeros around the image so that the output image has the same size as the input. This is known as same padding. After the filters are applied, the convolved output under same padding has the same size as the actual image.
* Valid padding refers to keeping the image as such, using only the pixels of the image that are actual or "valid". In this case, after the filters are applied, the length and width of the output keep shrinking at each convolutional layer.

In practice, it is common to zero-pad the border. E.g., for a 7x7 input with a 3x3 filter applied with stride 1 and a 1-pixel border of zero padding, what is the output? (Recall: (N - F)/stride + 1.) The output is 7x7!

In general, it is common to see CONV layers with stride 1, filters of size FxF, and zero padding of (F-1)/2, which preserves the spatial size. E.g.:
* F = 3 => zero-pad with 1
* F = 5 => zero-pad with 2
* F = 7 => zero-pad with 3

Output size with padding: (N + 2*padding - F) / stride + 1

Convolutions: more detail

Example time: input volume 32x32x3, ten 5x5 filters with stride 1, pad 2.
* Output volume size: (32 + 2*2 - 5)/1 + 1 = 32 spatially, so the output volume is 32x32x10.
* Number of parameters in this layer? Each filter has 5*5*3 + 1 = 76 parameters (+1 for the bias), so 76*10 = 760 parameters in total.

Summary. To summarize, the Conv layer:
* Accepts a volume of size W1 x H1 x D1.
* Requires four hyperparameters:
  - the number of filters K,
  - their spatial extent F,
  - the stride S,
  - the amount of zero padding P.
* Produces a volume of size W2 x H2 x D2, where:
  - W2 = (W1 - F + 2P)/S + 1
  - H2 = (H1 - F + 2P)/S + 1
  - D2 = K
* With parameter sharing, it introduces F*F*D1 weights per filter, for a total of (F*F*D1)*K weights and K biases.
* In the output volume, the d-th depth slice (of size W2 x H2) is the result of performing a valid convolution of the d-th filter over the input volume with a stride of S, and then offsetting by the d-th bias.
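To tie the summary together, here is a minimal sketch, assuming nothing beyond the formulas above, that computes the output volume and parameter count of a conv layer; the function name conv_layer_shape is illustrative:

    def conv_layer_shape(w1, h1, d1, k, f, s, p):
        """Output volume and parameter count of a conv layer.

        W2 = (W1 - F + 2P)/S + 1, H2 = (H1 - F + 2P)/S + 1, D2 = K.
        Parameter sharing gives F*F*D1 weights per filter, hence
        (F*F*D1)*K weights and K biases in total.
        """
        w2 = (w1 - f + 2 * p) // s + 1
        h2 = (h1 - f + 2 * p) // s + 1
        params = (f * f * d1) * k + k  # weights + biases
        return (w2, h2, k), params

    # The worked example above: 32x32x3 input, ten 5x5 filters, stride 1, pad 2
    shape, params = conv_layer_shape(32, 32, 3, k=10, f=5, s=1, p=2)
    print(shape, params)  # (32, 32, 10) 760, i.e. 76 params per filter * 10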
