UNIT - 3

Convolution Neural Network

History: Convolution Neural Network
In 1995, Yann LeCun and Yoshua Bengio introduced the concept of convolutional
neural networks.
Convolutional Neural Networks are a special kind of multi-layer neural networks.
CNN is a feed-forward network that can extract topological properties from an
image.
Convolutional Neural Networks are designed to recognize visual patterns directly
from pixel images with minimal preprocessing.
They can recognize patterns with extreme variability (such as handwritten
characters).
They are popular because they achieve state-of-the-art results on
difficult computer vision and natural language processing tasks.

Image Size
This digitization process requires decisions about the values of M and N, and
about the number, L, of discrete gray levels allowed for each pixel, where M
and N are positive integers. Due to processing, storage, and sampling hardware
considerations, the number of gray levels is typically an integer power of 2:

L = 2^k

where k is the number of bits required to represent a gray value.

The discrete levels should be equally spaced integers in the interval [0, L-1].
The number, b, of bits required to store a digitized image is

b = M * N * k
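As a quick illustration, here is a minimal Python sketch of this formula (the function name storage_bits is our own):

def storage_bits(M, N, k):
    # b = M * N * k bits for an M x N image with L = 2**k gray levels
    return M * N * k

# e.g. a 256 x 256 image with L = 256 gray levels (k = 8):
print(storage_bits(256, 256, 8))       # 524288 bits (matches the table below)
print(storage_bits(256, 256, 8) // 8)  # 65536 bytes = 64 KB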
Number of storage bits for various values of N and k (for an N x N image, b = N^2 * k):

N       k=1 (L=2)     k=2 (L=4)     k=3 (L=8)     k=4 (L=16)    k=5 (L=32)    k=6 (L=64)    k=7 (L=128)   k=8 (L=256)
32      1,024         2,048         3,072         4,096         5,120         6,144         7,168         8,192
64      4,096         8,192         12,288        16,384        20,480        24,576        28,672        32,768
128     16,384        32,768        49,152        65,536        81,920        98,304        114,688       131,072
256     65,536        131,072       196,608       262,144       327,680       393,216       458,752       524,288
512     262,144       524,288       786,432       1,048,576     1,310,720     1,572,864     1,835,008     2,097,152
1024    1,048,576     2,097,152     3,145,728     4,194,304     5,242,880     6,291,456     7,340,032     8,388,608
2048    4,194,304     8,388,608     12,582,912    16,777,216    20,971,520    25,165,824    29,360,128    33,554,432
4096    16,777,216    33,554,432    50,331,648    67,108,864    83,886,080    100,663,296   117,440,512   134,217,728
8192    67,108,864    134,217,728   201,326,592   268,435,456   335,544,320   402,653,184   469,762,048   536,870,912

[Figure: Grayscale image vs. RGB image.]
CNN - Convolution Neural Network
Images are stored as a large multidimensional array of pixels, whose dimensions
are (width x height) for greyscale images and (width x height x 3) for RGB images.
So consider a small 256x256 RGB image - this results in 256 x 256 x 3 = 196,608 pixels! Each
pixel is an input feature, which means that even if we only have 128 units in the first
layer, the number of parameters in our weight matrix is 128 * 196,608 ≈ 25 million
weights!
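A quick sanity check of these numbers in Python (values taken from the text above):

pixels = 256 * 256 * 3               # input features for a 256x256 RGB image
print(pixels)                        # 196608
weights = 128 * pixels               # dense layer with 128 units (biases ignored)
print(weights)                       # 25165824, i.e. roughly 25 million weights
print((1024 * 1024) // (256 * 256))  # 16: pixel ratio of 1024x1024 vs 256x256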
Given that most images have much higher resolution than 256x256, the number of input
features increases dramatically - for a 1024x1024 RGB image the number of pixels
increases by a factor of 16.
This poses a scalability issue - our model has far too many weights! This requires a huge
amount of memory and computational power to train. How then do we train a neural
network on an image?

CNN — General Architecture
[Figure: general CNN architecture. Convolution and Pooling layers perform feature extraction; Fully Connected layers perform classification and produce the output.]

Convolution
h1 = f(a*w1 + b*w2 + e*w3 + f*w4)
h2 = f(b*w1 + c*w2 + f*w3 + g*w4)

Number of parameters for one feature map = 4
Number of parameters for 100 feature maps = 4 * 100 = 400
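This computation can be sketched in a few lines of NumPy. This is an illustrative toy example, assuming a 2x4 input patch holding pixels a..d and e..h as in the equations above, a 2x2 filter of shared weights w1..w4, and (for concreteness) a ReLU as the activation f:

import numpy as np

x = np.array([[1., 2., 3., 4.],    # pixels a, b, c, d (example values)
              [5., 6., 7., 8.]])   # pixels e, f, g, h
w = np.array([[0.1, 0.2],          # weights w1, w2
              [0.3, 0.4]])         # weights w3, w4
act = lambda z: np.maximum(z, 0.)  # plays the role of f (ReLU chosen here)

# Slide the 2x2 window one step at a time; the same four weights
# are reused at every position (parameter sharing).
h = [act((x[:, i:i+2] * w).sum()) for i in range(x.shape[1] - 1)]
print(h)  # [h1, h2, h3]; h1 = f(a*w1 + b*w2 + e*w3 + f*w4), etc.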
Convolution

[Figure: a convolution kernel slides over the source image; at each location, the weighted sum of the source pixels under the kernel produces one destination pixel. Image source: https://medium.com/@behuma/6-basic-things-o-know-about-convolutiondaefSetbed11]

Convolution Layer
[Figure: a convolutional layer maps the input image to a feature map using learned weights.]

Filters
Step 1: Move the window to the first location where we want to compute the average value, and select only the pixels inside the window.

Step 2: Compute the average value of the pixels of the sub-image under the window:
y = (1/n) * sum of p(i) over the n pixels i inside the window

Step 3: Place the result at the corresponding pixel in the output image.

Step 4: Move the window to the next location and go to Step 2.
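A minimal NumPy sketch of these four steps, assuming a single-channel image and a window that only visits positions fully inside the image:

import numpy as np

def mean_filter(image, size=3):
    H, W = image.shape
    out = np.zeros((H - size + 1, W - size + 1))
    for r in range(out.shape[0]):                # Step 4: visit every location
        for c in range(out.shape[1]):
            window = image[r:r+size, c:c+size]   # Step 1: pixels inside the window
            out[r, c] = window.mean()            # Steps 2-3: average -> output pixel
    return out

img = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
print(mean_filter(img))                          # 3x3 smoothed output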
Filters

[Figure: classical edge-detection filter kernels, including the Roberts and Prewitt operators.]

Convolution Layer
The Convolution layer is the core building block of a CNN.
The parameters consist of a set of learnable filters.
Every filter is small spatially (along width and height), but extends through the full depth of
the input volume, e.g., 5x5x3.
During the forward pass, we slide (convolve) each filter across the width and height of
the input volume and compute dot products between the entries of the filter and the
input at every position.
This produces a 2-dimensional activation map that gives the responses of that filter at every
spatial position.
Intuitively, the network will learn filters that activate when they see some type of visual
feature.
There is a set of filters in each CONV layer:
* each of them will produce a separate 2-dimensional activation map
* we stack these activation maps along the depth dimension to produce the output volume (see the sketch below).
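A short PyTorch sketch of this stacking (the shapes follow the 32x32x3 example used on the next slides):

import torch
import torch.nn as nn

# One CONV layer with 6 learnable filters, each of size 5x5x3.
conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)
x = torch.randn(1, 3, 32, 32)   # a batch containing one 32x32x3 image
maps = conv(x)                  # each filter yields one activation map
print(maps.shape)               # torch.Size([1, 6, 28, 28]): 6 maps stacked along depth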
Convolution Layer
[Figure: a 32x32x3 image and a 5x5x3 filter w. Each output number is the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e., 5*5*3 = 75-dimensional dot product plus bias): w^T x + b.]

Convolutions: More detail
Convolution Layer

[Figure: convolving (sliding) the 5x5x3 filter over all spatial locations of the 32x32x3 image produces a 28x28 activation map.]

Convolutions: More detail
For example, if we had 6 5x5 filters, we'll get 6 separate activation maps:

[Figure: six 5x5x3 filters applied to a 32x32x3 image produce six 28x28 activation maps.]

We stack these up to get a "new image" of size 28x28x6!

Preview: A ConvNet is a sequence of Convolutional Layers, interspersed with activation functions
[Figure: 32x32x3 input -> CONV + ReLU (e.g., 6 filters of 5x5x3) -> 28x28x6 -> CONV + ReLU (e.g., 10 filters of 5x5x6) -> 24x24x10 -> ...]
CONV: Convolutional kernel layer
RELU: Activation function
POOL: Dimension reduction (pooling) layer
FC: Fully connected layer
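A PyTorch sketch of the CONV/ReLU stack from the preview figure above (filter counts follow that example; this is illustrative, not part of the original slides):

import torch
import torch.nn as nn

# 32x32x3 -> CONV(6 filters, 5x5x3) + ReLU -> 28x28x6
#         -> CONV(10 filters, 5x5x6) + ReLU -> 24x24x10
net = nn.Sequential(
    nn.Conv2d(3, 6, kernel_size=5),
    nn.ReLU(),
    nn.Conv2d(6, 10, kernel_size=5),
    nn.ReLU(),
)
x = torch.randn(1, 3, 32, 32)
print(net(x).shape)  # torch.Size([1, 10, 24, 24])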
Convolutions: More detail

A closer look at spatial dimensions:
7x7 input (spatially)
assume 3x3 filter
=> 5x5 output

Convolutions: More detail
A closer look at spatial dimensions:
7x7 input (spatially)
assume 3x3 filter applied with stride 2
=> 3x3 output!

Convolutions: More detail
A closer look at spatial dimensions:
7x7 input (spatially)
assume 3x3 filter applied with stride 3?
7 doesn't fit!
We cannot apply a 3x3 filter on a 7x7 input with stride 3.

How to resolve the issue?
Output size: (N - F)/stride + 1

e.g. N = 7, F = 3:
stride 1 => (7 - 3)/1 + 1 = 5
stride 2 => (7 - 3)/2 + 1 = 3
stride 3 => (7 - 3)/3 + 1 = 2.33 (not an integer, so stride 3 does not fit)

+ Padding refers to adding an extra border of zeros around the image so that
the output image has the same size as the input. This is known as same
padding.
+ After the application of the filters, the convolved layer in the case of same
padding has a size equal to that of the actual image.
» Valid padding refers to keeping the image as such, having only the pixels of
the image which are actual or "valid". In this case, after the application of
the filters, the height and the width of the output keep getting reduced at
each convolutional layer.
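These two padding modes can be checked with a small helper (our own naming) that implements the output-size formula from above:

def conv_output_size(N, F, stride=1, pad=0):
    # Spatial output size: (N + 2*pad - F) / stride + 1
    return (N + 2 * pad - F) // stride + 1

print(conv_output_size(7, 3, pad=0))  # 5: "valid" padding shrinks the output
print(conv_output_size(7, 3, pad=1))  # 7: "same" padding (pad = (F-1)/2, stride 1) preserves size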
In practice: Common to zero pad the border

e.g. input 7x7
3x3 filter, applied with stride 1
pad with 1 pixel border => what is the output?
(recall: (N - F)/stride + 1)

In practice: Common to zero pad the border
[Figure: the 7x7 input surrounded by a 1-pixel border of zeros, giving a 9x9 padded input.]

e.g. input 7x7
3x3 filter, applied with stride 1
pad with 1 pixel border => what is the output?
7x7 output!

In general, it is common to see CONV layers with stride 1, filters of size FxF, and zero-padding of (F - 1)/2 (this will preserve the size spatially):
F = 3 => zero pad with 1
F = 5 => zero pad with 2
F = 7 => zero pad with 3

With padding, the output size becomes (N + 2*padding - F)/stride + 1.

Input volume: 32x32x3
10 5x5 filters with stride 1, pad 2
Output volume size:
(32+2*2-5)/1+1 = 32 spatially, so
32x32x10

Convolutions: More detail
Examples time:
Input volume: 32x32x3
10 5x5 filters with stride 1, pad 2
Number of parameters in this layer?
Each filter has 5*5*3 + 1 = 76 params (+1 for the bias)
=> 76 * 10 = 760 parameters in total
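This count can be confirmed with PyTorch (a sanity check, not part of the original slides):

import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=5, stride=1, padding=2)
n_params = sum(p.numel() for p in conv.parameters())
print(n_params)  # 760 = 10 filters * (5*5*3 weights + 1 bias)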
Summary. To summarize, the Conv Layer:
Accepts a volume of size W1 x H1 x D1.
Requires four hyperparameters:
* the number of filters K,
* their spatial extent F,
* the stride S,
* the amount of zero padding P.
Produces a volume of size W2 x H2 x D2, where:
W2 = (W1 - F + 2P)/S + 1
H2 = (H1 - F + 2P)/S + 1
D2 = K
With parameter sharing, it introduces F * F * D1 weights per filter, for a total of (F * F * D1) * K weights and K biases.
In the output volume, the d-th depth slice (of size W2 x H2) is the result of performing a valid convolution of the d-th filter over the input volume with a stride of S, and then offset by the d-th bias.
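The whole summary condenses into one small helper (our own naming), shown here applied to the 32x32x3 example above:

def conv_layer_summary(W1, H1, D1, K, F, S, P):
    # Output volume size per the formulas above.
    W2 = (W1 - F + 2 * P) // S + 1
    H2 = (H1 - F + 2 * P) // S + 1
    D2 = K
    n_params = (F * F * D1) * K + K   # shared weights plus one bias per filter
    return (W2, H2, D2), n_params

print(conv_layer_summary(32, 32, 3, K=10, F=5, S=1, P=2))  # ((32, 32, 10), 760)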