Week 6
Convolutional Neural Network
Convolutional Neural Networks (Convnets/CNNs)
• Convolutional neural networks allow deep networks to
learn functions on structured spatial data such as
images, video, and text.
• Convolutional networks provide tools for exploiting the
local structure of data effectively.
• Convolutional neural networks (CNNs) form the basis of
modern computer vision and image processing.
Basic Architecture: Convolutional neural networks, also
known as CNNs, are a specific type of neural network that
is generally composed of the following layers:
• Convolution Layer
• Pooling Layer
• Dense Layer
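The three layer types can be chained in a few lines. The following is a minimal sketch, not a trainable network: the 6 x 6 image and 3 x 3 filter are the ones used later in these slides, while the 2 x 2 max pooling, the helper names (conv2d, max_pool_2x2) and the random dense weights are assumptions made for illustration.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    h, w = image.shape
    f = kernel.shape[0]
    out = np.zeros((h - f + 1, w - f + 1), dtype=image.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

def max_pool_2x2(fm):
    """Keep the largest value in each non-overlapping 2 x 2 block."""
    h, w = fm.shape
    return fm.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

image = np.array([[1, 0, 0, 0, 0, 1],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 1, 0, 0],
                  [1, 0, 0, 0, 1, 0],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 0, 1, 0]])
kernel = np.array([[1, -1, -1],
                   [-1, 1, -1],
                   [-1, -1, 1]])

rng = np.random.default_rng(0)
dense_w = rng.normal(size=(2, 4))  # assumed: 2 output classes, 4 pooled inputs
dense_b = np.zeros(2)

feature_map = conv2d(image, kernel)          # convolution layer: 6x6 -> 4x4
pooled = max_pool_2x2(feature_map)           # pooling layer:     4x4 -> 2x2
scores = dense_w @ pooled.ravel() + dense_b  # dense layer:       4 values -> 2 scores

print(pooled)        # rows: 3 0 / 3 1
print(scores.shape)  # (2,)
```

In a real network the filter and the dense weights would be learned during training rather than fixed by hand or drawn at random.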
Example CNN Architecture: VGG16 Network
[Figure: VGG16 architecture. Input: RGB image. The network is split into a feature extraction part and a classifier part.]
Image Credit: https://neurohive.io/en/popular-networks/vgg16/
Applications
CNNs are specifically tailored for image and video
processing tasks:
• Self-driving cars: object detection, image segmentation
• Medical image analysis: identifying tumours in MRI scans
or analyzing X-rays and CT scans
• Style transfer
• Image generation (generative models)
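Why CNN? Why not a Fully Connected NN?
• Fewer parameters: a fully connected network (FNN) needs
a separate weight for every input-output pair, whereas a
CNN shares the same filter weights across all positions.
• Convolutions are computationally efficient,
especially when optimized for parallel processing
on GPUs.
• Loss of spatial information in an FNN: the image is
flattened into a vector, which discards the 2-D
arrangement of the pixels.
• Global vs. local: an FNN connects each unit to the whole
input, while a CNN uses local receptive fields that
preserve spatial relationships.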
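To make the parameter saving concrete, the count for a single layer can be sketched as follows. The 224 x 224 x 3 input, the 64 units/filters and the 3 x 3 filter size are illustrative assumptions, not figures from the slides.

```python
# Rough parameter counts for one layer; all sizes are illustrative assumptions.
h, w, c = 224, 224, 3  # assumed RGB input size
units = 64             # assumed number of dense units / convolution filters
f = 3                  # assumed filter size (3 x 3)

# Fully connected: every input value has its own weight to every unit,
# plus one bias per unit.
fc_params = (h * w * c) * units + units

# Convolution: each filter has f*f*c shared weights plus one bias,
# independent of the image size.
conv_params = (f * f * c) * units + units

print(fc_params)    # 9633856
print(conv_params)  # 1792
```

The convolutional count does not change if the image gets larger, which is the practical meaning of shared weights.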
Local Receptive Fields in CNN
• Reduce the number of connections
• Share weights across positions
• Extract visual features from a small region of the image to
generate a feature map
CNN: Captures Patterns in Different Locations
• The same pattern appears in different places. Neurons in convolution
layers learn to respond to specific patterns in small local regions.
• What about training a lot of such “small” detectors, where
each detector must “move around”?
• An “upper-left beak” detector and a “middle beak” detector
can be compressed to the same parameters.
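Translation Invariance
A CNN responds to a pattern in the same way regardless of where it
appears in the input. Strictly, convolution is translation-equivariant:
shifting the input shifts the feature map by the same amount. Pooling
then makes the final response approximately invariant to small shifts.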
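The effect of shifting the input can be checked numerically. This is a small sketch under assumed values: the 8 x 8 image, the diagonal pattern, and the shift of 3 rows and 2 columns are made up for the demonstration; the filter is the diagonal detector used later in these slides.

```python
import numpy as np

def cross_correlate(image, kernel):
    """Valid cross-correlation of a 2-D image with a 2-D kernel, stride 1."""
    h, w = image.shape
    f = kernel.shape[0]
    out = np.zeros((h - f + 1, w - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

kernel = np.array([[1, -1, -1],
                   [-1, 1, -1],
                   [-1, -1, 1]])  # diagonal detector

image = np.zeros((8, 8))
image[1:4, 1:4] = np.eye(3)  # diagonal pattern near the upper left

# Move the pattern 3 rows down and 2 columns right (no wrap-around occurs here).
shifted = np.roll(image, shift=(3, 2), axis=(0, 1))

fm = cross_correlate(image, kernel)
fm_shifted = cross_correlate(shifted, kernel)

# Equivariance: the strongest response moves by the same (3, 2) offset.
peak = np.unravel_index(np.argmax(fm), fm.shape)
peak_shifted = np.unravel_index(np.argmax(fm_shifted), fm_shifted.shape)
print(peak, peak_shifted)

# Invariance after a global max: the strongest response has the same value.
print(fm.max(), fm_shifted.max())
```

The feature map itself changes when the input moves; only a summary such as the maximum stays the same.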
Spatial Hierarchies
• Convolutional layers are responsible for detecting local features in
the input, such as edges, corners, and textures.
• As we move deeper into the network, neurons in the
convolutional layers see a larger part of the input and
recognize more abstract, larger-scale features.
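One way to see why deeper neurons cover larger-scale features is to compute how many input pixels a single output value depends on (its receptive field). The sketch below uses the standard recurrence for stacked layers; the function name and the example layer stacks are assumptions chosen for illustration.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, along one axis) of one output value.

    layers: list of (kernel_size, stride) pairs, ordered from input to output.
    """
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

print(receptive_field([(3, 1)]))                  # 3
print(receptive_field([(3, 1), (3, 1)]))          # 5
print(receptive_field([(3, 1), (2, 2), (3, 1)]))  # 8
```

The last line models a 3 x 3 convolution, a 2 x 2 pooling step with stride 2, and another 3 x 3 convolution.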
Limitations of CNN
• Sensitivity to small changes in image orientation, scale, or
position (lack of full spatial invariance)
• Bias inherited from the training data
• CNNs require a large amount of labelled data for
training to achieve high performance
• Extensive computation and memory requirements
• Limitations with 3D and temporal data
Bias and Fairness Issues
A CNN trained on facial recognition data may have biased performance across different
ethnicities, performing better for some groups than others, leading to fairness concerns.
Article: https://www.nytimes.com/2018/02/09/technology/facial-recognition-race-artificial-intelligence.html
A convolutional layer
• A CNN is a neural network with
some convolutional layers (and
some other layers).
• A convolutional layer has several
filters (kernels) that perform
convolution operations.
• By sliding each filter over the input
image and computing the dot
product at every position, the
network learns to identify features
such as edges, shapes, and
textures, which are crucial for
understanding the content of the
image.
• The convolution operation is a
mathematical process that preserves
the spatial relationship between
pixels.
Convolution in CNN
Filters contain parameters that are learned
during training.
6 x 6 image        Filter 1
1 0 0 0 0 1         1 -1 -1
0 1 0 0 1 0        -1  1 -1
0 0 1 1 0 0        -1 -1  1
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0
Each filter detects a small pattern (3 x 3).
Convolution is performed on the input data using a filter/kernel to
produce a feature map.
Convolution (it's sometimes easier to understand
visually)
Convolution in CNN
Cross-correlation: the filter is slid over the image (∗) without being
flipped. The output from convolution with Filter 1 is a feature map
(activation map).
6 x 6 image        Filter 1
1 0 0 0 0 1         1 -1 -1
0 1 0 0 1 0        -1  1 -1
0 0 1 1 0 0        -1 -1  1
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0
Each filter detects a small pattern (3 x 3). The filter values are
the network parameters to be learned.
h x w image ∗ f x f filter → (h-f+1) x (w-f+1) output
(here: 6 x 6 image ∗ 3 x 3 filter → 4 x 4 output)
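The feature map for this slide's image and Filter 1 can be computed directly. The loop below is a plain NumPy sketch of the sliding dot product (stride 1, no padding); the function name cross_correlate is an assumption, not a library call.

```python
import numpy as np

def cross_correlate(image, kernel):
    """Slide the kernel over the image and take the dot product at each position."""
    h, w = image.shape
    f = kernel.shape[0]
    out = np.zeros((h - f + 1, w - f + 1), dtype=image.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

image = np.array([[1, 0, 0, 0, 0, 1],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 1, 0, 0],
                  [1, 0, 0, 0, 1, 0],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 0, 1, 0]])
filter1 = np.array([[1, -1, -1],
                    [-1, 1, -1],
                    [-1, -1, 1]])

feature_map = cross_correlate(image, filter1)
print(feature_map.shape)  # (4, 4), i.e. (6-3+1) x (6-3+1)
print(feature_map)
# rows:  3 -1 -3 -1
#       -3  1  0 -3
#       -3 -3  0  1
#        3 -2 -2 -1
```

The value 3 appears where the image patch contains the full diagonal pattern that Filter 1 encodes.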
Convolution in CNN
Output from convolution with filters 1 and 2: one feature map
(activation map) per filter.
6 x 6 image        Filter 1        Filter 2
1 0 0 0 0 1         1 -1 -1         1 -1 -1
0 1 0 0 1 0        -1  1 -1        -1  1 -1
0 0 1 1 0 0        -1 -1  1        -1 -1  1
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0
Each filter detects a small pattern (3 x 3).
h x w x 1 image ∗ f x f x 2 filter → (h-f+1) x (w-f+1) x 2 output
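With two filters, the two feature maps are stacked along a third axis. In this sketch the second filter is the vertical-line detector that appears on the "Convolution v.s. Fully Connected" slide; using it as Filter 2 here is an assumption, as is the helper name cross_correlate.

```python
import numpy as np

def cross_correlate(image, kernel):
    """Valid cross-correlation, stride 1."""
    h, w = image.shape
    f = kernel.shape[0]
    out = np.zeros((h - f + 1, w - f + 1), dtype=image.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

image = np.array([[1, 0, 0, 0, 0, 1],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 1, 0, 0],
                  [1, 0, 0, 0, 1, 0],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 0, 1, 0]])
filter1 = np.array([[1, -1, -1],
                    [-1, 1, -1],
                    [-1, -1, 1]])   # diagonal detector
filter2 = np.array([[-1, 1, -1],
                    [-1, 1, -1],
                    [-1, 1, -1]])   # assumed second filter: vertical-line detector

# One feature map per filter, stacked along the last axis.
stacked = np.stack([cross_correlate(image, k) for k in (filter1, filter2)], axis=-1)

print(stacked.shape)     # (4, 4, 2)
print(stacked[3, 3, 1])  # 3: the vertical line in the lower-right patch
```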
Convolution in CNN
6 x 6 image ∗ Filter 1, Filter 2, ..., Filter n (drawn stacked in the
figure; Filter 2 is partly hidden)
6 x 6 image        Filter 1        Filter n
1 0 0 0 0 1         1 -1 -1         0 -1  1
0 1 0 0 1 0        -1  1 -1        -1  0 -1
0 0 1 1 0 0        -1 -1  1         0 -2  1
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0
Each filter detects a small pattern (3 x 3).
h x w x nc image ∗ nf filters, each f x f x nc → (h-f+1) x (w-f+1) x nf output
How it works on colored images: RGB, 3 channels
[Figure: an RGB image drawn as three stacked 6 x 6 channels is convolved (∗)
with Filter 1 and Filter 2. Each filter spans all three channels and
produces its own output: Filter 1 output and Filter 2 output.]
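For a multi-channel input, each filter has one f x f slice per channel, and the products are summed over all channels to give a single number per position. The sketch below assumes random 0/1 pixel values and random filter values in {-1, 0, 1}; the function name conv_multichannel and the array layout (height, width, channels, filters) are also assumptions.

```python
import numpy as np

def conv_multichannel(image, filters):
    """image: (h, w, nc); filters: (f, f, nc, nf) -> output: (h-f+1, w-f+1, nf)."""
    h, w, nc = image.shape
    f, _, nc_f, nf = filters.shape
    assert nc == nc_f, "each filter needs one slice per input channel"
    out = np.zeros((h - f + 1, w - f + 1, nf))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + f, j:j + f, :]  # f x f x nc block of the image
            # Sum over height, width and channel for every filter at once.
            out[i, j, :] = np.tensordot(patch, filters, axes=([0, 1, 2], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
rgb = rng.integers(0, 2, size=(6, 6, 3))          # assumed 6 x 6 RGB image
filters = rng.integers(-1, 2, size=(3, 3, 3, 2))  # assumed: 2 filters, each 3 x 3 x 3

out = conv_multichannel(rgb, filters)
print(out.shape)  # (4, 4, 2): one 4 x 4 feature map per filter
```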
Convolution vs. Fully Connected
[Figure, top: convolution. The 6 x 6 image is convolved with two 3 x 3
filters (1 -1 -1 / -1 1 -1 / -1 -1 1 and -1 1 -1 / -1 1 -1 / -1 1 -1).]
[Figure, bottom: fully connected. The same 6 x 6 image is flattened into
36 inputs x1, x2, …, x36.]
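The comparison can be made explicit by writing the convolution as a matrix applied to the flattened inputs x1 … x36. This is a sketch with assumed variable names: it builds the 16 x 36 weight matrix that a fully connected layer would need in order to reproduce Filter 1, and shows that most entries are zero and the rest repeat the same 9 values.

```python
import numpy as np

image = np.array([[1, 0, 0, 0, 0, 1],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 1, 0, 0],
                  [1, 0, 0, 0, 1, 0],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 0, 1, 0]])
kernel = np.array([[1, -1, -1],
                   [-1, 1, -1],
                   [-1, -1, 1]])

h, w = image.shape
f = kernel.shape[0]
oh, ow = h - f + 1, w - f + 1

# Row r of W holds the weights from all 36 inputs to output unit r.
W = np.zeros((oh * ow, h * w), dtype=int)
for i in range(oh):
    for j in range(ow):
        for a in range(f):
            for b in range(f):
                W[i * ow + j, (i + a) * w + (j + b)] = kernel[a, b]

x = image.ravel()  # x1 ... x36
conv_as_matmul = (W @ x).reshape(oh, ow)

print(conv_as_matmul[0])            # first row of the feature map: 3, -1, -3, -1
print(np.count_nonzero(W), W.size)  # 144 non-zero weights out of 576
print(kernel.size)                  # only 9 distinct learnable values
```

A general fully connected layer with 16 outputs would learn all 576 weights independently; the convolution fixes 432 of them to zero and ties the remaining 144 to 9 shared values.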
Filters: Sobel or Edge Detector
• Detecting vertical edges
• Detecting horizontal edges
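The standard 3 x 3 Sobel kernels can be tried on two toy images. The 6 x 6 step-edge images and the helper name cross_correlate below are assumptions made for this sketch; the kernel values are the usual Sobel coefficients.

```python
import numpy as np

def cross_correlate(image, kernel):
    """Valid cross-correlation, stride 1."""
    h, w = image.shape
    f = kernel.shape[0]
    out = np.zeros((h - f + 1, w - f + 1), dtype=image.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

sobel_vertical = np.array([[-1, 0, 1],
                           [-2, 0, 2],
                           [-1, 0, 1]])  # responds to vertical edges
sobel_horizontal = sobel_vertical.T      # responds to horizontal edges

vertical_edge = np.zeros((6, 6), dtype=int)
vertical_edge[:, 3:] = 10                # dark left half, bright right half
horizontal_edge = vertical_edge.T        # dark top half, bright bottom half

v_on_v = cross_correlate(vertical_edge, sobel_vertical)
h_on_v = cross_correlate(vertical_edge, sobel_horizontal)
h_on_h = cross_correlate(horizontal_edge, sobel_horizontal)

print(v_on_v[0])     # 0, 40, 40, 0: strong response along the vertical edge
print(h_on_v.max())  # 0: the horizontal detector ignores a vertical edge
print(h_on_h[:, 0])  # 0, 40, 40, 0: strong response along the horizontal edge
```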
Convolution Calculation with 1 Filter
5 x 6 image           Filter        Output
0 0 0 10 10 10        -1 0 1        0 30 30 0
0 0 0 10 10 10        -1 0 1        0 30 30 0
0 0 0 10 10 10        -1 0 1        0 30 30 0
0 0 0 10 10 10
0 0 0 10 10 10
output(0,0) = 0×(−1) + 0×0 + 0×1 + 0×(−1) + 0×0 + 0×1 + 0×(−1) + 0×0 + 0×1
= 0
output(0,1) = 0×(−1) + 0×0 + 10×1 + 0×(−1) + 0×0 + 10×1 + 0×(−1) + 0×0 + 10×1
= 30
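The hand calculation can be checked with a few lines of NumPy. The image is entered as printed on the slide, with five rows of six values each; the helper name cross_correlate is an assumption.

```python
import numpy as np

def cross_correlate(image, kernel):
    """Valid cross-correlation, stride 1."""
    h, w = image.shape
    f = kernel.shape[0]
    out = np.zeros((h - f + 1, w - f + 1), dtype=image.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

# Five identical rows: 0 0 0 10 10 10
image = np.tile(np.array([0, 0, 0, 10, 10, 10]), (5, 1))
kernel = np.array([[-1, 0, 1]] * 3)  # three rows of -1 0 1

output = cross_correlate(image, kernel)
print(output.shape)  # (3, 4)
print(output[0, 0])  # 0
print(output[0, 1])  # 30
print(output)        # every row is 0, 30, 30, 0
```

The non-zero columns mark where the window straddles the jump from 0 to 10, which is the vertical edge in the image.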