Deep Learning with Keras and TensorFlow
Convolutional Neural Net (CNN)
Learning Objectives
By the end of this lesson, you will be able to:
Implement CNN architecture
Implement Deep CNN
Optimize CNNs using pooling layers
Success and History
Human Visual System and CNN
[Figure: stages of the visual pathway (LGN, V1, V2, V4, IT) alongside the features they respond to: edges and lines, shapes, and faces and objects.]
▪ The idea of CNNs was neurobiologically motivated by the findings of locally-sensitive and orientation-selective nerve
cells in the visual cortex.
▪ Inventors of CNN designed a network structure that implicitly extracts relevant features.
▪ Convolutional Neural Networks are a special kind of multilayer neural network.
History of CNN
In the late 1980s and 1990s, Yann LeCun, now a professor of computer science at New York University, introduced and developed the concept of convolutional neural networks.
The Core Idea Behind CNN
▪ Local Connections: represent how each set of neurons in a cluster is connected to each other, which in turn represents a set of features.
▪ Layering: represents the hierarchy in the features that are learned.
▪ Spatial Invariance: represents the capability of CNNs to learn abstractions that are invariant to size, contrast, rotation, and variation.
Few Popular CNNs
LeNet, 1998
AlexNet, 2012
VGGNet, 2014
ResNet, 2015
CNN Architectures
▪ VGGNet: 16 layers, only 3x3 convolutions, 138 million parameters
▪ ResNet: up to 152 layers; ResNet50 is the 50-layer variant
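The 138 million figure for VGGNet can be checked with simple arithmetic. The sketch below assumes the standard 16-layer VGG configuration (13 convolutional layers with 3x3 filters and 'same' padding, five 2x2 max-pooling layers, and three dense layers), a 224 x 224 RGB input, and a 1000-class output. These details are assumptions and are not stated on the slide; the names `VGG16_CONV`, `VGG16_DENSE`, and `count_vgg16_parameters` are made up for illustration.

```python
# Output channels of each 3x3 convolution; "M" marks a 2x2 max-pooling layer,
# which has no parameters. This layout is an assumption (standard VGG16).
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]
VGG16_DENSE = [4096, 4096, 1000]  # assumes a 1000-class classifier head


def count_vgg16_parameters(input_size=224, input_channels=3):
    """Count weights and biases, assuming 'same' padding in every convolution."""
    total = 0
    channels, size = input_channels, input_size
    for layer in VGG16_CONV:
        if layer == "M":
            size //= 2  # pooling halves the height and width
        else:
            total += 3 * 3 * channels * layer + layer  # weights + biases
            channels = layer
    features = channels * size * size  # flattened input to the dense layers
    for units in VGG16_DENSE:
        total += features * units + units  # weights + biases
        features = units
    return total


vgg16_params = count_vgg16_parameters()
print(f"{vgg16_params:,}")  # 138,357,544
```

Most of the total comes from the first dense layer, which suggests why convolutional layers are so much cheaper in parameters than fully-connected ones.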
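CNN Applications
[Figure: transfer learning from Task A (Input A) to Task B (Input B), each trained by back-propagation. In the AnB network, the first n layers are taken from the network trained on Task A and kept as frozen weights. Panel titles: Transfer Learning and Fine Tuning; Feature Extraction.]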
Working of CNNs
Learning an Image
A CNN focuses on small, specific patterns rather than on the whole image.
[Figure: a “beak” detector applied to a small region of the image, producing an output.]
It’s convenient and effective to represent a smaller region with fewer parameters, thereby reducing computational complexity.
The Convolutional Layer
A CNN is a neural network with convolutional layers (and other layers). A convolutional layer has
several filters that perform the convolution operation.
[Figure: a filter acting as a beak detector.]
The Convolution Operation
Consider a 6x6 image convolved with 3x3 filter(s) to give an output of size 4x4.
6 x 6 image:
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0
Filter 1:
 1 -1 -1
-1  1 -1
-1 -1  1
Filter 2:
-1  1 -1
-1  1 -1
-1  1 -1
Each filter detects a small pattern (3 x 3).
Note: Filters can be considered network parameters to be learned.
After computing the dot product at one position, shift the filter across the input matrix to the next position. The size of this shift is known as the stride.
Stride=1: the first two dot products of Filter 1 with the 6 x 6 image are 3 and -1.
Stride=2: the first two dot products of Filter 1 with the 6 x 6 image are 3 and -3.
Note: If you change the stride size, the convolved output will vary. A larger stride visits fewer positions and therefore produces a smaller output.
Stride=1: sliding Filter 1 over every position of the 6 x 6 image gives a 4 x 4 image:
 3 -1 -3 -1
-3  1  0 -3
-3 -3  0  1
 3 -2 -2 -1
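The stride walkthrough above can be reproduced with a few lines of NumPy. This is a minimal sketch, assuming a square image, a square filter, and no padding; `convolve2d` is a helper written for illustration, not a library function. Like most deep learning libraries, it does not flip the filter before taking the dot product.

```python
import numpy as np

image = np.array([
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
])
filter_1 = np.array([
    [ 1, -1, -1],
    [-1,  1, -1],
    [-1, -1,  1],
])


def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding), taking a dot product at each stop."""
    k = kernel.shape[0]
    out_size = (image.shape[0] - k) // stride + 1
    output = np.zeros((out_size, out_size), dtype=image.dtype)
    for row in range(out_size):
        for col in range(out_size):
            top, left = row * stride, col * stride
            patch = image[top:top + k, left:left + k]
            output[row, col] = np.sum(patch * kernel)  # element-wise product, then sum
    return output


stride_1_output = convolve2d(image, filter_1, stride=1)  # 4 x 4 result
stride_2_output = convolve2d(image, filter_1, stride=2)  # 2 x 2 result
print(stride_1_output)
print(stride_2_output)
```

With stride 2 the filter only fits at two positions per axis, so the output shrinks from 4 x 4 to 2 x 2.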
The convolution operation is repeated for each filter, and the outputs together form the feature map.
Stride=1: Filter 2 applied to the same 6 x 6 image gives a second 4 x 4 image:
-1 -1 -1 -1
-1 -1 -2  1
-1 -1 -2  1
-1  0 -4  3
The two 4 x 4 images, one per filter, form a 2 x 4 x 4 matrix (the feature map).
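Stacking one output per filter can also be written without explicit loops. The sketch below is one possible vectorized formulation, assuming NumPy 1.20 or later for `sliding_window_view`; the variable names are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

image = np.array([
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
])
filters = np.array([
    [[ 1, -1, -1], [-1,  1, -1], [-1, -1,  1]],  # Filter 1
    [[-1,  1, -1], [-1,  1, -1], [-1,  1, -1]],  # Filter 2
])

# Every 3 x 3 patch of the image at stride 1: shape (4, 4, 3, 3).
windows = sliding_window_view(image, (3, 3))

# Dot product of each patch (k, l) with each filter f: shape (2, 4, 4).
feature_map = np.einsum("ijkl,fkl->fij", windows, filters)
print(feature_map.shape)
print(feature_map)
```

The leading axis indexes the filter, so adding a third filter would simply give a 3 x 4 x 4 feature map.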
RGB Images
When an RGB image is used as input to a CNN, the depth of each filter is always equal to the depth of the image (3 in the case of RGB).
[Figure: two 3-dimensional filters (filter 1 and filter 2), each 3 x 3 x 3, applied to a 6 x 6 image with three color channels.]
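To illustrate why the filter depth must match the image depth, the sketch below convolves a 6 x 6 x 3 image with a single 3 x 3 x 3 filter. For simplicity it assumes all three color channels hold the same 6 x 6 pattern and all three filter planes equal Filter 1; this is an illustrative choice, not the exact data on the slide, and `convolve_rgb` is a hypothetical helper.

```python
import numpy as np

channel = np.array([
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
])
plane = np.array([
    [ 1, -1, -1],
    [-1,  1, -1],
    [-1, -1,  1],
])

# Assumption: identical R, G, and B channels and identical filter planes.
rgb_image = np.stack([channel, channel, channel], axis=-1)  # shape (6, 6, 3)
rgb_filter = np.stack([plane, plane, plane], axis=-1)       # shape (3, 3, 3)


def convolve_rgb(image, kernel):
    """Stride-1, no-padding convolution; the dot product spans all channels."""
    k = kernel.shape[0]
    out_size = image.shape[0] - k + 1
    output = np.zeros((out_size, out_size), dtype=image.dtype)
    for row in range(out_size):
        for col in range(out_size):
            patch = image[row:row + k, col:col + k, :]  # 3 x 3 x 3 block
            output[row, col] = np.sum(patch * kernel)
    return output


rgb_output = convolve_rgb(rgb_image, rgb_filter)
print(rgb_output.shape)
print(rgb_output)
```

Although the input and the filter are both 3-dimensional, each filter still yields a single 2-dimensional 4 x 4 output, because the sum runs over the channel axis as well.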
CNN
Problem Scenario: Consider the MNIST dataset from the previous lesson, in which you were
hired by one of the major AI giants planning to build the best image classifier model available to
date. To do so, you used a multilayered neural network. However, because Keras is one of the most
commonly used libraries for deep learning, you will use Keras this time.
Objective:
Build a Keras-based image classification model on the MNIST dataset.
Access: Click the Practice Labs tab on the left panel. Now, click on the START LAB button and wait
while the lab prepares itself. Then, click on the LAUNCH LAB button. A full-fledged jupyter lab
opens, which you can use for your hands-on practice and projects.
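As a starting point for the MNIST objective above, a small Keras CNN might look like the sketch below. The layer sizes, optimizer, and the helper names `build_mnist_cnn` and `train_on_mnist` are illustrative assumptions, not the lab's reference solution. Training is wrapped in a function so that nothing is downloaded until it is called.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_mnist_cnn():
    """A small CNN for 28 x 28 grayscale digits; layer sizes are illustrative."""
    model = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),  # 32 learned 3x3 filters
        layers.MaxPooling2D(pool_size=(2, 2)),                     # halves height and width
        layers.Flatten(),
        layers.Dense(10, activation="softmax"),                    # one output per digit
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


def train_on_mnist(model, epochs=1):
    """Download MNIST through Keras, scale pixels to [0, 1], and fit the model."""
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    x_train = x_train[..., None].astype("float32") / 255.0  # add the channel axis
    x_test = x_test[..., None].astype("float32") / 255.0
    model.fit(x_train, y_train, epochs=epochs, batch_size=128,
              validation_data=(x_test, y_test))
    return model


mnist_model = build_mnist_cnn()
mnist_model.summary()
```

Calling `train_on_mnist(mnist_model)` would then fit the model. More convolution and pooling blocks can be added before `Flatten` to deepen the network.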
Pooling
Pooling Layer
The pooling layer progressively reduces the spatial size of each matrix within the feature map, which reduces the
number of parameters and the amount of computation in the network. This step is also known as subsampling.
Note: The most commonly used pooling approach is max pooling.
Max pool (filter 2x2, stride=2) applied to the 4 x 4 image produced by Filter 1 at stride=1:
 3 -1 -3 -1
-3  1  0 -3
-3 -3  0  1
 3 -2 -2 -1
gives the 2 x 2 output:
3 0
3 1
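The max pooling step can be sketched in NumPy as follows. This is a minimal illustration that assumes a square input; `max_pool` is a helper written for this example rather than a library call.

```python
import numpy as np

# The 4 x 4 output of Filter 1 at stride 1.
conv_output = np.array([
    [ 3, -1, -3, -1],
    [-3,  1,  0, -3],
    [-3, -3,  0,  1],
    [ 3, -2, -2, -1],
])


def max_pool(matrix, size=2, stride=2):
    """Keep the largest value in each `size` x `size` window."""
    out_size = (matrix.shape[0] - size) // stride + 1
    pooled = np.zeros((out_size, out_size), dtype=matrix.dtype)
    for row in range(out_size):
        for col in range(out_size):
            top, left = row * stride, col * stride
            pooled[row, col] = matrix[top:top + size, left:left + size].max()
    return pooled


pooled = max_pool(conv_output, size=2, stride=2)
print(pooled)
```

Pooling has no learned weights of its own. It reduces cost by shrinking what the following layers have to process.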
The CNN Architecture
The CNN architecture comprises multiple combinations of convolution and pooling layers.
[Figure: convolution, max pooling, convolution, and max pooling applied in sequence; the resultant image is smaller than the original image.]
The reduced image from these layers (convolution + pooling) is then passed through the activation function.
Convolution vs. Fully-Connected Networks
[Figure: the 6 x 6 image processed by convolution with Filter 1 and Filter 2, compared with a fully-connected network in which the image is flattened into 36 inputs x1, x2, ..., x36.]
Fewer Parameters
Each neuron in the convolutional layer is connected to only 9 inputs (not fully connected).
[Figure: the 6 x 6 image flattened into inputs 1 to 36. The neuron that outputs 3 is connected only to inputs 1, 2, 3, 7, 8, 9, 13, 14, and 15, and its 9 weights are the entries of Filter 1.]
The number of parameters is reduced even further because the filter's weights are shared: after the first stride, the neuron that outputs -1 reuses the same 9 weights of Filter 1 on inputs 2, 3, 4, 8, 9, 10, 14, 15, and 16.
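One way to see both effects (sparse connections and shared weights) is to write the convolution as a fully-connected layer whose weight matrix is mostly zeros. The sketch below builds that 16 x 36 matrix for Filter 1 at stride 1; the construction and variable names are illustrative, and the pixel indices in the code are 0-based whereas the slide counts inputs from 1.

```python
import numpy as np

image = np.array([
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
])
filter_1 = np.array([
    [ 1, -1, -1],
    [-1,  1, -1],
    [-1, -1,  1],
])

# One row per output neuron (4 x 4 = 16), one column per input pixel (6 x 6 = 36).
weights = np.zeros((16, 36), dtype=int)
for row in range(4):
    for col in range(4):
        neuron = row * 4 + col
        for i in range(3):
            for j in range(3):
                pixel = (row + i) * 6 + (col + j)       # 0-based index into the flattened image
                weights[neuron, pixel] = filter_1[i, j]  # the same 9 values reused by every neuron

outputs = weights @ image.flatten()
connections_per_neuron = np.count_nonzero(weights, axis=1)
first_neuron_inputs = np.flatnonzero(weights[0]) + 1     # back to the slide's 1-based numbering

print(outputs.reshape(4, 4))
print(connections_per_neuron)
print(first_neuron_inputs)
print(weights.size, "weights if fully connected,", filter_1.size, "shared weights in the filter")
```

A genuinely fully-connected layer of the same shape would learn 576 independent weights, whereas the convolutional layer learns only the 9 filter entries.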
Deep Convolutional Models
Multilayered CNN
Each layer of a deep net builds on and refines the features learned by the previous layers.
[Figure: two networks compared, labeled 30 parameters and 23 parameters.]
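Deep CNN: Example
GoogLeNet (ILSVRC 2014 Winner)
[Figure: a GoogLeNet inception module. The previous layer feeds four parallel branches: 1x1 convolutions; 1x1 convolutions followed by 3x3 convolutions; 1x1 convolutions followed by 5x5 convolutions; and 3x3 max pooling followed by 1x1 convolutions. Filter concatenation merges the branch outputs.]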
Deeper Is Better
Key Takeaways
Now, you are able to:
Implement CNN architecture
Implement Deep CNN
Optimize CNNs using pooling layers
Knowledge Check
1. The input image has been converted into a matrix of size 30 x 30, and a kernel/filter of size 7 x 7 is applied with a stride of 1. What will be the size of the convolved matrix?
a. 24 x 24
b. 21 x 21
c. 28 x 28
d. 7x7
The correct answer is a
The size of the convolved matrix is given by C = ((I - F + 2P) / S) + 1, where C is the size of the convolved matrix, I is the size of the
input image, F is the size of the filter, P is the padding applied to the input matrix, and S is the stride. Here P = 0, I = 30, F = 7, and S = 1, so C = ((30 - 7 + 0) / 1) + 1 = 24.
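The formula in the explanation above translates directly into a small helper. This sketch uses floor division, which is an added assumption for strides that do not divide evenly; the function name `conv_output_size` is illustrative.

```python
def conv_output_size(input_size, filter_size, padding=0, stride=1):
    """C = ((I - F + 2P) / S) + 1, rounded down when the stride does not fit evenly."""
    return (input_size - filter_size + 2 * padding) // stride + 1


answer = conv_output_size(input_size=30, filter_size=7, padding=0, stride=1)
print(answer)  # 24
```

The same helper gives 4 for the 6 x 6 image with a 3 x 3 filter at stride 1, and 2 at stride 2, matching the earlier convolution slides.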
Knowledge Check
2. Which of the following do you typically see in a ConvNet?
a. Multiple pool layers followed by a CONV layer
b. Multiple CONV layers followed by a pool layer
c. FC layers in the first few layers
d. All the above
The correct answer is b
A typical deep ConvNet usually comprises multiple convolutional layers followed by a pool layer.
Image Classification
Problem Statement: Asirra (Animal Species Image Recognition for Restricting Access) is a HIP
(Human Interactive Proof) that works by asking users to identify photographs of cats and dogs.
This task is difficult for computers, but studies have shown that people can accomplish it
quickly and accurately.
Hint: Use the dataset folder provided with csv files for importing training and testing sets. Also,
use cat.jpg to validate your model.
Objective: Write an algorithm to classify whether an image contains a dog or a cat. (Use
Keras for this task.)
Access: Click the Practice Labs tab on the left panel. Now, click on the START LAB button and
wait while the lab prepares itself. Then, click on the LAUNCH LAB button. A full-fledged jupyter
lab opens, which you can use for your hands-on practice and projects.
Thank You