Unit 2 Part 01

Uploaded by

harshithkataray1

Convolutional Neural Networks

• Introduction

• Relation of convolutional network with deep learning

• A Convolutional Neural Network (ConvNet/CNN) is a deep learning
algorithm that takes an input image, assigns importance (learnable
weights and biases) to various aspects/objects in the image, and
is able to differentiate one from another.

• The pre-processing required by a ConvNet is much lower than for
earlier methods, in which filters are hand-engineered.

• ConvNets can learn these filters/characteristics automatically.
Convolutional Neural Networks
• The architecture of a ConvNet is inspired by the organization of
the Visual Cortex of the human brain.

• Individual neurons respond to stimuli only in a restricted
region of the visual field known as the Receptive Field.

• What is a Convolutional Neural Network?

• A convolutional neural network is one of the main categories of
deep learning models used for image classification and image
recognition.

• Scene labelling, object detection, and face recognition are some
of the areas where convolutional neural networks are widely used.
Architecture of CNN
• The architecture of a ConvNet is inspired by the organization
of the Visual Cortex and is akin to the connectivity pattern of
Neurons in the Human Brain.

• A ConvNet is able to capture the spatial and temporal
dependencies in an image through the application of relevant
filters.

• The architecture fits image datasets better because it reduces
the number of parameters involved and reuses weights.

• In other words, the network can be trained to understand the
sophistication of the image better.
Architecture of CNN
• The figure below shows an RGB image separated into its three
color planes: Red, Green, and Blue.

• Images exist in several such color spaces: Grayscale, RGB, HSV,
CMYK, etc.
Architecture of CNN
• As the dimensions of an image increase, it becomes
computationally expensive to handle such images.

• This is critical when designing an architecture that is
capable of learning features and is also scalable to
large datasets.

• The ConvNet's job is to compress the images into a format
that is easier to process while preserving the elements that
are important for obtaining a good prediction.

• The figure below shows a CNN architecture used to overcome
the problems of high-dimensional image datasets.
Architecture of CNN

• As shown in the figure above, a CNN takes an image as input,
processes it, and classifies it under a category such as car,
truck, or van.

• The computer sees an image as an array of pixels, and the image
is classified depending on its resolution and features.
Architecture of CNN
• Based on the image resolution, the computer sees the image as an
h * w * d array, where h = height, w = width, and d = dimension
(the number of channels).

• For example, an RGB image can be a 6 * 6 * 3 array, and a
grayscale image a 4 * 4 * 1 array.

• Consider a simple convolutional neural network, shown below,
whose components work together to process an input image.
Architecture of CNN

• Figure: Convolution Neural Network for handwritten digit.


Architecture of CNN
• The typical CNN is made of a combination of four main
layers:

1. Convolutional layers

2. Rectified Linear Unit (ReLU for short)

3. Pooling layers

4. Fully connected layers

• Before going further, two concepts from layer 1 need to be
understood: stride and padding.

• Consider a 5×5 input matrix and a 3×3 filter matrix.


Architecture of CNN
• A filter is a matrix of weights applied to an image or a
matrix to obtain the required features.

• A filter can have any depth: a filter of depth d spans d
layers and convolves them, i.e., it sums all the
(weights × inputs) products across the d layers.

• For an input of size 5×5, applying a 3×3 kernel (filter)
gives a 3×3 output feature map, as shown in the figure below:
Architecture of CNN

• A formula gives the output size of the feature map from the
input size and the filter dimensions.

• For a square input image (same width and height) and a square
filter (same width and height), the formula simplifies to:
Architecture of CNN

• Here InputSize and FilterSize each denote both the width and
the height of the input image and of the filter, respectively.
• Padding is the number of pixels added to the input image
around its border to control the spatial dimensions of the
output.
• Stride is the step size or the number of pixels the filter
shifts (moves) after each convolution operation.
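
A minimal sketch of the output-size calculation, assuming the commonly used relation OutputSize = floor((InputSize - FilterSize + 2 * Padding) / Stride) + 1 with the quantities defined above. The function name `conv_output_size` is illustrative and does not come from the slides.

```python
def conv_output_size(input_size: int, filter_size: int,
                     padding: int = 0, stride: int = 1) -> int:
    """Side length of the feature map for a square input and a square filter.

    Assumes OutputSize = floor((InputSize - FilterSize + 2*Padding) / Stride) + 1.
    """
    if stride < 1:
        raise ValueError("stride must be at least 1")
    span = input_size - filter_size + 2 * padding
    if span < 0:
        raise ValueError("filter is larger than the padded input")
    return span // stride + 1


# The 5x5 input and 3x3 filter from the slides, no padding, stride 1.
small = conv_output_size(5, 3)
# Same input with one pixel of padding on every side.
padded = conv_output_size(5, 3, padding=1)
# A 32x32 input with a 3x3 filter moved two pixels at a time.
strided = conv_output_size(32, 3, stride=2)
print(small, padded, strided)
```

With no padding and a stride of 1, the 5×5 input and 3×3 filter give a 3×3 feature map, matching the example in the slides.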
Architecture of CNN
1. Convolution layers

• The convolution layer is the first building block of a CNN. The
main mathematical operation it performs is called convolution.

• Convolution is the application of a sliding window function to a
matrix of pixels representing an image.

• The sliding function applied to the matrix is called a kernel or
filter; the two terms are used interchangeably.

• In the convolution layer, several filters of equal size are
applied, and each filter is used to recognize a specific pattern
in the image, such as the curves of the digits, the edges, or the
whole shape of the digits.
Architecture of CNN
• Example:

• Consider this 32x32 grayscale image of a handwritten
digit. The values in the matrix are given in the pixel
representation as shown in the figure below:
Architecture of CNN
• We consider a kernel or filter used for convolution which is a
matrix with a dimension of 3x3.

• Zero weights are represented by the black grid cells and ones by
the white grid cells.

• The weights of the kernels are determined during the
training process of the neural network.

• The convolution operation is applied between the two
matrices (the input image matrix and the 3x3 kernel) by taking
the dot product, and works as follows:
Architecture of CNN
1. Apply the kernel matrix from the top-left corner and move
column-wise to the right.

2. Perform element-wise multiplication.

3. Sum the values of the products.

4. The resulting value corresponds to the first value (top-left
corner) in the convoluted matrix.

5. Move the kernel down by the step size of the sliding window.

6. Repeat steps 1 to 5 until the image matrix is fully covered.
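
The steps above can be sketched with NumPy as follows. This is an illustrative implementation, not the one used in the slides: the function name `convolve2d`, the 5×5 test image, and the plus-shaped kernel are all made up for the example. As in most deep learning libraries, the kernel is applied without flipping (strictly speaking, cross-correlation), and no padding is used.

```python
import numpy as np


def convolve2d(image: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow), dtype=float)
    for r in range(oh):          # top to bottom
        for c in range(ow):      # left to right
            top, left = r * stride, c * stride
            window = image[top:top + kh, left:left + kw]
            # Element-wise multiplication followed by a sum.
            out[r, c] = np.sum(window * kernel)
    return out


# Hypothetical 5x5 input holding the values 0..24 and a plus-shaped 3x3 kernel.
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[0, 1, 0],
                   [1, 1, 1],
                   [0, 1, 0]], dtype=float)

feature_map = convolve2d(image, kernel)
print(feature_map.shape)   # side length (5 - 3) // 1 + 1 = 3
print(feature_map[0, 0])   # 1 + 5 + 6 + 7 + 11 = 30
```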

• The dimension of the convoluted matrix depends on the size of the
sliding window. The higher the sliding window, the smaller the
dimension.
Architecture of CNN
• The operation of convolution is shown in the figure below:

• The weight matrix behaves like a filter, extracting particular
information from the original image matrix.
Architecture of CNN
• The weights are learnt so that the loss function is minimized
and so that the features extracted from the original image help
the network make correct predictions.

• When we have multiple convolutional layers, the initial layers
extract more generic features, while deeper layers extract more
complex features.

2. Activation function (Rectified Linear Unit, ReLU for short)

• A ReLU activation function is applied after each convolution
operation.

• This function helps the network learn non-linear relationships
between the features in the image.
Architecture of CNN
• Hence, ReLU makes the network more robust at
identifying different patterns.
• It also helps to mitigate the vanishing gradient
problem.
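
As a small illustration, ReLU keeps positive values and replaces negative ones with zero, i.e. ReLU(x) = max(0, x). The feature-map values below are made up for the example.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    """Element-wise ReLU: max(0, x)."""
    return np.maximum(0, x)


# Hypothetical feature-map values produced by a convolution.
feature_map = np.array([[-45.0, 12.0],
                        [0.0, -3.5]])
activated = relu(feature_map)
print(activated)
```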
3. Pooling layer
• The goal of the pooling layer is to pull the most
significant features from the convoluted matrix.
• This is done by applying some aggregation
operations, which reduces the dimension of the
feature map (convoluted matrix), hence reducing the
memory used while training the network.
Architecture of CNN
• Pooling also helps mitigate overfitting.

• The most common aggregation functions that can be applied are:

• Max pooling, which takes the maximum value in each window of the
feature map.

• Sum pooling, which takes the sum of all the values in each window
of the feature map.

• Average pooling, which takes the average of all the values in
each window.

• For example:

• The dimension of the feature map becomes smaller as the
pooling function is applied in the example below:
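
The figure for this example is not reproduced here, so the sketch below uses a made-up 4×4 feature map. It assumes non-overlapping 2×2 windows with a stride of 2, a common choice. The function name `pool2d` and the `mode` argument are illustrative.

```python
import numpy as np


def pool2d(feature_map: np.ndarray, size: int = 2, stride: int = 2,
           mode: str = "max") -> np.ndarray:
    """Aggregate each size x size window with max, sum or average."""
    reducers = {"max": np.max, "sum": np.sum, "avg": np.mean}
    reduce_fn = reducers[mode]
    h, w = feature_map.shape
    oh = (h - size) // stride + 1
    ow = (w - size) // stride + 1
    out = np.zeros((oh, ow), dtype=float)
    for r in range(oh):
        for c in range(ow):
            top, left = r * stride, c * stride
            out[r, c] = reduce_fn(feature_map[top:top + size, left:left + size])
    return out


# Hypothetical 4x4 feature map.
fm = np.array([[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 1],
               [3, 4, 0, 8]], dtype=float)

max_pooled = pool2d(fm, mode="max")   # [[6, 4], [7, 9]]
sum_pooled = pool2d(fm, mode="sum")   # [[15, 9], [16, 18]]
avg_pooled = pool2d(fm, mode="avg")   # [[3.75, 2.25], [4.0, 4.5]]
print(max_pooled.shape)               # the 4x4 map shrinks to 2x2
```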
Architecture of CNN

4. Fully connected layers


• The convolution and pooling layers only extract features and
reduce the number of parameters from the original images.
• To generate the final output we need to apply a fully
connected layer that generates an output whose size equals the
number of classes needed.
Architecture of CNN
• The fully connected layers are the last layers of the
convolutional neural network.

• We need to flatten the output of the convolutional and pooling
layers and pass it to a series of fully connected (dense) layers.

• The dense layers of the CNN take the flattened output as an input
vector and generate an output indicating whether or not the image
belongs to a particular class.

• The output layer has a loss function, such as categorical
cross-entropy, to compute the error in prediction.

• Once the forward pass is complete, backpropagation begins
to update the weights and biases to minimize the error and
loss.
Architecture of CNN
• Finally, a softmax prediction layer is used to generate
probability values for each of the possible output labels, and the
final label predicted is the one with the highest probability
score.
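
A minimal sketch of the softmax step and the cross-entropy loss for a single image. The three class names reuse the car/truck/van example from earlier. The raw scores (logits) are made up, and subtracting the maximum before exponentiating is a standard numerical-stability trick rather than something stated in the slides.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Turn raw class scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)   # avoids overflow in exp
    exp = np.exp(shifted)
    return exp / np.sum(exp)


def cross_entropy(probs: np.ndarray, true_index: int) -> float:
    """Categorical cross-entropy for one example with a known true class."""
    return float(-np.log(probs[true_index]))


classes = ["car", "truck", "van"]
logits = np.array([2.0, 1.0, 0.1])      # hypothetical output of the last dense layer
probs = softmax(logits)
predicted = classes[int(np.argmax(probs))]
loss = cross_entropy(probs, true_index=0)   # assume the true label is "car"
print(predicted, probs, loss)
```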

• Dropout

• Dropout is a regularization technique applied to improve the
generalization capability of neural networks with a large
number of parameters.

• It consists of randomly dropping some neurons during the
training process, which forces the remaining neurons to learn
new features from the input data.
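
The idea can be sketched as a random mask applied to a layer's activations. The version below is "inverted" dropout, which also rescales the kept activations by 1 / keep_prob so that their expected value is unchanged. That rescaling is common practice, but it is an assumption here and is not described in the slides. The function name and values are illustrative.

```python
import numpy as np


def dropout(activations: np.ndarray, drop_prob: float,
            rng: np.random.Generator, training: bool = True) -> np.ndarray:
    """Randomly zero activations during training (inverted dropout)."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return activations                    # dropout is disabled at inference
    keep_prob = 1.0 - drop_prob
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob     # rescale the surviving neurons


rng = np.random.default_rng(seed=0)
acts = np.ones(10)                            # hypothetical layer output
dropped = dropout(acts, drop_prob=0.5, rng=rng)
unchanged = dropout(acts, drop_prob=0.5, rng=rng, training=False)
print(dropped)
```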
Why convnets are better than feed-forward neural
nets?
• There are several reasons why CNNs are important and often the
better choice:

1. Unlike traditional machine learning models like SVM
and decision trees that require manual feature
extraction, CNNs can perform automatic feature
extraction at scale, making them efficient.

2. The convolution layers make CNNs translation
invariant, meaning they can recognize patterns and
extract features regardless of their position in the
image, i.e., when the image is shifted. Invariance to
rotation or scaling is not built in and usually has to
be learned from the training data.
Why convnets are better than feed-forward neural
nets?
3. Multiple pre-trained CNN models such as VGG-16, ResNet50,
Inceptionv3, and EfficientNet have reached state-of-the-art
results and can be fine-tuned on new tasks using a
relatively small amount of data.

4. CNNs can also be used for non-image classification problems,
including but not limited to natural language processing, time
series analysis, and speech recognition.

• ConvNets are better than feed-forward neural nets because CNNs
feature parameter sharing and dimensionality reduction.

• With parameter sharing, the number of parameters is reduced, and
so the amount of computation also decreases.
Why convnets are better than feed-forward neural
nets?
• Because of the dimensionality reduction in CNN, the
computational power needed is reduced.

• Consider an input image used in ConvNets and in a feed-forward
neural network, as shown below:
Why convnets are better than feed-forward neural
nets?
• In a feed-forward network, every pixel of the image has its own
weight, and each pixel carries three values (its RGB values).

• If we apply a feed-forward network to a color image of size, say,
227*227, the number of input values becomes
227*227*3 = 154,587.

• Roughly 1.5 × 10⁵ weights will therefore be associated with the
image, and about 1.5 × 10⁵ neurons will be required in a single
layer of the network, which is impractical and complex to work
with.
Why convnets are better than feed-forward neural
nets?
• Hence, millions of parameters and neurons will be
required in a single feed-forward network, so such networks
are ill-suited to handling large numbers of images.

• In CNNs, a kernel is built (a kernel is basically a matrix of
weights), and the weights are shared as the kernel moves
horizontally and vertically across an image.

• The max pooling operation directly reduces the size of the
feature maps (a 2×2 window with a stride of 2 halves both the
width and the height), so fewer values reach the later layers.

• Further, the choice of stride and padding controls the size of
the output; a larger stride reduces it further.
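
The effect of parameter sharing can be checked with simple arithmetic. The hidden-layer width of 1000 units and the 64 filters of size 3×3 are hypothetical values chosen only for illustration.

```python
# Input image from the example above: 227 x 227 pixels, 3 color channels.
height, width, channels = 227, 227, 3
inputs_per_image = height * width * channels
print(inputs_per_image)        # 154587, roughly 1.5 * 10**5

# Fully connected layer: every hidden unit has one weight per input value.
hidden_units = 1000            # hypothetical layer width
dense_params = inputs_per_image * hidden_units + hidden_units   # weights + biases
print(dense_params)            # 154588000

# Convolutional layer: each 3x3 filter shares its weights across all positions.
kernel_size = 3
num_filters = 64               # hypothetical number of filters
conv_params = kernel_size * kernel_size * channels * num_filters + num_filters
print(conv_params)             # 1792
```

Under these assumptions, the convolutional layer needs a few thousand parameters where the dense layer needs over a hundred million.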
CONVOLUTIONAL OPERATION
• Convolution is a specialized kind of linear operation.

• Convolution is an operation on two functions of a real-valued
argument.

• Convnets are simply neural networks that use convolution in
place of general matrix multiplication in at least one of their
layers.

• Convolution of two functions is a mathematical operation that
produces a third function expressing how the shape of one
function is modified by the other.

• The convolution operation applies different types of kernels to
the input image to extract different features.
CONVOLUTIONAL OPERATION
• Types of convolution kernels
• A kernel is a small 2D matrix whose contents depend
on the operation to be performed.
• A kernel is applied to the input image by simple element-wise
multiplication and addition; the output obtained has lower
dimensions and is therefore easier to work with.
• Some kernel types are shown in the figure below:
CONVOLUTIONAL OPERATION

Figure : Kernel types

• In the figure, a Gaussian blur is applied to the original image
and the result is shown.

• After the Gaussian blur kernel is applied, the image becomes
smooth.


CONVOLUTIONAL OPERATION
• The second and third kernels show the sharpen operation
(which enhances the depth of edges) and edge detection.
• The shape of a kernel depends heavily on the input shape of
the image and on the architecture of the entire network;
kernels are mostly of size M×M, i.e., square matrices.
• The kernel always moves from left to right and from
top to bottom, as shown in the figure below.
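
The kernel figure is not reproduced here. The 3×3 matrices below are widely used textbook versions of the three kernel types and are an assumption, not necessarily the exact values shown in the slides. Applying each one to a flat patch (all pixels equal) shows its character: blur and sharpen leave a flat region unchanged, while edge detection returns zero because there is no edge.

```python
import numpy as np

# Common 3x3 kernels (assumed values, see the note above).
gaussian_blur = np.array([[1, 2, 1],
                          [2, 4, 2],
                          [1, 2, 1]], dtype=float) / 16.0
sharpen = np.array([[0, -1, 0],
                    [-1, 5, -1],
                    [0, -1, 0]], dtype=float)
edge_detect = np.array([[-1, -1, -1],
                        [-1, 8, -1],
                        [-1, -1, -1]], dtype=float)

# A flat 3x3 patch: every pixel has the same intensity.
flat_patch = np.full((3, 3), 10.0)

blur_response = float(np.sum(flat_patch * gaussian_blur))
sharpen_response = float(np.sum(flat_patch * sharpen))
edge_response = float(np.sum(flat_patch * edge_detect))
print(blur_response, sharpen_response, edge_response)
```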
CONVOLUTIONAL OPERATION
• Stride defines the step by which the kernel moves: for
example, a stride of 1 makes the kernel slide by one row/column
at a time, and a stride of 2 moves the kernel by two rows/columns.

• The figure below shows the operation of stride.

Figure : Multiple kernels with movement of stride=1
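
To make the effect of stride concrete, this sketch lists the top-left positions a kernel visits on a square input with no padding, moving left to right and then top to bottom. The helper name `kernel_positions` and the 5×5 / 3×3 sizes are illustrative.

```python
def kernel_positions(input_size: int, kernel_size: int, stride: int) -> list:
    """Top-left (row, column) of every window the kernel covers."""
    last_start = input_size - kernel_size
    return [(row, col)
            for row in range(0, last_start + 1, stride)    # top to bottom
            for col in range(0, last_start + 1, stride)]   # left to right


stride_1 = kernel_positions(5, 3, stride=1)
stride_2 = kernel_positions(5, 3, stride=2)
print(len(stride_1))   # 9 windows, giving a 3x3 output
print(stride_2)        # [(0, 0), (0, 2), (2, 0), (2, 2)], giving a 2x2 output
```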


CONVOLUTIONAL OPERATION
• Consider a simple kernel operation with stride = 1.

CONVOLUTIONAL OPERATION
• Here the input matrix has shape 4x4x1 and the kernel is of
size 3x3. Since the input is larger than the kernel, we can
slide the kernel as a window over the entire input.

• The first entry in the convoluted result is calculated as:

• 45*0 + 12*(-1) + 5*0 + 22*(-1) + 10*5 + 35*(-1) + 88*0 +
26*(-1) + 51*0 = -45

• The above process is repeated until the entire input has been
processed.
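
The first entry can be checked numerically. Only the top-left 3×3 window of the input is recoverable from the calculation above, since the full 4×4 matrix was in the figure. The sketch therefore reproduces just that one entry, with the window and kernel values read off the products in the sum.

```python
import numpy as np

# Top-left 3x3 window of the input, taken from the worked sum above.
window = np.array([[45, 12, 5],
                   [22, 10, 35],
                   [88, 26, 51]])

# Kernel weights, taken from the multipliers in the same sum.
kernel = np.array([[0, -1, 0],
                   [-1, 5, -1],
                   [0, -1, 0]])

# Element-wise multiplication followed by a sum.
first_entry = int(np.sum(window * kernel))
print(first_entry)   # -12 - 22 + 50 - 35 - 26 = -45
```

With a 4×4 input, a 3×3 kernel, and a stride of 1, the kernel fits in two positions along each axis, so the full result is a 2×2 matrix.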
