
DEEP LEARNING AND APPLICATIONS

MR20-1CS0158
UNIT IV

Prepared by
Dr.M.Narayanan
Professor
Department of CSE
Malla Reddy University, Hyderabad
DEEP LEARNING AND APPLICATIONS
MR20-1CS0158
UNIT–IV:
Convolutional Neural Networks: Introduction to CNNs, Kernel filter, Principles behind
CNNs, Multiple Filters, CNN applications
Recurrent Neural Networks: Introduction to RNNs, Unfolded RNNs, Seq2Seq RNNs,
LSTM, RNN applications.
Text Book
1. Goodfellow, I., Bengio, Y., and Courville, A., Deep Learning, MIT Press, 2016
What is a Convolutional Neural Network? How does it work? Also explain the Pooling
Layer.
 Deep Learning has proved to be a very powerful tool because of its ability to handle large
amounts of data. Interest in using hidden layers has surpassed traditional techniques,
especially in pattern recognition.
 One of the most popular deep neural networks is the Convolutional Neural Network (also
known as CNN or ConvNet), especially when it comes to Computer Vision applications.
 Since the 1950s, the early days of AI, researchers have struggled to build a system that can
understand visual data. In the following years, this field came to be known as Computer
Vision. In 2012, computer vision took a quantum leap when a group of researchers from
the University of Toronto developed an AI model that surpassed the best image recognition
algorithms by a large margin.

 The AI system, which became known as AlexNet (named after its main creator, Alex
Krizhevsky), won the 2012 ImageNet computer vision contest with an amazing 85 percent
accuracy. The runner-up scored a modest 74 percent on the test.

 At the heart of AlexNet was the Convolutional Neural Network, a special type of neural
network that roughly imitates human vision. Over the years, CNNs have become a very
important part of many Computer Vision applications and hence a part of any computer
vision course online. So let's take a look at the workings of CNNs, or the CNN algorithm, in
deep learning.
 CNNs were first developed and used around the 1980s. The most that a CNN could do at
that time was recognize handwritten digits. It was mostly used in the postal sector to read
zip codes, pin codes, etc.
 The important thing to remember about any deep learning model is that it requires a large
amount of data to train and also requires a lot of computing resources.
 This was a major drawback for CNNs at that time, and hence CNNs remained limited to
the postal sector and failed to make their way into the wider world of machine learning.
 In 2012 Alex Krizhevsky realized that it was time to bring back the branch of deep
learning that uses multi-layered neural networks.
 The availability of large datasets, specifically the ImageNet dataset with millions of
labeled images, and an abundance of computing resources enabled researchers to revive
CNNs.
What Is a CNN?
 In deep learning, a convolutional neural network (CNN/ConvNet) is a class of deep neural
networks, most commonly applied to analyze visual imagery.
 Now when we think of a neural network we think about matrix multiplications, but that is
not the case with a ConvNet.
 It uses a special technique called Convolution. Now in mathematics convolution is a
mathematical operation on two functions that produces a third function that expresses how
the shape of one is modified by the other.
 But we don’t really need to go behind the mathematics part to understand what a CNN is
or how it works.
 Bottom line is that the role of the ConvNet is to reduce the images into a form that is easier
to process, without losing features that are critical for getting a good prediction.
How does it work?
 Before we get to the workings of CNNs, let's cover the basics, such as what an image is and
how it is represented.
 An RGB image is nothing but a matrix of pixel values having three planes, whereas a
grayscale image is the same but has a single plane.
 Take a look at this image to understand more.
 For simplicity, let’s stick with grayscale images as we try to understand how CNNs work.
 The below image shows what a convolution is. We take a filter/kernel (a 3×3 matrix) and
apply it to the input image to get the convolved feature. This convolved feature is passed
on to the next layer.
 In the case of an RGB image, the same operation is carried out across the three color
channels.
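As a rough sketch of the operation just described (a NumPy implementation with a hand-picked vertical-edge kernel; the 5×5 input values are placeholders, not data from the text):

    import numpy as np

    def convolve2d(image, kernel):
        # Slide the kernel over a grayscale image (no padding, stride 1)
        kh, kw = kernel.shape
        out_h = image.shape[0] - kh + 1
        out_w = image.shape[1] - kw + 1
        out = np.zeros((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    image = np.random.rand(5, 5)             # toy 5x5 grayscale image
    kernel = np.array([[1, 0, -1],
                       [1, 0, -1],
                       [1, 0, -1]])          # 3x3 vertical-edge filter
    print(convolve2d(image, kernel))         # 3x3 convolved feature

Each output value is the weighted sum of the 3×3 patch under the kernel, which is exactly the convolved feature that gets passed on to the next layer.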
 Convolutional neural networks are composed of multiple layers of artificial neurons.
Artificial neurons, a rough imitation of their biological counterparts, are mathematical
functions that calculate the weighted sum of multiple inputs and output an activation
value.
 When you input an image into a ConvNet, each layer generates several activation maps
that are passed on to the next layer.
 The first layer usually extracts basic features such as horizontal or diagonal edges.
 This output is passed on to the next layer which detects more complex features such as
corners or combinational edges.
 As we move deeper into the network it can identify even more complex features such as
objects, faces, etc.
 Based on the activation map of the final convolution layer, the classification layer outputs
a set of confidence scores (values between 0 and 1) that specify how likely the image is to
belong to a “class.”
 For instance, if you have a ConvNet that detects cats, dogs, and horses, the output of the
final layer is the probability that the input image contains any of those animals.
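A minimal sketch of such a network in Keras (the layer sizes, the 64×64 grayscale input, and the three classes are assumptions for illustration, not a prescribed architecture):

    from tensorflow import keras
    from tensorflow.keras import layers

    # Toy ConvNet that ends in a 3-way softmax (e.g. cat / dog / horse)
    model = keras.Sequential([
        keras.Input(shape=(64, 64, 1)),                # grayscale input image
        layers.Conv2D(16, (3, 3), activation="relu"),  # early layer: simple edges
        layers.MaxPooling2D((2, 2)),                   # reduce spatial size
        layers.Conv2D(32, (3, 3), activation="relu"),  # deeper layer: corners, parts
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(3, activation="softmax"),         # confidence scores in [0, 1]
    ])
    model.summary()

The softmax output gives one score per class, and the scores sum to 1, matching the confidence scores described above.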
What Is a Pooling Layer?
 Similar to the Convolutional Layer, the Pooling layer is responsible for reducing the spatial
size of the Convolved Feature.
 This is to decrease the computational power required to process the data by reducing the
dimensions.
 There are two types of pooling: average pooling and max pooling. Max pooling is the more
commonly used of the two.
 So what we do in Max Pooling is we find the maximum value of a pixel from a portion of
the image covered by the kernel.
 Max Pooling also performs as a Noise Suppressant.
 It discards the noisy activations altogether and also performs de-noising along with
dimensionality reduction.
 On the other hand, Average Pooling returns the average of all the values from the portion
of the image covered by the Kernel.
 Average Pooling, by contrast, simply performs dimensionality reduction and is less
effective as a noise-suppression mechanism.
 Hence, we can say that Max Pooling performs a lot better than Average Pooling at
suppressing noise.
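A small sketch of both operations (NumPy, with a made-up 4×4 feature map and a 2×2 window; the values are placeholders):

    import numpy as np

    def pool2d(feature_map, size=2, mode="max"):
        # Reduce a feature map by taking the max (or mean) of each size x size block
        h, w = feature_map.shape
        out = np.zeros((h // size, w // size))
        for i in range(0, h - h % size, size):
            for j in range(0, w - w % size, size):
                block = feature_map[i:i + size, j:j + size]
                out[i // size, j // size] = block.max() if mode == "max" else block.mean()
        return out

    fmap = np.array([[1, 3, 2, 4],
                     [5, 6, 1, 2],
                     [7, 2, 9, 0],
                     [3, 4, 8, 6]], dtype=float)
    print(pool2d(fmap, mode="max"))   # [[6. 4.] [7. 9.]]
    print(pool2d(fmap, mode="avg"))   # [[3.75 2.25] [4.   5.75]]

Either way, the 4×4 map shrinks to 2×2, which is the dimensionality reduction described above; max pooling keeps only the strongest activation in each block.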
What is a Recurrent Neural Network (RNN)? How do Recurrent Neural Networks
work?
 A Deep Learning approach for modelling sequential data is the Recurrent Neural Network
(RNN). RNNs were the standard suggestion for working with sequential data before the
advent of attention models.
 A deep feedforward model may require specific parameters for each element of the
sequence. It may also be unable to generalize to variable-length sequences.
 Recurrent Neural Networks use the same weights for each element of the sequence,
decreasing the number of parameters and allowing the model to generalize to sequences of
varying lengths.
 RNNs generalize to structured data other than sequential data, such as geographical or
graphical data, because of their design.
 Recurrent neural networks, like many other deep learning techniques, are relatively old.
They were first developed in the 1980s, but we didn't appreciate their full potential until
recently.
 The advent of long short-term memory (LSTM) in the 1990s, combined with an increase in
computational power and the vast amounts of data that we now have to deal with, has
really pushed RNNs to the forefront.
What is a Recurrent Neural Network (RNN)?
 Neural networks imitate the function of the human brain in the fields of AI, machine
learning, and deep learning, allowing computer programs to recognize patterns and solve
common problems.
 RNNs are a type of neural network that can be used to model sequence data. RNNs, which
are formed from feedforward networks, are similar to human brains in their behaviour.
 Simply said, recurrent neural networks can anticipate sequential data in a way that other
algorithms can’t.
 All of the inputs and outputs in standard neural networks are independent of one another.
However, in some circumstances, such as when predicting the next word of a phrase, the
prior words are necessary, and so the previous words must be remembered.
 As a result, RNN was created, which used a Hidden Layer to overcome the problem. The
most important component of RNN is the Hidden state, which remembers specific
information about a sequence.
 RNNs have a Memory that stores all information about the calculations.
 It employs the same parameters for each input, performing the same task on all the inputs
or hidden layers to produce the output.
How do Recurrent Neural Networks work?
 The information in recurrent neural networks cycles through a loop to the middle hidden
layer.
 The input layer x receives and processes the neural network’s input before passing it on to
the middle layer.
 Multiple hidden layers can be found in the middle layer h, each with its own activation
functions, weights, and biases.
 If the parameters of the different hidden layers are not affected by the preceding layer,
i.e. the neural network has no memory, then a recurrent neural network can be used.
 The Recurrent Neural Network will standardize the different activation functions, weights,
and biases, ensuring that each hidden layer has the same characteristics.
 Rather than constructing numerous hidden layers, it will create only one and loop over it as
many times as necessary.
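A bare-bones sketch of that single looping layer (NumPy; the dimensions and random inputs are stand-ins, and no training is shown):

    import numpy as np

    input_dim, hidden_dim, steps = 4, 8, 5
    Wx = np.random.randn(hidden_dim, input_dim) * 0.1   # input-to-hidden weights
    Wh = np.random.randn(hidden_dim, hidden_dim) * 0.1  # hidden-to-hidden weights
    b = np.zeros(hidden_dim)

    h = np.zeros(hidden_dim)                 # hidden state: the network's memory
    for t in range(steps):
        x_t = np.random.randn(input_dim)     # input at time step t
        h = np.tanh(Wx @ x_t + Wh @ h + b)   # the same weights are reused every step
    print(h)                                 # final hidden state

The same Wx, Wh, and b are applied at every time step; this weight sharing is what lets an RNN process sequences of any length with one looping hidden layer.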
Common Activation Functions
 A neuron's activation function dictates whether it should be turned on or off. Nonlinear
functions usually transform a neuron's output to a number between 0 and 1 or between -1 and 1.
The following are some of the most commonly utilized functions:
 Sigmoid: This is expressed by the formula g(z) = 1/(1 + e^-z).
 Tanh: This is expressed by the formula g(z) = (e^z - e^-z)/(e^z + e^-z).
 ReLU: This is expressed by the formula g(z) = max(0, z).
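These three functions in NumPy, as a quick sketch (the sample inputs are arbitrary; printed values are approximate):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))      # squashes values into (0, 1)

    def tanh(z):
        return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))  # squashes into (-1, 1)

    def relu(z):
        return np.maximum(0, z)              # keeps positives, zeroes out negatives

    z = np.array([-2.0, 0.0, 2.0])
    print(sigmoid(z))   # [0.119 0.5   0.881]
    print(tanh(z))      # [-0.964  0.     0.964]
    print(relu(z))      # [0. 0. 2.]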

Advantages and disadvantages of RNN


Advantages of RNNs:
 Handle sequential data effectively, including text, speech, and time series.
 Process inputs of any length, unlike feedforward neural networks.
 Share weights across time steps, enhancing training efficiency.
Disadvantages of RNNs:
 Prone to vanishing and exploding gradient problems, hindering learning.
 Training can be challenging, especially for long sequences.
 Computationally slower than other neural network architectures.
Explain Sequence to Sequence Models
 Google first released sequence-to-sequence models for machine translation. Before that,
translation operated in a very naive manner.
 Each word you typed was converted into the target language without considering the
syntax or structure of the sentence as a whole.
 Sequence-to-sequence models used deep learning to transform the translation process.
When translating, they consider the current word or input along with its surroundings.
 It is employed in many applications, including text summarization, conversational
modeling, and image captioning.
Introduction to sequence-to-sequence Model
 The sequence-to-sequence (seq2seq) model uses the many-to-many architecture of an
RNN. Its ability to map arbitrary-length input sequences to arbitrary-length output
sequences has made it useful for various applications.
 Language translation, music generation, speech generation, and chatbots are all
applications of the sequence-to-sequence concept.
 In most cases, the lengths of the input and output differ. For example, take a
translation task: assume we need to convert a sentence from English to French.
 Consider the sentence "I am doing good" being mapped to "Je vais bien". We can see that
the input has four words, but the output has only three words. Hence, this algorithm
handles scenarios with varying input and output sequence lengths.
 The architecture of the sequence-to-sequence model comprises two components:
 Encoder
 Decoder
 The Encoder learns the embeddings of the input sentence. Embeddings are vectors that
capture the meaning of the sentence. The Decoder then takes the embedding vectors as
input and tries to construct the target sentence.
 In simple words, in a translation task the Encoder takes the English sentence as input,
learns an embedding from it, and feeds the embedding to the Decoder. The Decoder
generates the target French sentence from the embedding it is fed.
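A minimal sketch of that encoder-decoder layout in Keras (the vocabulary sizes and state dimension are arbitrary assumptions; training data and inference-time decoding are omitted):

    from tensorflow import keras
    from tensorflow.keras import layers

    num_encoder_tokens = 1000   # source (English) vocabulary size, assumed
    num_decoder_tokens = 1200   # target (French) vocabulary size, assumed
    latent_dim = 256            # size of the encoder/decoder state vectors

    # Encoder: reads the source sentence and keeps only its final states
    encoder_inputs = keras.Input(shape=(None, num_encoder_tokens))
    _, state_h, state_c = layers.LSTM(latent_dim, return_state=True)(encoder_inputs)

    # Decoder: generates the target sentence, initialised with the encoder states
    decoder_inputs = keras.Input(shape=(None, num_decoder_tokens))
    decoder_lstm = layers.LSTM(latent_dim, return_sequences=True)
    decoder_outputs = decoder_lstm(decoder_inputs, initial_state=[state_h, state_c])
    decoder_outputs = layers.Dense(num_decoder_tokens, activation="softmax")(decoder_outputs)

    model = keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)
    model.compile(optimizer="rmsprop", loss="categorical_crossentropy")

Here the encoder's final hidden and cell states act as the embedding handed to the decoder, which then predicts the target sequence one token at a time.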
 Sequence models are machine learning models whose inputs or outputs are sequences of
data. Sequential data includes text streams, audio clips, video clips, time-series data, etc.
Recurrent Neural Networks (RNNs) are a popular algorithm used in sequence models.
Applications of Sequence Models
 Speech recognition: In speech recognition, an audio clip is given as an input and then the
model has to generate its text transcript. Here both the input and output are sequences of
data.
 Sentiment Classification: In sentiment classification, the opinions expressed in a piece of
text are categorized. Here the input is a sequence of words.
 Video Activity Recognition: In video activity recognition, the model needs to identify the
activity in a video clip. A video clip is a sequence of video frames, so in this case the
input is a sequence of data.
Use Cases of the Sequence to Sequence Models
 Many of the technologies you use every day are based on sequence-to-sequence models.
For instance, voice-activated gadgets, online chatbots, and services like Google Translate
are all powered by the sequence-to-sequence model architecture. Among the applications
are the following:
 Machine translation - The seq2seq model predicts a word from the user's input, after
which each subsequent word is predicted using the likelihood of the words that came
before it.
 Video captioning - A sequence-to-sequence model for video captioning was developed to
capture, using RNNs, both the temporal structure of the sequence of frames and the
sequence model of the generated sentences.
 Text summarization - Using neural sequence-to-sequence models, an effective novel
method for abstractive text summarization (not restricted to selecting and rearranging
passages from the original text) has been made available.
What is LSTM? Explain LSTM Architecture
 The Long Short-Term Memory network is a deep learning, sequential neural network that
allows information to persist. It is a special type of Recurrent Neural Network which is
capable of handling the vanishing gradient problem faced by RNNs.
 LSTM was designed by Hochreiter and Schmidhuber and resolves the problems faced by
traditional RNNs and other machine learning algorithms. LSTM can be implemented in
Python using the Keras library.
 Let’s say while watching a video, you remember the previous scene, or while reading a
book, you know what happened in the earlier chapter.
 RNNs work similarly; they remember the previous information and use it for processing
the current input.
 The shortcoming of RNNs is that they cannot remember long-term dependencies due to the
vanishing gradient.
 LSTMs are explicitly designed to avoid long-term dependency problems.
What is LSTM?
 LSTM (Long Short-Term Memory) is a recurrent neural network (RNN) architecture
widely used in Deep Learning.
 It excels at capturing long-term dependencies, making it ideal for sequence prediction
tasks.
 Unlike traditional neural networks, LSTM incorporates feedback connections, allowing it
to process entire sequences of data, not just individual data points.
 This makes it highly effective in understanding and predicting patterns in sequential data
like time series, text, and speech.
LSTM Architecture
 In the introduction to long short-term memory, we learned that it resolves the vanishing
gradient problem faced by RNN, so now, in this section, we will see how it resolves this
problem by learning the architecture of the LSTM.
 At a high level, LSTM works very much like an RNN cell. Here is the internal functioning
of the LSTM network. The LSTM network architecture consists of three parts, as shown in
the image below, and each part performs an individual function.
The Logic Behind LSTM
 The first part chooses whether the information coming from the previous timestamp is to
be remembered or is irrelevant and can be forgotten.
 In the second part, the cell tries to learn new information from the input to this cell.
 At last, in the third part, the cell passes the updated information from the current
timestamp to the next timestamp. This one cycle of LSTM is considered a single-time step.
 These three parts of an LSTM unit are known as gates. They control the flow of
information into and out of the memory cell, or LSTM cell. The first gate is called the
Forget gate, the second gate is known as the Input gate, and the last one is the Output gate.
 An LSTM unit consisting of these three gates and a memory cell (LSTM cell) can be
considered as a layer of neurons in a traditional feedforward neural network, with each
neuron having a hidden state and a current cell state.
 Just like a simple RNN, an LSTM also has a hidden state, where H(t-1) represents the
hidden state of the previous timestamp and H(t) is the hidden state of the current timestamp.
In addition to that, an LSTM also has a cell state, represented by C(t-1) and C(t) for the
previous and current timestamps, respectively.
 Here the hidden state is known as Short-term memory, and the cell state is known as Long-
term memory.

 It is interesting to note that the cell state carries the information along across all the
timestamps.
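A rough NumPy sketch of one LSTM time step with the three gates described above (the dimensions are arbitrary and the weights are random placeholders, not trained parameters):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    input_dim, hidden_dim = 4, 3
    gates = ("f", "i", "o", "c")
    W = {g: np.random.randn(hidden_dim, input_dim + hidden_dim) * 0.1 for g in gates}
    b = {g: np.zeros(hidden_dim) for g in gates}

    x_t = np.random.randn(input_dim)        # input at the current timestamp
    h_prev = np.zeros(hidden_dim)           # H(t-1): short-term memory
    c_prev = np.zeros(hidden_dim)           # C(t-1): long-term memory
    z = np.concatenate([x_t, h_prev])

    f_t = sigmoid(W["f"] @ z + b["f"])      # Forget gate: what to drop from C(t-1)
    i_t = sigmoid(W["i"] @ z + b["i"])      # Input gate: how much new info to add
    c_hat = np.tanh(W["c"] @ z + b["c"])    # candidate information from this input
    c_t = f_t * c_prev + i_t * c_hat        # updated cell state C(t)
    o_t = sigmoid(W["o"] @ z + b["o"])      # Output gate: what to expose
    h_t = o_t * np.tanh(c_t)                # new hidden state H(t)
    print(c_t, h_t)                         # passed on to the next timestamp

Because C(t) is updated by element-wise gating rather than repeated matrix multiplication, information (and gradient) can flow along the cell state across many timestamps, which is how the LSTM sidesteps the vanishing gradient problem of a simple RNN.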
Learning gives Creativity,
Creativity leads to Thinking,
Thinking provides Knowledge,
and Knowledge makes you great.
- A. P. J. Abdul Kalam
