Artificial Intelligence
Recurrent Neural Networks
Copyright IntelliPaat. All rights reserved
Agenda
01 Issues with Feed Forward Network
02 Understanding Recurrent Neural Networks
03 Types of RNN
04 Issues with RNN
05 Vanishing Gradient Problem
06 Long Short Term Memory Networks
07 Demo on LSTM with Keras
Issues with Feed Forward Network
▪ Outputs are independent of each other: there is no relation between the output at 't' and the output at 't+1'
▪ Cannot handle sequential data
▪ Cannot memorize previous inputs
[Diagram: feed forward network]
Issues with Feed Forward Network
Would this feed forward network be able to predict the next word?
Input: "Recurrent Neural ……………………."  →  FFN  →  Output: ?
This feed forward network wouldn't be able to predict the next word because it cannot memorize the previous inputs
Solution with Recurrent Neural Network
Analogy: "I only cook these three items, and in the same sequence" (Day 1, Day 2, Day 3)
Solution with Recurrent Neural Network
▪ Outputs are dependent on each other
▪ Can handle sequential data
▪ Can memorize previous inputs
[Diagram: recurrent neural network unrolled over Day 1, Day 2, Day 3]
Understanding Recurrent Neural Networks
▪ RNNs are called recurrent because they perform the same task for every element of a sequence, with the output being dependent on the previous
computations
▪ Another way to think about RNNs is that they have a “memory” which captures information about what has been calculated so far
[Diagram: RNN unrolled over time, with an input at 't-1', 't', and 't+1']
Understanding Recurrent Neural Networks
• X_t is the input at time step 't'
• S_t is the hidden state at time step 't'. It is the memory of the network. S_t is calculated based on the previous hidden state and the input at the current step: s_t = f(U·x_t + W·s_{t-1}). The function f is usually a non-linearity such as tanh or ReLU.
• O_t is the output at step 't': O_t = softmax(V·s_t)
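As an illustration, here is a minimal NumPy sketch of this recurrence (all dimensions and weights below are arbitrary values chosen only for demonstration, not taken from the slides):

import numpy as np

# Toy dimensions, chosen only for illustration
input_dim, hidden_dim, output_dim = 4, 3, 2

rng = np.random.default_rng(0)
U = rng.normal(size=(hidden_dim, input_dim))   # input-to-hidden weights
W = rng.normal(size=(hidden_dim, hidden_dim))  # hidden-to-hidden weights
V = rng.normal(size=(output_dim, hidden_dim))  # hidden-to-output weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

xs = rng.normal(size=(5, input_dim))           # a sequence of 5 input vectors
s = np.zeros(hidden_dim)                       # initial hidden state s_0

for x_t in xs:
    s = np.tanh(U @ x_t + W @ s)               # s_t = f(U·x_t + W·s_{t-1})
    o_t = softmax(V @ s)                       # O_t = softmax(V·s_t)
    print(o_t)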
Back-Propagation through Time
▪ Backpropagation Through Time (BPTT) is used to update the weights in the
recurrent neural network
▪ An RNN typically predicts one output per time step. Conceptually, Backpropagation Through Time works by unrolling the network into these individual time steps.
▪ It then calculates the error at each time step and adds up all of the individual errors to get the final accumulated error.
▪ The network is then rolled back up and the weights are updated
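In equation form (a standard way of writing this, using E_t for the error at time step 't'): the accumulated error is E = Σ_t E_t, and because the same weights W are shared across all time steps, the update uses ∂E/∂W = Σ_t ∂E_t/∂W, where each term is obtained by backpropagating through the unrolled copies of the network up to step 't'.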
Types of RNN
▪ One to One: a single image (or word, ...) is classified into a single class (binary classification), e.g. "is this a bird or not"
▪ One to Many: a single image (or word, ...) is classified into multiple classes
Types of RNN
▪ Many to One: a sequence of images (or words, ...) is classified into a single class (binary classification of a sequence)
▪ Many to Many: a sequence of images (or words, ...) is classified into multiple classes
Issues with RNN
Suppose we try to predict the last word in this text:
Input: "Recurrent Neural ……"  →  RNN  →  Output: "Network"
Here, the RNN does not need any further context. It can easily predict that the last word would be 'Network'
Issues with RNN
Now, let's predict the last word in this text:
Input: "I've been staying in Spain for the last 10 years. I can speak fluent ………….."  →  RNN  →  Output: ?
Regular RNNs have difficulty in learning long-range dependencies
Issues with RNN
I’ve been staying in Spain for the last 10 years. I can speak fluent …………..
• In this case, the network needs the context of ‘Spain’ to predict the last word in this text, which is “Spanish”
• The gap between the word we want to predict and the relevant information is very large; this is known as a long-term dependency
∂E/∂W = ∂E/∂o_3 · ∂o_3/∂h_3 · ∂h_3/∂h_2 · ∂h_2/∂h_1 · …
• While backpropagating the error, this creates a long chain of dependencies between time steps
Vanishing Gradient Problem
▪ Now, if there is a really long dependency, there is a good chance that one of the gradient terms in this chain approaches zero; because the terms are multiplied together, the overall gradient then shrinks to zero exponentially fast
∂E/∂W ≈ 0
▪ In such a state, the gradients no longer help the network to learn anything. This is known as the vanishing gradient problem
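For intuition: if each factor in the chain-rule product shown earlier is around 0.1, then across ten time steps the product is 0.1^10 = 10^-10, which is far too small to produce any meaningful weight update.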
Long Short Term Memory (LSTM) Networks
Long Short Term Memory (LSTM) networks are a special kind of RNN, explicitly designed to avoid the long-term dependency problem
Standard RNN
All recurrent neural networks have the form of a chain of repeating modules of neural network. In standard RNNs, this repeating module will
have a very simple structure, such as a single tanh layer
Long Short Term Memory (LSTM) Networks
Long Short Term Memory (LSTM) networks are a special kind of RNN, explicitly designed to avoid the long-term dependency problem
[Diagram: chain of repeating LSTM modules]
LSTMs also have this chain like structure, but the repeating module has a different structure. Instead of having a single neural network layer,
there are four, interacting in a very special way
Core Idea behind LSTMs
The key to LSTMs is the cell state. The cell state is kind of like a conveyor belt. It runs straight down the entire
chain, with only some minor linear interactions. It’s very easy for information to just flow along it unchanged
The LSTM does have the ability to remove or add information to the cell state, carefully regulated by structures
called gates
Core Idea behind LSTMs
Gates are a way to optionally let information through. They are composed out of a sigmoid neural net layer and
a pointwise multiplication operation
The sigmoid layer outputs numbers between zero and one, describing how much of each component should be
let through. A value of zero means “let nothing through,” while a value of one means “let everything through!”
Working of LSTMs
Step 1
The first step in our LSTM is to decide what information we’re going to throw away from the cell state. This
decision is made by a sigmoid layer called the “forget gate layer”
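In the standard LSTM formulation (this equation is not shown on the original slide; W_f and b_f denote the forget gate's weights and bias), the forget gate looks at h_{t-1} and x_t and outputs a number between 0 and 1 for every entry of the cell state C_{t-1}:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)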
Working of LSTMs
Step 2
The next step is to decide what new information we’re going to store in the cell state. This has two parts. First, a
sigmoid layer called the “input gate layer” decides which values we’ll update. Next, a tanh layer creates a vector of
new candidate values, that could be added to the state
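In the same standard notation (again not shown on the slide; W_i, b_i, W_C, b_C are the corresponding weights and biases), these two parts are:

i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)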
Working of LSTMs
Step 3
Then we have to update the old cell state, C_{t-1}, into the new cell state C_t. We multiply the old state C_{t-1} by f_t, forgetting the things we decided to forget earlier. Then we add i_t * C̃_t: the new candidate values, scaled by how much we decided to update each state value
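Written as a single update in the standard notation:

C_t = f_t * C_{t-1} + i_t * C̃_t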
Working of LSTMs
Step 4
Finally, we’ll run a sigmoid layer which decides what part of the cell state we’re going to output. Then, we put the
cell state through tanh and multiply it by the output of the sigmoid gate, so that we only output the parts we
decided to
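In the standard notation (W_o and b_o are the output gate's weights and bias):

o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)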
Implementing a Simple RNN
Loading the required packages:
Preparing the input data:
Creating 100 vectors with 5 consecutive
numbers
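The original deck shows this code as screenshots that are not reproduced here; a minimal Keras sketch of the described steps (variable names, imports, and the exact data construction are assumptions) could look like:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

# 100 input vectors, each holding 5 consecutive numbers,
# e.g. [0, 1, 2, 3, 4], [1, 2, 3, 4, 5], ...
data = [list(range(i, i + 5)) for i in range(100)]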
Implementing a Simple RNN
Preparing the output data:
Converting the data & target into numpy arrays:
Having a glance at the shape:
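Continuing the sketch (using the number that follows each 5-number window as the target is an assumption):

# Target: the number that follows each 5-number window (assumed)
target = [i + 5 for i in range(100)]

# Convert to NumPy arrays; the LSTM layer expects 3-D input
# of shape (samples, timesteps, features)
data = np.array(data, dtype=float).reshape((100, 5, 1))
target = np.array(target, dtype=float)

print(data.shape, target.shape)   # (100, 5, 1) (100,)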
Implementing a Simple RNN
Dividing the data into train & test sets:
Creating a sequential model:
Adding the LSTM layer with the output and input shape:
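A possible version of these three steps (the split ratio and the number of LSTM units are arbitrary choices):

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(
    data, target, test_size=0.2, random_state=4)

model = Sequential()
model.add(LSTM(32, input_shape=(5, 1)))   # 32 units: an arbitrary choice
model.add(Dense(1))                       # single numeric output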
Implementing a Simple RNN
Compiling the model with ‘Adam’ optimizer:
Having a glance at the model summary:
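For example (mean squared error is assumed as the loss, a natural choice for this regression task):

model.compile(loss='mean_squared_error', optimizer='adam')
model.summary()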
Implementing a Simple RNN
Fitting a model on the train set:
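A sketch of this step (the number of epochs for this first run is not stated on the slide, so 100 is an assumption):

model.fit(x_train, y_train, epochs=100,
          validation_data=(x_test, y_test))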
Implementing a Simple RNN
Predicting the values on the test set:
Making a scatter plot for actual values and predicted values:
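A sketch of these two steps (the plotting details are assumptions):

import matplotlib.pyplot as plt

y_pred = model.predict(x_test)

plt.scatter(range(len(y_test)), y_test, label='actual')
plt.scatter(range(len(y_test)), y_pred.ravel(), label='predicted')
plt.legend()
plt.show()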
We see that the model
fails miserably and none
of the predictions are
correct
We’d have to normalize
the data before we
build the model
Raw Data  →  Normalizing  →  Normalized Data
Implementing a Simple RNN
Normalizing the input data:
Normalizing the output data:
Fitting the model with normalized values and number of epochs to be 500:
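A sketch of these steps (simple max-scaling is assumed; the normalization actually used in the demo is not reproduced here):

# Scale inputs and targets into the [0, 1] range (assumed scheme)
scale = target.max()
data_norm = data / scale
target_norm = target / scale

x_train, x_test, y_train, y_test = train_test_split(
    data_norm, target_norm, test_size=0.2, random_state=4)

# Reusing the model defined earlier; it could also be rebuilt from scratch
model.fit(x_train, y_train, epochs=500,
          validation_data=(x_test, y_test))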
Implementing a Simple RNN
Predicting the values on test set:
Making a scatter plot for actual values & predicted values:
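Repeating the prediction and scatter plot from the earlier sketch, now with the normalized data:

y_pred = model.predict(x_test)

plt.scatter(range(len(y_test)), y_test, label='actual')
plt.scatter(range(len(y_test)), y_pred.ravel(), label='predicted')
plt.legend()
plt.show()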
We see that the loss has
reduced after
normalizing the data
and increasing the
epochs
Quiz
Quiz 1
Gated Recurrent Units can help prevent the vanishing gradient problem in RNNs.
A True
B False
Answer 1
Gated Recurrent Units can help prevent the vanishing gradient problem in RNNs.
A True (correct answer)
B False
Quiz 2
How many types of RNN exist?
A 4
B 2
C 3
D None of these
Answer 2
How many types of RNN exist?
A 4 (correct answer)
B 2
C 3
D None of these
Quiz 3
How many gates are there in LSTM?
A 1
B 2
C 3
D 4
Answer 3
How many gates are there in LSTM?
A 1
B 2
C 3 (correct answer)
D 4
India: +91-7847955955
US: 1-800-216-8930 (TOLL FREE)
[email protected]
24/7 Chat with Our Course Advisor