Artificial Intelligence
Recurrent Neural Networks
Copyright IntelliPaat. All rights reserved
Agenda
01 Issues with Feed Forward Network
02 Understanding Recurrent Neural Networks
03 Types of RNN
04 Issues with RNN
05 Vanishing Gradient Problem
06 Long Short Term Memory Networks
07 Demo on LSTM with Keras
Issues with Feed Forward Network
▪ Outputs are independent of each other: there is no relation between the output at 't' and the output at 't+1'
▪ Cannot handle sequential data
▪ Cannot memorize previous inputs
[Diagram: feed forward network]
Issues with Feed Forward Network
Would this feed forward network be able to predict the next word?
Input: "Recurrent Neural ……………………."  →  FFN  →  Output: ?
This feed forward network wouldn't be able to predict the next word because it cannot memorize the previous inputs
Solution with Recurrent Neural Network
Analogy: "I only cook these three items, and in the same sequence" (Day 1, Day 2, Day 3)
Solution with Recurrent Neural Network
▪ Outputs are dependent on each other
▪ Can handle sequential data
▪ Can memorize previous inputs
[Diagram: recurrent neural network unrolled over Day 1, Day 2, Day 3]
Understanding Recurrent Neural Networks
▪ RNNs are called recurrent because they perform the same task for every element of a sequence, with the output being dependent on the previous
computations
▪ Another way to think about RNNs is that they have a “memory” which captures information about what has been calculated so far
[Diagram: RNN unrolled over time, with an input at 't-1', 't', and 't+1']
Understanding Recurrent Neural Networks
• X_t is the input at time step 't'
• S_t is the hidden state at time step 't'. It is the memory of the network. S_t is calculated based on the previous hidden state and the input at the current step: s_t = f(U·x_t + W·s_{t-1}). The function f is usually a non-linearity such as tanh or ReLU.
• O_t is the output at step 't': O_t = softmax(V·s_t)
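As an illustration, here is a minimal NumPy sketch of this recurrence (all dimensions and weights below are arbitrary values chosen only for demonstration, not taken from the slides):

import numpy as np

# Toy dimensions, chosen only for illustration
input_dim, hidden_dim, output_dim = 4, 3, 2

rng = np.random.default_rng(0)
U = rng.normal(size=(hidden_dim, input_dim))   # input-to-hidden weights
W = rng.normal(size=(hidden_dim, hidden_dim))  # hidden-to-hidden weights
V = rng.normal(size=(output_dim, hidden_dim))  # hidden-to-output weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

xs = rng.normal(size=(5, input_dim))           # a sequence of 5 input vectors
s = np.zeros(hidden_dim)                       # initial hidden state s_0

for x_t in xs:
    s = np.tanh(U @ x_t + W @ s)               # s_t = f(U·x_t + W·s_{t-1})
    o_t = softmax(V @ s)                       # O_t = softmax(V·s_t)
    print(o_t)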
Back-Propagation through Time
▪ Backpropagation Through Time (BPTT) is used to update the weights in the
recurrent neural network
▪ An RNN typically predicts one output per time step. Conceptually, Backpropagation Through Time works by unrolling the network into these individual time steps.
▪ It then calculates the error at each time step and adds up all of the individual errors to get the final accumulated error.
▪ The network is then rolled back up and the weights are updated
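In equation form (a standard way of writing this, using E_t for the error at time step 't'): the accumulated error is E = Σ_t E_t, and because the same weights W are shared across all time steps, the update uses ∂E/∂W = Σ_t ∂E_t/∂W, where each term is obtained by backpropagating through the unrolled copies of the network up to step 't'.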
Types of RNN
▪ One to One: a single image (or word, ...) is classified into a single class (binary classification), e.g. "is this a bird or not"
▪ One to Many: a single image (or word, ...) is classified into multiple classes
Types of RNN
▪ Many to One: a sequence of images (or words, ...) is classified into a single class (binary classification of a sequence)
▪ Many to Many: a sequence of images (or words, ...) is classified into multiple classes
Issues with RNN
Suppose we try to predict the last word in this text:
Input: "Recurrent Neural ……"  →  RNN  →  Output: "Network"
Here, the RNN does not need any further context. It can easily predict that the last word would be 'Network'
Issues with RNN
Now, let's predict the last word in this text:
Input: "I've been staying in Spain for the last 10 years. I can speak fluent ………….."  →  RNN  →  Output: ?
Regular RNNs have difficulty in learning long-range dependencies
Issues with RNN
I’ve been staying in Spain for the last 10 years. I can speak fluent …………..
• In this case, the network needs the context of ‘Spain’ to predict the last word in this text, which is “Spanish”
• The gap between the word we want to predict and the relevant information is very large; this is known as a long-term dependency
∂E/∂W = ∂E/∂o_3 · ∂o_3/∂h_3 · ∂h_3/∂h_2 · ∂h_2/∂h_1 · …
• While backpropagating the error, this creates a long chain of dependencies between time steps
Vanishing Gradient Problem
▪ Now, if there is a really long dependency, there is a good chance that one of the gradient terms in this chain approaches zero; because the terms are multiplied together, the overall gradient then shrinks to zero exponentially fast
∂E/∂W ≈ 0
▪ In such a state, the gradients no longer help the network to learn anything. This is known as the vanishing gradient problem
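For intuition: if each factor in the chain-rule product shown earlier is around 0.1, then across ten time steps the product is 0.1^10 = 10^-10, which is far too small to produce any meaningful weight update.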
Long Short Term Memory (LSTM) Networks
Long Short Term Memory (LSTM) networks are a special kind of RNN, explicitly designed to avoid the long-term dependency problem
Standard RNN
All recurrent neural networks have the form of a chain of repeating modules of neural network. In standard RNNs, this repeating module will
have a very simple structure, such as a single tanh layer
Long Short Term Memory (LSTM) Networks
Long Short Term Memory (LSTM) networks are a special kind of RNN, explicitly designed to avoid the long-term dependency problem
[Diagram: chain of repeating LSTM modules]
LSTMs also have this chain like structure, but the repeating module has a different structure. Instead of having a single neural network layer,
there are four, interacting in a very special way
Core Idea behind LSTMs
The key to LSTMs is the cell state. The cell state is kind of like a conveyor belt. It runs straight down the entire
chain, with only some minor linear interactions. It’s very easy for information to just flow along it unchanged
The LSTM does have the ability to remove or add information to the cell state, carefully regulated by structures
called gates
Core Idea behind LSTMs
Gates are a way to optionally let information through. They are composed out of a sigmoid neural net layer and
a pointwise multiplication operation
The sigmoid layer outputs numbers between zero and one, describing how much of each component should be
let through. A value of zero means “let nothing through,” while a value of one means “let everything through!”
Working of LSTMs
Step 1
The first step in our LSTM is to decide what information we’re going to throw away from the cell state. This
decision is made by a sigmoid layer called the “forget gate layer”
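In the standard LSTM formulation (this equation is not shown on the original slide; W_f and b_f denote the forget gate's weights and bias), the forget gate looks at h_{t-1} and x_t and outputs a number between 0 and 1 for every entry of the cell state C_{t-1}:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)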
Working of LSTMs
Step 2
The next step is to decide what new information we’re going to store in the cell state. This has two parts. First, a
sigmoid layer called the “input gate layer” decides which values we’ll update. Next, a tanh layer creates a vector of
new candidate values, that could be added to the state
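In the same standard notation (again not shown on the slide; W_i, b_i, W_C, b_C are the corresponding weights and biases), these two parts are:

i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)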
Working of LSTMs
Step 3
Then we have to update the old cell state, C_{t-1}, into the new cell state C_t. We multiply the old state C_{t-1} by f_t, forgetting the things we decided to forget earlier. Then we add i_t * C̃_t: the new candidate values, scaled by how much we decided to update each state value
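Written as a single update in the standard notation:

C_t = f_t * C_{t-1} + i_t * C̃_t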
Working of LSTMs
Step 4
Finally, we’ll run a sigmoid layer which decides what part of the cell state we’re going to output. Then, we put the
cell state through tanh and multiply it by the output of the sigmoid gate, so that we only output the parts we
decided to
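In the standard notation (W_o and b_o are the output gate's weights and bias):

o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)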
Implementing a Simple RNN
Loading the required packages:
Preparing the input data:
Creating 100 vectors with 5 consecutive
numbers
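The original deck shows this code as screenshots that are not reproduced here; a minimal Keras sketch of the described steps (variable names, imports, and the exact data construction are assumptions) could look like:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

# 100 input vectors, each holding 5 consecutive numbers,
# e.g. [0, 1, 2, 3, 4], [1, 2, 3, 4, 5], ...
data = [list(range(i, i + 5)) for i in range(100)]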
Implementing a Simple RNN
Preparing the output data:
Converting the data & target into numpy arrays:
Having a glance at the shape:
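Continuing the sketch (using the number that follows each 5-number window as the target is an assumption):

# Target: the number that follows each 5-number window (assumed)
target = [i + 5 for i in range(100)]

# Convert to NumPy arrays; the LSTM layer expects 3-D input
# of shape (samples, timesteps, features)
data = np.array(data, dtype=float).reshape((100, 5, 1))
target = np.array(target, dtype=float)

print(data.shape, target.shape)   # (100, 5, 1) (100,)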
Implementing a Simple RNN
Dividing the data into train & test sets:
Creating a sequential model:
Adding the LSTM layer with the output and input shape:
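A possible version of these three steps (the split ratio and the number of LSTM units are arbitrary choices):

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(
    data, target, test_size=0.2, random_state=4)

model = Sequential()
model.add(LSTM(32, input_shape=(5, 1)))   # 32 units: an arbitrary choice
model.add(Dense(1))                       # single numeric output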
Implementing a Simple RNN
Compiling the model with ‘Adam’ optimizer:
Having a glance at the model summary:
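For example (mean squared error is assumed as the loss, a natural choice for this regression task):

model.compile(loss='mean_squared_error', optimizer='adam')
model.summary()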
Implementing a Simple RNN
Fitting a model on the train set:
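A sketch of this step (the number of epochs for this first run is not stated on the slide, so 100 is an assumption):

model.fit(x_train, y_train, epochs=100,
          validation_data=(x_test, y_test))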
Implementing a Simple RNN
Predicting the values on the test set:
Making a scatter plot for actual values and predicted values:
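A sketch of these two steps (the plotting details are assumptions):

import matplotlib.pyplot as plt

y_pred = model.predict(x_test)

plt.scatter(range(len(y_test)), y_test, label='actual')
plt.scatter(range(len(y_test)), y_pred.ravel(), label='predicted')
plt.legend()
plt.show()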
We see that the model
fails miserably and none
of the predictions are
correct
We’d have to normalize
the data before we
build the model
Raw Data  →  Normalizing  →  Normalized Data
Implementing a Simple RNN
Normalizing the input data:
Normalizing the output data:
Fitting the model with normalized values and number of epochs to be 500:
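A sketch of these steps (simple max-scaling is assumed; the normalization actually used in the demo is not reproduced here):

# Scale inputs and targets into the [0, 1] range (assumed scheme)
scale = target.max()
data_norm = data / scale
target_norm = target / scale

x_train, x_test, y_train, y_test = train_test_split(
    data_norm, target_norm, test_size=0.2, random_state=4)

# Reusing the model defined earlier; it could also be rebuilt from scratch
model.fit(x_train, y_train, epochs=500,
          validation_data=(x_test, y_test))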
Implementing a Simple RNN
Predicting the values on test set:
Making a scatter plot for actual values & predicted values:
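Repeating the prediction and scatter plot from the earlier sketch, now with the normalized data:

y_pred = model.predict(x_test)

plt.scatter(range(len(y_test)), y_test, label='actual')
plt.scatter(range(len(y_test)), y_pred.ravel(), label='predicted')
plt.legend()
plt.show()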
We see that the loss has
reduced after
normalizing the data
and increasing the
epochs
Quiz
Quiz 1
Gated Recurrent Units can help prevent the vanishing gradient problem in RNNs.
A True
B False
Answer 1
Gated Recurrent Units can help prevent the vanishing gradient problem in RNNs.
A True (correct answer)
B False
Quiz 2
How many types of RNN exist?
A 4
B 2
C 3
D None of these
Answer 2
How many types of RNN exist?
A 4 (correct answer)
B 2
C 3
D None of these
Quiz 3
How many gates are there in LSTM?
A 1
B 2
C 3
D 4
Answer 3
How many gates are there in LSTM?
A 1
B 2
C 3 (correct answer)
D 4
India: +91-7847955955
US: 1-800-216-8930 (TOLL FREE)
[email protected]
24/7 Chat with Our Course Advisor