Recurrent Neural Networks
What is a Recurrent Neural Network (RNN)?
A recurrent neural network is a type of neural network in which the output from
the previous step is fed as input to the current step
In traditional neural networks, all the inputs and outputs are independent of
each other, but this is not a good idea if we want to predict the next word in a
sentence
We need to remember the previous words in order to generate the next word in a
sentence; hence traditional neural networks are not efficient for NLP
applications
RNNs also have a hidden state which is used to capture information about a
sentence
RNNs have a ‘memory’, which is used to capture information about the
calculations made so far
In theory, RNNs can use information from arbitrarily long sequences, but in
practice they are limited to looking back only a few steps
Diagrammatic Representation
Unfolding means writing out the network for the complete sequence; for example, if a sequence
has 4 words then the network will be unfolded into a 4-layer neural network
We can think of st as the memory of the network, as it captures information about what
happened in all the previous steps
A traditional neural network uses different parameters at each layer, while an RNN shares the
same parameters across all the layers; in the diagram we can see that the same parameters
(U, V, W) are used across all the layers
Using the same parameters across all layers shows that we are performing the same task with
different inputs, thus reducing the total number of parameters to learn
The three sets of parameters (U, V, and W) are used to apply linear transformations over their
respective inputs
Parameter U transforms the input xt to the state st
Parameter W transforms the previous state st-1 to the current state st
And parameter V maps the computed internal state st to the output Ot
Formula to calculate the current state:
ht = f(ht-1, xt)
Here, ht is the current state, ht-1 is the previous state, and xt is the current input
After applying the activation function (tanh), the equation becomes:
ht = tanh(Whh·ht-1 + Wxh·xt)
Here, Whh is the weight at the recurrent neuron and Wxh is the weight at the input neuron
After calculating the current state, we can then produce the output
The output state can be calculated as:
Ot = Why·ht
Here, Ot is the output state, Why is the weight at the output layer, and ht is the current state
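To make the forward pass concrete, here is a minimal NumPy sketch of the recurrence above. The sizes, the random weights, and the toy 4-step input sequence are illustrative assumptions, not values from these notes; U corresponds to Wxh, W to Whh, and V to Why.

    import numpy as np

    rng = np.random.default_rng(0)
    input_size, hidden_size, output_size = 3, 5, 2   # illustrative sizes (assumptions)

    W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # U: input -> state
    W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # W: previous state -> state
    W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))  # V: state -> output

    def rnn_forward(xs):
        """h_t = tanh(Whh.h_{t-1} + Wxh.x_t) and O_t = Why.h_t at each step."""
        h = np.zeros(hidden_size)
        outputs = []
        for x in xs:
            h = np.tanh(W_hh @ h + W_xh @ x)   # current state from previous state and input
            outputs.append(W_hy @ h)           # output at this time step
        return outputs, h

    # Example: a sequence of 4 input vectors (e.g. 4 word embeddings)
    sequence = [rng.normal(size=input_size) for _ in range(4)]
    outputs, final_state = rnn_forward(sequence)

Note that the same three weight matrices are reused at every time step, which is exactly the parameter sharing described above.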
Backward propagation in RNN
Backward phase:
To train an RNN, we need a loss function. We will make use of cross-entropy loss which is
often paired with softmax, which can be calculated as:
L = -ln(pc)
Here, pc is the RNN’s predicted probability for the correct class (positive or negative). For
example, if a positive text is predicted to be 95% positive by the RNN, then the loss is:
L= -ln(0.95) = 0.051
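As a quick check of the worked example, the loss can be computed directly; this is just a minimal sketch using the 0.95 probability quoted above.

    import numpy as np

    def cross_entropy(p_correct):
        """L = -ln(p_c), where p_c is the predicted probability of the correct class."""
        return -np.log(p_correct)

    print(cross_entropy(0.95))   # about 0.051, matching the example above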
After calculating loss, we will train the RNN using gradient descent to minimize loss
Steps for Back Propagation
We first compute the cross-entropy error using the predicted and actual output. The
network is unfolded for each time step
Then, for each time step in the network, the gradient is calculated with respect
to the weights of each parameter
Since the weights are the same for every time step, the gradients can be combined
across all the time steps
Then we update the weights for both the recurrent neurons as well as the dense
layers, as shown in the sketch below
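Here is a minimal, illustrative sketch of these steps in NumPy, assuming a sequence that is classified only at its final time step with a softmax output; the sizes, the learning rate, and the random data are assumptions made for the sketch, not part of the notes.

    import numpy as np

    rng = np.random.default_rng(0)
    input_size, hidden_size, output_size = 3, 5, 2   # illustrative sizes (assumptions)

    W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
    W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
    W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))

    def train_step(xs, target, lr=0.1):
        # Forward pass: unfold the network over the whole sequence
        hs = [np.zeros(hidden_size)]
        for x in xs:
            hs.append(np.tanh(W_hh @ hs[-1] + W_xh @ x))
        logits = W_hy @ hs[-1]
        probs = np.exp(logits - logits.max()); probs /= probs.sum()   # softmax
        loss = -np.log(probs[target])                                 # cross-entropy error

        # Backward pass: accumulate gradients over every time step (BPTT)
        d_logits = probs.copy(); d_logits[target] -= 1
        dW_hy = np.outer(d_logits, hs[-1])
        dW_xh, dW_hh = np.zeros_like(W_xh), np.zeros_like(W_hh)
        dh = W_hy.T @ d_logits
        for t in reversed(range(len(xs))):
            dz = (1 - hs[t + 1] ** 2) * dh     # back through tanh
            dW_xh += np.outer(dz, xs[t])       # the same W_xh is shared by every step
            dW_hh += np.outer(dz, hs[t])       # the same W_hh is shared by every step
            dh = W_hh.T @ dz                   # pass the gradient to the previous step

        # Gradient-descent update on the shared weights
        for W, dW in ((W_xh, dW_xh), (W_hh, dW_hh), (W_hy, dW_hy)):
            W -= lr * dW
        return loss

    xs = [rng.normal(size=input_size) for _ in range(4)]
    print(train_step(xs, target=1))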
Vanishing and Exploding Gradient Problem
Defining the problem
During the training of a deep network, the gradients are propagated back in time all
the way to the initial layer
Gradients that come from deeper layers go through multiple matrix multiplications
according to the chain rule, and when they approach the earlier layers, if they have
small values (<1) they shrink exponentially until they vanish
Vanishing gradients make model learning difficult
While if they have large values (>1), then they eventually blow up and crash the
model, this is the exploding gradient problem
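A tiny numeric sketch makes the problem visible: repeatedly multiplying a gradient by the recurrent weight matrix, as the chain rule does during backpropagation through time, either shrinks it toward zero or blows it up depending on how large the weights are. The matrices and scales below are arbitrary assumptions chosen only to show the trend.

    import numpy as np

    rng = np.random.default_rng(0)
    hidden_size, steps = 5, 50

    W_small = rng.normal(scale=0.3, size=(hidden_size, hidden_size))  # "small" weights
    W_large = rng.normal(scale=1.5, size=(hidden_size, hidden_size))  # "large" weights

    g_small = np.ones(hidden_size)
    g_large = np.ones(hidden_size)
    for _ in range(steps):
        g_small = W_small.T @ g_small   # shrinks step after step -> vanishing gradient
        g_large = W_large.T @ g_large   # grows step after step   -> exploding gradient

    print(np.linalg.norm(g_small))   # a tiny value, close to zero
    print(np.linalg.norm(g_large))   # an enormous value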
Types of RNN Architectures
The common architectures which are used for sequence learning are:
One to one
One to many
Many to one
Many to many
One to one: This model is similar to a single-layer neural network, as it maps a single
fixed-size input 'x' to a single fixed-size output 'y' (e.g., image
classification)
One to many
This consists of a single input 'x', an activation 'a', and multiple outputs 'y'
Example: generating an audio stream. It takes a single audio stream as input and
generates new tones or new music based on that stream
In some cases, it propagates the output ‘y’ to the next RNN units
Many to one
This consists of multiple inputs 'x' (such as words or sentences) and activations 'a', and
produces a single output 'y' at the end
This type of architecture is mostly used to perform sentiment analysis, as it processes
the entire input (a collection of words or sentences) to produce a single output (positive,
negative, or neutral sentiment)
Many to many
In this, a single frame is taken as input by each RNN unit. The frames represent
multiple inputs 'x' whose activations 'a' are propagated through the network to
produce outputs 'y', which are the classification results for each frame
It is used mostly in video classification, where we try to classify each frame of the video
Bi- directional RNNs
In this neural network, two hidden layers running in opposite directions are
connected to produce a single output
These layers allow the neural network to receive information from both past and
future states
For example, given the word sequence 'I like programming', the forward layer will
take the sequence as it is, while the backward layer will feed the sequence in the
reverse order, 'programming like I'
The output at each time step is calculated by combining (for example, concatenating) the
hidden states of the forward and backward layers for that step
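A minimal sketch of the idea, assuming a simple tanh RNN cell in each direction; the sizes, the random weights, and the choice to concatenate the two hidden states at each step are assumptions made for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    input_size, hidden_size = 3, 4   # illustrative sizes (assumptions)

    def make_cell():
        return (rng.normal(scale=0.1, size=(hidden_size, input_size)),
                rng.normal(scale=0.1, size=(hidden_size, hidden_size)))

    W_xh_f, W_hh_f = make_cell()   # forward-direction parameters
    W_xh_b, W_hh_b = make_cell()   # backward-direction parameters

    def run(xs, W_xh, W_hh):
        h, states = np.zeros(hidden_size), []
        for x in xs:
            h = np.tanh(W_hh @ h + W_xh @ x)
            states.append(h)
        return states

    sequence = [rng.normal(size=input_size) for _ in range(3)]     # e.g. "I like programming"
    forward_states = run(sequence, W_xh_f, W_hh_f)                 # reads the sequence as is
    backward_states = run(sequence[::-1], W_xh_b, W_hh_b)[::-1]    # reads it in reverse order

    # Combine both directions at each time step (here, by concatenation)
    combined = [np.concatenate([f, b]) for f, b in zip(forward_states, backward_states)]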
Note:-
RNNs remember information from every time step
The memory state, which stores information about all previous states, is useful for tasks such as
sentence generation and time-series prediction
RNNs can handle inputs and outputs of arbitrary length
RNNs share the same parameters across different time steps, which means fewer
parameters to train and a lower computation cost
RNNs cannot process very long sequences effectively when using tanh or ReLU as the
activation function
RNNs face vanishing and exploding gradient problem
RNN applications
Text summarization: Summarizing the text from any literature; for example, if a
news website wants to display a brief summary of the important news from each
news article on the website, then text summarization will be helpful
Text recommendation: Text autofill or sentence generation in data-entry work;
making use of RNNs can help automate the process and make it less time
consuming
Image recognition: RNNs can be combined with CNN in order to recognize an
image and give its description
Music generation: RNNs can be used to generate new music or tunes, by feeding a
single tune as an input we can generate new notes or tunes of music.
Types of RNN networks:
Feedforward networks map one input to one output, and while we’ve visualized recurrent
neural networks in this way in the above diagrams, they do not actually have this constraint.
Instead, their inputs and outputs can vary in length, and different types of RNNs are used for
different use cases, such as music generation, sentiment classification, and machine
translation.
1. Vanishing Gradient Problem
Recurrent Neural Networks enable you to model time-dependent and sequential data
problems, such as stock market prediction, machine translation, and text generation. You will
find, however, that RNNs are hard to train because of the gradient problem.
RNNs suffer from the problem of vanishing gradients. The gradients carry information used
in the RNN, and when the gradient becomes too small, the parameter updates become
insignificant. This makes the learning of long data sequences difficult.
2. Exploding Gradient Problem
While training a neural network, if the slope tends to grow exponentially instead of decaying,
this is called an Exploding Gradient. This problem arises when large error gradients
accumulate, resulting in very large updates to the neural network model weights during the
training process.
Long training time, poor performance, and bad accuracy are the major issues in gradient
problems. Now, let’s discuss the most popular and efficient way to deal with gradient
problems, i.e., Long Short-Term Memory Network (LSTMs).
The word you predict will depend on the previous few words in context. Consider, for example,
predicting the last word in the text "I grew up in Spain… I speak fluent ____." Here, you need the
context of Spain to predict the last word, and the most suitable answer to this
sentence is "Spanish." The gap between the relevant information and the point where it's
needed may have become very large. LSTMs help you solve this problem.
Common Activation Functions
Recurrent Neural Networks (RNNs) use activation functions just like other neural networks
to introduce non-linearity to their models. Here are some common activation functions used
in RNNs:
Sigmoid Function:
The sigmoid function is commonly used in RNNs. It has a range between 0 and 1, which
makes it useful for binary classification tasks. The formula for the sigmoid function is:
σ(x) = 1 / (1 + e^(-x))
Hyperbolic Tangent (Tanh) Function:
The tanh function is also commonly used in RNNs. It has a range between -1 and 1, which
makes it useful for non-linear classification tasks. The formula for the tanh function is:
tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
Rectified Linear Unit (ReLU) Function:
The ReLU function is a non-linear activation function that is widely used in deep neural
networks. It has a range between 0 and infinity, which makes it useful for models that require
positive outputs. The formula for the ReLU function is:
ReLU(x) = max(0, x)
Leaky ReLU Function:
The Leaky ReLU function is similar to the ReLU function, but it introduces a small slope to
negative values, which helps to prevent "dead neurons" in the model. The formula for the
Leaky ReLU function is:
Leaky ReLU(x) = max(0.01x, x)
Softmax Function:
The softmax function is often used in the output layer of RNNs for multi-class classification
tasks. It converts the network output into a probability distribution over the possible classes.
The formula for the softmax function is:
softmax(xi) = e^(xi) / ∑j e^(xj)
These are just a few examples of the activation functions used in RNNs. The choice of
activation function depends on the specific task and the model's architecture
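For reference, here is a small NumPy sketch of the activation functions listed above; the 0.01 slope in Leaky ReLU follows the formula given in the text, and the example input is arbitrary.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))       # range (0, 1)

    def tanh(x):
        return np.tanh(x)                      # range (-1, 1)

    def relu(x):
        return np.maximum(0.0, x)              # range [0, infinity)

    def leaky_relu(x, slope=0.01):
        return np.maximum(slope * x, x)        # small slope for negative inputs

    def softmax(x):
        e = np.exp(x - np.max(x))              # shifted for numerical stability
        return e / e.sum()                     # a probability distribution over classes

    print(softmax(np.array([1.0, 2.0, 3.0])))  # the outputs sum to 1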
Variant RNN Architectures
There are several variant RNN architectures that have been developed over the years to
address the limitations of the standard RNN architecture. Here are a few examples:
Long Short-Term Memory (LSTM) Networks
LSTM is a type of RNN that is designed to handle the vanishing gradient problem that can
occur in standard RNNs. It does this by introducing three gating mechanisms that control the
flow of information through the network: the input gate, the forget gate, and the output gate.
These gates allow the LSTM network to selectively remember or forget information from the
input sequence, which makes it more effective for long-term dependencies.
Gated Recurrent Unit (GRU) Networks
GRU is another type of RNN that is designed to address the vanishing gradient problem. It
has two gates: the reset gate and the update gate. The reset gate determines how much of the
previous state should be forgotten, while the update gate determines how much of the new
state should be remembered. This allows the GRU network to selectively update its internal
state based on the input sequence.
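A minimal sketch of one GRU step, assuming the common formulation with a reset gate r, an update gate z, and a candidate state; the weight shapes and random initialization are illustrative assumptions.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(0)
    input_size, hidden_size = 3, 4   # illustrative sizes (assumptions)

    def weights():
        return (rng.normal(scale=0.1, size=(hidden_size, input_size)),
                rng.normal(scale=0.1, size=(hidden_size, hidden_size)))

    W_z, U_z = weights()   # update gate parameters
    W_r, U_r = weights()   # reset gate parameters
    W_h, U_h = weights()   # candidate state parameters

    def gru_step(x, h_prev):
        z = sigmoid(W_z @ x + U_z @ h_prev)             # update gate: how much new state to keep
        r = sigmoid(W_r @ x + U_r @ h_prev)             # reset gate: how much old state to forget
        h_cand = np.tanh(W_h @ x + U_h @ (r * h_prev))  # candidate built from the reset state
        return (1 - z) * h_prev + z * h_cand            # blend old state and candidate

    h = np.zeros(hidden_size)
    for x in [rng.normal(size=input_size) for _ in range(5)]:
        h = gru_step(x, h)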
Bidirectional RNNs:
Bidirectional RNNs are designed to process input sequences in both forward and backward
directions. This allows the network to capture both past and future context, which can be
useful for speech recognition and natural language processing tasks.
Encoder-Decoder RNNs:
Encoder-decoder RNNs consist of two RNNs: an encoder network that processes the input
sequence and produces a fixed-length vector representation of the input and a decoder
network that generates the output sequence based on the encoder's representation. This
architecture is commonly used for sequence-to-sequence tasks such as machine translation.
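A very compact sketch of the encoder-decoder idea: one RNN compresses the input sequence into its final hidden state, and a second RNN is unrolled from that summary to produce the output sequence. The names, sizes, and the fixed output length are assumptions made purely for illustration, not a description of any particular library.

    import numpy as np

    rng = np.random.default_rng(0)
    in_size, hid_size, out_size = 3, 4, 6   # illustrative sizes (assumptions)

    enc_Wx = rng.normal(scale=0.1, size=(hid_size, in_size))
    enc_Wh = rng.normal(scale=0.1, size=(hid_size, hid_size))
    dec_Wh = rng.normal(scale=0.1, size=(hid_size, hid_size))
    dec_Wy = rng.normal(scale=0.1, size=(out_size, hid_size))

    def encode(xs):
        h = np.zeros(hid_size)
        for x in xs:                      # read the whole input sequence...
            h = np.tanh(enc_Wh @ h + enc_Wx @ x)
        return h                          # ...and summarize it in a fixed-length vector

    def decode(h, steps):
        outputs = []
        for _ in range(steps):            # unroll the decoder from the encoder's summary
            h = np.tanh(dec_Wh @ h)
            outputs.append(dec_Wy @ h)    # one output (as logits) per step
        return outputs

    source = [rng.normal(size=in_size) for _ in range(5)]
    translation_logits = decode(encode(source), steps=4)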
Attention Mechanisms
Attention mechanisms are a technique that can be used to improve the performance of RNNs
on tasks that involve long input sequences. They work by allowing the network to attend to
different parts of the input sequence selectively rather than treating all parts of the input
sequence equally. This can help the network focus on the input sequence's most relevant parts
and ignore irrelevant information.
These are just a few examples of the many variant RNN architectures that have been
developed over the years. The choice of architecture depends on the specific task and the
characteristics of the input and output sequences.
Long Short-Term Memory Networks
LSTMs are a special kind of RNN, capable of learning long-term dependencies; remembering
information for long periods is their default behavior.
All RNNs have the form of a chain of repeating modules of a neural network. In standard
RNNs, this repeating module will have a very simple structure, such as a single tanh layer.
Fig: Long Short Term Memory Networks
LSTMs also have a chain-like structure, but the repeating module has a different structure.
Instead of having a single neural network layer, there are four layers that interact in a
special way.
Advantages and Shortcomings of RNNs
RNNs have various advantages, such as:
Ability to handle sequence data
Ability to handle inputs of varying lengths
Ability to store or “memorize” historical information
The disadvantages are:
The computation can be very slow.
The network does not take into account future inputs to make decisions.
Vanishing gradient problem, where the gradients used to compute the weight update
may get very close to zero, preventing the network from learning new weights. The
deeper the network, the more pronounced this problem is.
Different RNN Architectures
There are different variations of RNNs that are being applied practically in machine learning
problems:
Bidirectional Recurrent Neural Networks (BRNN)
In BRNN, inputs from future time steps are used to improve the accuracy of the network. It is
like knowing the first and last words of a sentence to predict the middle words.
Gated Recurrent Units (GRU)
These networks are designed to handle the vanishing gradient problem. They have a reset and
update gate. These gates determine which information is to be retained for future predictions.
Long Short Term Memory (LSTM)
LSTMs were also designed to address the vanishing gradient problem in RNNs. LSTMs use
three gates called input, output, and forget gate. Similar to GRU, these gates determine which
information to retain.
Key Differences Between CNN and RNN
CNN is applicable for sparse data like images. RNN is applicable for time series and
sequential data.
While training the model, CNN uses a simple backpropagation and RNN uses
backpropagation through time to calculate the loss.
RNNs have no restriction on the length of inputs and outputs, while CNNs have
fixed-size inputs and outputs.
CNNs are feedforward networks, while RNNs use loops to handle sequential data.
CNN can also be used for video and image processing. RNN is primarily used for
speech and text analysis.
Working of an RNN network:
A recurrent neural network (RNN) is the type of artificial neural network (ANN) that is
used in Apple’s Siri and Google’s voice search. RNN remembers past inputs due to an
internal memory which is useful for predicting stock prices, generating text, transcriptions,
and machine translation.
In a traditional neural network, the inputs and the outputs are independent of each other,
whereas the output of an RNN depends on the prior elements within the sequence. Recurrent
networks also share parameters across each layer of the network. Feedforward networks have
different weights at each node, whereas an RNN shares the same weights within
each layer of the network; during gradient descent, the weights and biases are adjusted
to reduce the loss.
Fig: RNN
The image above is a simple representation of recurrent neural networks. If we are
forecasting stock prices using simple data [45,56,45,49,50,…], each input from X0 to Xt will
contain a past value. For example, X0 will have 45, X1 will have 56, and these values are
used to predict the next number in a sequence.
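As a tiny illustration of how such a series is turned into supervised examples, a sliding window over the values quoted above could be used; the window length of 3 is an arbitrary assumption.

    # The first five values quoted above; a window of 3 past values predicts the next one (assumption)
    series = [45, 56, 45, 49, 50]
    window = 3
    pairs = [(series[i:i + window], series[i + window]) for i in range(len(series) - window)]
    # pairs == [([45, 56, 45], 49), ([56, 45, 49], 50)]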
How Recurrent Neural Networks Work
In RNN, the information cycles through the loop, so the output is determined by the current
input and previously received inputs.
The input layer X processes the initial input and passes it to the middle layer A. The middle
layer consists of multiple hidden layers, each with its own activation functions, weights, and
biases. These parameters are shared across time steps, so instead of creating
multiple hidden layers, the network creates one layer and loops over it.
Instead of using traditional backpropagation, recurrent neural networks
use backpropagation through time (BPTT) algorithms to determine the gradient. In
backpropagation, the model adjusts the parameter by calculating errors from the output to the
input layer. BPTT sums the error at each time step, as the RNN shares parameters across each
layer.
Recurrent Neural Networks
Humans don’t start their thinking from scratch every second. As you read this essay, you
understand each word based on your understanding of previous words. You don’t throw
everything away and start thinking from scratch again. Your thoughts have persistence.
Traditional neural networks can’t do this, and it seems like a major shortcoming. For example,
imagine you want to classify what kind of event is happening at every point in a movie. It’s
unclear how a traditional neural network could use its reasoning about previous events in the
film to inform later ones.
Recurrent neural networks address this issue. They are networks with loops in them, allowing
information to persist.
Recurrent Neural Networks have loops.
In the above diagram, a chunk of neural network, A, looks at some input xt and outputs
a value ht. A loop allows information to be passed from one step of the network to the next.
These loops make recurrent neural networks seem kind of mysterious. However, if you think a
bit more, it turns out that they aren’t all that different than a normal neural network. A recurrent
neural network can be thought of as multiple copies of the same network, each passing a
message to a successor. Consider what happens if we unroll the loop:
An unrolled recurrent neural network.
This chain-like nature reveals that recurrent neural networks are intimately related to sequences
and lists. They're the natural neural network architecture to use for such data.
And they certainly are used! In the last few years, there have been incredible success applying
RNNs to a variety of problems: speech recognition, language modeling, translation, image
captioning… The list goes on. I’ll leave discussion of the amazing feats one can achieve with
RNNs to Andrej Karpathy’s excellent blog post, The Unreasonable Effectiveness of Recurrent
Neural Networks. But they really are pretty amazing.
Essential to these successes is the use of “LSTMs,” a very special kind of recurrent neural
network which works, for many tasks, much much better than the standard version. Almost all
exciting results based on recurrent neural networks are achieved with them. It’s these LSTMs
that this essay will explore.
The Problem of Long-Term Dependencies
One of the appeals of RNNs is the idea that they might be able to connect previous information
to the present task; for example, previous video frames might inform the understanding of the
present frame. If RNNs could do this, they’d be extremely useful. But can they? It depends.
Sometimes, we only need to look at recent information to perform the present task. For example,
consider a language model trying to predict the next word based on the previous ones. If we are
trying to predict the last word in “the clouds are in the sky,” we don’t need any further context –
it’s pretty obvious the next word is going to be sky. In such cases, where the gap between the
relevant information and the place that it’s needed is small, RNNs can learn to use the past
information.
But there are also cases where we need more context. Consider trying to predict the last word in
the text “I grew up in France… I speak fluent French.” Recent information suggests that the
next word is probably the name of a language, but if we want to narrow down which language,
we need the context of France, from further back. It’s entirely possible for the gap between the
relevant information and the point where it is needed to become very large.
Unfortunately, as that gap grows, RNNs become unable to learn to connect the information.
In theory, RNNs are absolutely capable of handling such “long-term dependencies.” A human
could carefully pick parameters for them to solve toy problems of this form. Sadly, in practice,
RNNs don’t seem to be able to learn them.
LSTM Networks
Long Short Term Memory networks – usually just called “LSTMs” – are a special kind of RNN,
capable of learning long-term dependencies. They were introduced by Hochreiter &
Schmidhuber (1997), and were refined and popularized by many people in following
work. They work tremendously well on a large variety of problems, and are now widely used.
LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering
information for long periods of time is practically their default behavior, not something they
struggle to learn!
All recurrent neural networks have the form of a chain of repeating modules of neural network.
In standard RNNs, this repeating module will have a very simple structure, such as a single tanh
layer.
The repeating module in a standard RNN contains a single layer.
LSTMs also have this chain like structure, but the repeating module has a different structure.
Instead of having a single neural network layer, there are four, interacting in a very special way.
The repeating module in an LSTM contains four interacting layers.
Don’t worry about the details of what’s going on. We’ll walk through the LSTM diagram step
by step later. For now, let’s just try to get comfortable with the notation we’ll be using.
In the above diagram, each line carries an entire vector, from the output of one node to the inputs
of others. The pink circles represent pointwise operations, like vector addition, while the yellow
boxes are learned neural network layers. Lines merging denote concatenation, while a line
forking denotes its content being copied and the copies going to different locations.
The Core Idea Behind LSTMs
The key to LSTMs is the cell state, the horizontal line running through the top of the diagram.
The cell state is kind of like a conveyor belt. It runs straight down the entire chain, with only
some minor linear interactions. It’s very easy for information to just flow along it unchanged.
The LSTM does have the ability to remove or add information to the cell state, carefully
regulated by structures called gates.
Gates are a way to optionally let information through. They are composed out of a sigmoid
neural net layer and a pointwise multiplication operation.
The sigmoid layer outputs numbers between zero and one, describing how much of each
component should be let through. A value of zero means “let nothing through,” while a value of
one means “let everything through!”
An LSTM has three of these gates, to protect and control the cell state.
Step-by-Step LSTM Walk Through
The first step in our LSTM is to decide what information we’re going to throw away from the
cell state. This decision is made by a sigmoid layer called the “forget gate layer.” It looks
at ht−1 and xt, and outputs a number between 0 and 1 for each number in the cell
state Ct−1. A 1 represents "completely keep this" while a 0 represents "completely
get rid of this."
Let’s go back to our example of a language model trying to predict the next word based on all
the previous ones. In such a problem, the cell state might include the gender of the present
subject, so that the correct pronouns can be used. When we see a new subject, we want to forget
the gender of the old subject.
The next step is to decide what new information we’re going to store in the cell state. This has
two parts. First, a sigmoid layer called the “input gate layer” decides which values we’ll update.
Next, a tanh layer creates a vector of new candidate values, C̃t, that could be added to
the state. In the next step, we'll combine these two to create an update to the state.
In the example of our language model, we’d want to add the gender of the new subject to the
cell state, to replace the old one we’re forgetting.
It's now time to update the old cell state, Ct−1, into the new cell state Ct. The
previous steps already decided what to do; we just need to actually do it.
We multiply the old state by ft, forgetting the things we decided to forget earlier. Then we
add it ∗ C̃t. These are the new candidate values, scaled by how much we decided to
update each state value.
In the case of the language model, this is where we’d actually drop the information about the old
subject’s gender and add the new information, as we decided in the previous steps.
Finally, we need to decide what we’re going to output. This output will be based on our cell
state, but will be a filtered version. First, we run a sigmoid layer which decides what parts of the
cell state we're going to output. Then, we put the cell state through tanh (to push the values
to be between −1 and 1) and multiply it by the output of the sigmoid gate, so that we only
output the parts we decided to.
For the language model example, since it just saw a subject, it might want to output information
relevant to a verb, in case that’s what is coming next. For example, it might output whether the
subject is singular or plural, so that we know what form a verb should be conjugated into if
that’s what follows next.
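The walkthrough above maps directly onto a few lines of code. Below is a minimal sketch of a single LSTM step following the standard equations (forget gate ft, input gate it, candidate C̃t, output gate ot); the weight shapes and random values are illustrative assumptions.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(0)
    input_size, hidden_size = 3, 4           # illustrative sizes (assumptions)
    concat = hidden_size + input_size        # the gates look at [h_{t-1}, x_t]

    W_f, W_i, W_c, W_o = (rng.normal(scale=0.1, size=(hidden_size, concat)) for _ in range(4))
    b_f = b_i = b_c = b_o = np.zeros(hidden_size)

    def lstm_step(x, h_prev, C_prev):
        z = np.concatenate([h_prev, x])
        f = sigmoid(W_f @ z + b_f)           # forget gate: what to throw away from the cell state
        i = sigmoid(W_i @ z + b_i)           # input gate: which values to update
        C_cand = np.tanh(W_c @ z + b_c)      # candidate values that could be added to the state
        C = f * C_prev + i * C_cand          # update the cell state
        o = sigmoid(W_o @ z + b_o)           # output gate: which parts of the state to expose
        h = o * np.tanh(C)                   # filtered version of the cell state
        return h, C

    h, C = np.zeros(hidden_size), np.zeros(hidden_size)
    for x in [rng.normal(size=input_size) for _ in range(5)]:
        h, C = lstm_step(x, h, C)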
Variants on Long Short Term Memory
What I’ve described so far is a pretty normal LSTM. But not all LSTMs are the same as the
above. In fact, it seems like almost every paper involving LSTMs uses a slightly different
version. The differences are minor, but it’s worth mentioning some of them.
One popular LSTM variant, introduced by Gers & Schmidhuber (2000), is adding “peephole
connections.” This means that we let the gate layers look at the cell state.
Bidirectional recurrent neural networks
Bidirectional recurrent neural networks are a combination of two recurrent neural networks
that train in unison. One network trains from the start to the end of a sequence while the other
works in the opposite direction.
The bidirectional method that this type of recurrent neural network uses allows the model to
learn from both present and past information. Once the network has learned from this, it can
analyze future events accordingly. This feature sets it apart from other types of recurrent
neural networks. The dual nature of bidirectional recurrent neural networks is useful in
circumstances where context is required.
Long short-term memory
Long short-term memory recurrent neural networks handle long time-series data. This means
that they can recall information from earlier in a long time series.
This model has three different gates: the input gate, the output gate, and the forget gate.
These gates act as a form of control over features of the network, such as saving or removing
memory.
The input gate decides which new information moves into the cell state. The output gate, on
the other hand, regulates which information is selected from the cell state. After that decision
is made, it chooses the next hidden state for the network. Finally, the forget gate removes any
information from the cell state that is deemed irrelevant or insignificant.
Through the cell state, the network automatically controls which irrelevant information is
discarded and which relevant features are retained. The vanishing gradient problem found in some
networks can be mitigated through the use of long short-term memory networks.