
What is LSTM – Long Short Term Memory?

Last Updated: 05 Apr, 2025

Long Short-Term Memory (LSTM) is an enhanced version of the Recurrent Neural Network
(RNN) designed by Hochreiter & Schmidhuber. LSTMs can capture long-term dependencies in sequential
data, making them ideal for tasks like language translation, speech recognition and time series
forecasting.

Unlike traditional RNNs, which use a single hidden state passed through time, LSTMs introduce a memory
cell that holds information over extended periods, addressing the challenge of learning long-term
dependencies.

Problem with Long-Term Dependencies in RNNs


Recurrent Neural Networks (RNNs) are designed to handle sequential data by maintaining a hidden state
that captures information from previous time steps. However, they often struggle to learn long-term
dependencies, where information from distant time steps becomes crucial for making accurate
predictions about the current state. This difficulty shows up as the vanishing gradient or exploding gradient
problem.

Vanishing Gradient: During training, the gradients (which drive learning) can shrink as they are
propagated back through many time steps. This makes it hard for the model to learn long-term
patterns, since the contribution of earlier information becomes almost negligible.
Exploding Gradient: Conversely, gradients can grow too large, causing instability. This makes it
difficult for the model to learn properly, as the weight updates become erratic and unpredictable.

Both of these issues make it challenging for standard RNNs to effectively capture long-term
dependencies in sequential data.
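
The effect can be seen numerically. The following NumPy sketch is a simplification that ignores the activation-function derivatives and uses made-up weight scales; it repeatedly multiplies a gradient by the same recurrent Jacobian, which is the core operation of backpropagation through time:

```python
import numpy as np

rng = np.random.default_rng(0)

def backprop_norms(scale, steps=50, size=8):
    """Propagate a gradient back through `steps` time steps of a toy RNN.

    The recurrent Jacobian is approximated by a fixed random matrix W;
    activation-function derivatives are ignored for simplicity.
    """
    W = rng.standard_normal((size, size)) * scale / np.sqrt(size)
    grad = np.ones(size)              # gradient arriving at the last time step
    norms = []
    for _ in range(steps):
        grad = W.T @ grad             # one step of backpropagation through time
        norms.append(np.linalg.norm(grad))
    return norms

print("small recurrent weights ->", backprop_norms(scale=0.5)[-1])  # shrinks toward 0 (vanishing)
print("large recurrent weights ->", backprop_norms(scale=1.5)[-1])  # blows up (exploding)
```

With small recurrent weights the gradient norm decays toward zero after a few dozen steps, and with large weights it grows explosively, which is exactly why standard RNNs have trouble with long sequences.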

LSTM Architecture
The LSTM architecture is built around a memory cell that is controlled by three gates: the input gate, the forget
gate and the output gate. These gates decide what information to add to, remove from and output from
the memory cell.

Input gate: Controls what information is added to the memory cell.
Forget gate: Determines what information is removed from the memory cell.
Output gate: Controls what information is output from the memory cell.

This allows LSTM networks to selectively retain or discard information as it flows through the network,
which enables them to learn long-term dependencies. The network also maintains a hidden state, which acts
as its short-term memory. This hidden state is updated using the current input, the previous hidden state and
the current state of the memory cell.
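
To make this interface concrete, here is a minimal sketch using PyTorch's built-in nn.LSTM layer (the framework choice and the tensor sizes are illustrative assumptions, not something the article prescribes). It shows the two kinds of state described above: the per-step hidden state (short-term memory) and the cell state (long-term memory):

```python
import torch
import torch.nn as nn

# Toy dimensions chosen only for illustration.
batch, seq_len, input_size, hidden_size = 4, 10, 16, 32

lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
x = torch.randn(batch, seq_len, input_size)      # a batch of input sequences

# output: hidden state at every time step; (h_n, c_n): final hidden and cell states.
output, (h_n, c_n) = lstm(x)

print(output.shape)   # torch.Size([4, 10, 32]) -- short-term memory over time
print(h_n.shape)      # torch.Size([1, 4, 32])  -- final hidden state
print(c_n.shape)      # torch.Size([1, 4, 32])  -- final cell (long-term) state
```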

Working of LSTM
The LSTM architecture has a chain structure in which each repeating module contains four interacting
neural network layers and memory blocks called cells.
[Figure: LSTM Model]

Information is retained by the cells and the memory manipulations are done by the gates. There are
three gates –

Forget Gate

The forget gate removes information that is no longer useful from the cell state. Two inputs,
x_t (the input at the current time step) and h_{t-1} (the previous hidden state), are fed to the gate and multiplied with
weight matrices, followed by the addition of a bias. The result is passed through a sigmoid activation function,
which outputs a value between 0 and 1 for each element of the cell state. A value close to 0 means that piece of
information is forgotten, while a value close to 1 means it is retained for future use.

The equation for the forget gate is:


f_t = σ(W_f · [h_{t-1}, x_t] + b_f)

where:

W_f represents the weight matrix associated with the forget gate.
[h_{t-1}, x_t] denotes the concatenation of the previous hidden state and the current input.
b_f is the bias term of the forget gate.
σ is the sigmoid activation function.

[Figure: Forget Gate]
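
As a minimal illustration of this equation, here is a NumPy sketch of the forget-gate computation; the dimensions and random weights are placeholders for trained parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_size, input_size = 4, 3
rng = np.random.default_rng(0)

W_f = rng.standard_normal((hidden_size, hidden_size + input_size))  # forget-gate weights
b_f = np.zeros(hidden_size)                                         # forget-gate bias

h_prev = rng.standard_normal(hidden_size)   # h_{t-1}: previous hidden state
x_t = rng.standard_normal(input_size)       # x_t: current input

# f_t = sigmoid(W_f . [h_{t-1}, x_t] + b_f); every entry lies in (0, 1)
f_t = sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)
print(f_t)   # values near 0 mean "forget", values near 1 mean "keep"
```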

Input Gate

The input gate adds useful information to the cell state. First, the information is regulated by a sigmoid
function that, similar to the forget gate, filters which values should be remembered using the inputs
h_{t-1} and x_t. Then, a tanh function creates a vector of candidate values in the range -1 to +1 from
h_{t-1} and x_t. Finally, the candidate vector is multiplied element-wise by the regulated values to obtain
the useful information to add. The equations for the input gate are:
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)

Ĉ_t = tanh(W_C · [h_{t-1}, x_t] + b_C)

To update the cell state, we multiply the previous state C_{t-1} by f_t, discarding the information we had
previously chosen to forget. We then add i_t ⊙ Ĉ_t, the candidate values scaled by how much we decided
to update each state element:

C_t = f_t ⊙ C_{t-1} + i_t ⊙ Ĉ_t

where:

⊙ denotes element-wise multiplication.
tanh is the tanh activation function.

[Figure: Input Gate]
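
The same style of sketch covers the input gate and the cell-state update (again with random placeholder weights, and a stand-in value for the forget gate output f_t):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_size, input_size = 4, 3
rng = np.random.default_rng(1)

W_i = rng.standard_normal((hidden_size, hidden_size + input_size))  # input-gate weights
W_C = rng.standard_normal((hidden_size, hidden_size + input_size))  # candidate weights
b_i = np.zeros(hidden_size)
b_C = np.zeros(hidden_size)

h_prev = rng.standard_normal(hidden_size)         # h_{t-1}: previous hidden state
x_t = rng.standard_normal(input_size)             # x_t: current input
C_prev = rng.standard_normal(hidden_size)         # C_{t-1}: previous cell state
f_t = sigmoid(rng.standard_normal(hidden_size))   # stand-in for the forget gate output

concat = np.concatenate([h_prev, x_t])
i_t = sigmoid(W_i @ concat + b_i)    # i_t: how much of each candidate to let in
C_hat = np.tanh(W_C @ concat + b_C)  # candidate values in (-1, 1)

# C_t = f_t ⊙ C_{t-1} + i_t ⊙ Ĉ_t (element-wise): keep old memory, add new information
C_t = f_t * C_prev + i_t * C_hat
print(C_t)
```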
Output Gate

The output gate extracts useful information from the current cell state to present as output. First, a tanh
function is applied to the cell state to generate a vector of its values. Then, a sigmoid function regulates the
information, filtering which values to output using the inputs h_{t-1} and x_t. Finally, the two are multiplied
element-wise to produce the new hidden state h_t, which is sent as the output and passed on to the next cell.
The equations for the output gate and the hidden state are:

o_t = σ(W_o · [h_{t-1}, x_t] + b_o)

h_t = o_t ⊙ tanh(C_t)

[Figure: Output Gate]
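
A final sketch completes the single LSTM step with the output gate and the new hidden state (random placeholder weights, and a stand-in cell state C_t):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_size, input_size = 4, 3
rng = np.random.default_rng(2)

W_o = rng.standard_normal((hidden_size, hidden_size + input_size))  # output-gate weights
b_o = np.zeros(hidden_size)

h_prev = rng.standard_normal(hidden_size)   # h_{t-1}: previous hidden state
x_t = rng.standard_normal(input_size)       # x_t: current input
C_t = rng.standard_normal(hidden_size)      # current cell state (from the update above)

o_t = sigmoid(W_o @ np.concatenate([h_prev, x_t]) + b_o)  # which parts of the cell to expose
h_t = o_t * np.tanh(C_t)                                  # new hidden state / cell output
print(h_t)
```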

Bidirectional LSTM Model


Bidirectional LSTM (Bi-LSTM / BLSTM) is a variation of the standard LSTM that processes sequential data in
both the forward and backward directions. This gives a Bi-LSTM access to context from both past and future
time steps, unlike a traditional LSTM, which can only process the sequence in one direction.

Bi-LSTMs are made up of two LSTM networks: one that processes the input sequence in the forward
direction and one that processes it in the backward direction.
The outputs of the two LSTM networks are then combined to produce the final output.

LSTM models, including Bi-LSTMs, have demonstrated state-of-the-art performance across various
tasks such as machine translation, speech recognition and text summarization.

LSTM networks can also be stacked to form deeper models, allowing them to learn more complex patterns in
data. Each layer in the stack captures different levels of information and time-based relationships in the
input.
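
Both ideas map directly onto standard framework options. As a hedged illustration (PyTorch is an assumed framework choice and the tensor sizes are made up), the sketch below stacks two LSTM layers and makes them bidirectional; the output feature dimension doubles because the forward and backward passes are concatenated:

```python
import torch
import torch.nn as nn

batch, seq_len, input_size, hidden_size = 4, 10, 16, 32

# Two stacked layers, each processing the sequence forward and backward.
bi_lstm = nn.LSTM(input_size, hidden_size,
                  num_layers=2, bidirectional=True, batch_first=True)

x = torch.randn(batch, seq_len, input_size)
output, (h_n, c_n) = bi_lstm(x)

print(output.shape)  # torch.Size([4, 10, 64]): forward and backward features concatenated
print(h_n.shape)     # torch.Size([4, 4, 32]):  num_layers * num_directions = 4 final states
```

In practice the combined forward and backward features are then fed to a task-specific layer, such as a classifier or a decoder.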

Applications of LSTM
Some of the well-known applications of LSTMs include:

Language Modeling: Used in tasks like language modeling, machine translation and text
summarization. These networks learn the dependencies between words in a sentence to generate
coherent and grammatically correct sentences.
Speech Recognition: Used in transcribing speech to text and recognizing spoken commands. By
learning speech patterns they can match spoken words to corresponding text.
Time Series Forecasting: Used for predicting stock prices, weather and energy consumption.
They learn patterns in time series data to predict future events.
Anomaly Detection: Used for detecting fraud or network intrusions. These networks can identify
patterns in data that deviate drastically and flag them as potential anomalies.
Recommender Systems: In recommendation tasks like suggesting movies, music and books.
They learn user behavior patterns to provide personalized suggestions.
Video Analysis: Applied in tasks such as object detection, activity recognition and action
classification. When combined with Convolutional Neural Networks (CNNs) they help analyze video
data and extract useful information.
