Encoder-Decoder Sequence-to-Sequence
Architecture
Mr. Sivadasan E T
Associate Professor
Vidya Academy of Science and Technology, Thrissur
Encoder-Decoder Sequence-to-Sequence
Architecture
The Encoder-Decoder Sequence-to-Sequence
(Seq2Seq) architecture is a machine learning
model designed for tasks involving
sequential data.
It takes an input sequence, processes it, and
generates an output sequence.
Encoder-Decoder Sequence-to-Sequence
Architecture
The architecture consists of two fundamental
components: an encoder and a decoder.
The encoder processes the input sequence and
transforms it into a fixed-size hidden
representation.
The decoder uses this hidden representation to
generate the output sequence.
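A minimal sketch of these two components, assuming PyTorch and a single-layer GRU for both the encoder and the decoder; the class names, layer sizes, and layer choices here are illustrative assumptions, not part of the original slides.

```python
# Minimal encoder-decoder (seq2seq) sketch, assuming PyTorch.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src):                    # src: (batch, src_len) token ids
        _, hidden = self.rnn(self.embed(src))
        return hidden                          # fixed-size context: (1, batch, hidden_dim)

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tgt, context):           # tgt: (batch, tgt_len) token ids
        output, _ = self.rnn(self.embed(tgt), context)
        return self.out(output)                # logits: (batch, tgt_len, vocab_size)
```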
Encoder-Decoder Sequence-to-Sequence
Architecture
The encoder-decoder structure allows the model to
handle input and output sequences of different
lengths, making it well suited to sequential data.
The model is trained to maximize the likelihood of
the correct output sequence given the input
sequence.
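A hedged sketch of this training objective: maximizing the likelihood of the correct output sequence is commonly implemented by minimizing token-level cross-entropy (negative log-likelihood). It reuses the Encoder/Decoder classes sketched earlier; the function name and tensor shapes are assumptions.

```python
# One training step that maximizes the likelihood of the target sequence
# by minimizing cross-entropy over its tokens (a standard choice; assumed here).
import torch
import torch.nn as nn

def training_step(encoder, decoder, optimizer, src, tgt_in, tgt_out):
    """src, tgt_in, tgt_out: (batch, len) index tensors.
    tgt_in is the target shifted right (teacher forcing); tgt_out is the target."""
    optimizer.zero_grad()
    context = encoder(src)                     # fixed-size context vector
    logits = decoder(tgt_in, context)          # (batch, tgt_len, vocab)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)),   # (batch * tgt_len, vocab)
        tgt_out.reshape(-1))                   # (batch * tgt_len,)
    loss.backward()
    optimizer.step()
    return loss.item()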
Encoder-Decoder Sequence-to-Sequence
Architecture
Commonly used in NLP tasks such as speech
recognition, machine translation, and question
answering, where the input and output sequences in
the training set are generally not of the same length
(although their lengths might be related).
Encoder-Decoder Sequence-to-Sequence
Architecture
Imagine we have an input sentence:
👉 "The sky is“
The correct output word (what the model
should predict) is:
👉 "blue"
Encoder-Decoder Sequence-to-Sequence
Architecture
Encoder Block
The main purpose of the encoder block is to process
the input sequence and capture information in a
fixed-size context vector.
Encoder
1. The input sequence is fed into the encoder.
2. The encoder processes each element of the input
   sequence using a recurrent neural network (or a
   transformer architecture).
Encoder
3. Throughout this process, the encoder maintains an
   internal state, and the final hidden state
   functions as the context vector that
   encapsulates a compressed representation of
   the entire input sequence.
4. This context vector captures the semantic
meaning and important information of the input
sequence.
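A step-by-step sketch of this process, assuming PyTorch: the internal state is updated once per input element, and the final hidden state is taken as the fixed-size context vector. The sizes, token ids, and the use of a GRU cell are illustrative assumptions.

```python
# Unrolled view of the encoder: one hidden-state update per input element.
import torch
import torch.nn as nn

emb_dim, hidden_dim, vocab_size = 32, 64, 1000
embed = nn.Embedding(vocab_size, emb_dim)
cell = nn.GRUCell(emb_dim, hidden_dim)

src = torch.tensor([[2, 3, 4]])              # (batch=1, src_len=3) token ids
hidden = torch.zeros(1, hidden_dim)          # initial internal state
for t in range(src.size(1)):
    hidden = cell(embed(src[:, t]), hidden)  # update the internal state
context = hidden                             # final hidden state = context vector
```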
Decoder Block
The decoder block is similar to the encoder block.
The decoder processes the context vector from the
encoder to generate the output sequence
incrementally.
Decoder Architecture
In the training phase, the decoder receives both
the context vector and the desired target output
sequence (ground truth).
During inference, the decoder relies on its own
previously generated outputs as inputs for
subsequent steps.
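A sketch of the two decoding modes just described, assuming the Decoder class from the earlier sketch: teacher forcing during training, and greedy autoregressive generation during inference. The SOS/EOS token ids and the maximum length are assumptions.

```python
# Teacher forcing (training) vs. autoregressive greedy decoding (inference).
import torch

SOS, EOS = 0, 1   # assumed special-token ids

def decode_train(decoder, context, tgt):
    # Training: feed the ground-truth target, shifted right, at every step.
    tgt_in = torch.cat([torch.full((tgt.size(0), 1), SOS), tgt[:, :-1]], dim=1)
    return decoder(tgt_in, context)              # logits for all steps at once

def decode_infer(decoder, context, max_len=20):
    # Inference: feed back the model's own previous predictions at each step.
    tokens = [SOS]
    with torch.no_grad():
        for _ in range(max_len):
            inp = torch.tensor([tokens])         # (1, length so far)
            logits = decoder(inp, context)       # (1, length, vocab)
            next_tok = logits[0, -1].argmax().item()  # greedy choice
            if next_tok == EOS:
                break
            tokens.append(next_tok)
    return tokens[1:]                            # drop the <sos> marker
```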
Encoder-Decoder Sequence-to-Sequence
Architecture
We often call the input to the RNN the “context.”
We want to produce a representation of this
context, C.
The context C might be a vector or sequence of
vectors that summarize the input sequence
X = (x^(1), ..., x^(n_x)).
Encoder-Decoder Sequence-to-Sequence
Architecture
The idea is very simple:
1. An encoder or reader or input RNN processes the
input sequence. The encoder emits the context C,
usually as a simple function of its final hidden state.
Encoder-Decoder Sequence-to-Sequence
Architecture
2. A decoder or writer or output RNN is
conditioned on that fixed-length vector to
generate the output sequence
Y = (y^(1), ..., y^(n_y)) (or to compute the
probability of a given output sequence).
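A hedged end-to-end sketch tying these two steps together, assuming the Encoder and Decoder classes sketched earlier: the encoder reads X and emits the context C from its final hidden state, and the decoder, conditioned on C, scores a given output sequence Y by summing per-token log-probabilities. Names and token ids are illustrative.

```python
# Compute log P(Y | X) with the encoder-decoder sketched earlier.
import torch
import torch.nn.functional as F

def sequence_log_prob(encoder, decoder, x, y, sos_id=0):
    """x, y: (1, len) index tensors; returns log P(Y | X) under the model."""
    context = encoder(x)                                   # C from the final hidden state
    y_in = torch.cat([torch.tensor([[sos_id]]), y[:, :-1]], dim=1)
    logits = decoder(y_in, context)                        # (1, len(y), vocab)
    log_probs = F.log_softmax(logits, dim=-1)
    token_lp = log_probs.gather(-1, y.unsqueeze(-1)).squeeze(-1)
    return token_lp.sum()                                  # log-probability of Y
```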
Thank You!