Ch3 Auto Encoder

Autoencoders are deep learning algorithms used for unsupervised learning, particularly for data compression and dimensionality reduction. They consist of an encoder that compresses input data into a lower-dimensional representation and a decoder that reconstructs the original input from this compressed form. Various types of autoencoders exist, including linear, denoising, sparse, and variational autoencoders, each with distinct advantages and applications in tasks like image compression and feature extraction.

Auto-encoder

Unsupervised learning

By Dr. Shraddha Mithbavkar


Autoencoders
• Autoencoders are a type of deep learning algorithm designed to receive an input and transform it into a different representation. They play an important part in image reconstruction.
• Autoencoders are very useful in unsupervised machine learning. You can use them to compress data and reduce its dimensionality.
• The main difference between autoencoders and Principal Component Analysis (PCA) is that while PCA finds the directions along which you can project the data with maximum variance, autoencoders learn to reconstruct the original input given just a compressed version of it.
Data Compression
• Data compression: multidimensional data can represent the input with higher precision, but it may result in slower performance. Higher dimensionality leads to longer training time, whereas low or reduced dimensionality gives reduced precision but higher performance.
• Data compression is the process of encoding, reconstructing, or modifying the input data and converting it into a smaller representation with reduced size, to facilitate storage or transmission. It is achieved by dimensionality reduction.
Data Compression
• Dimensionality reduction:
• Feature selection: select a subset of the input features or attributes, i.e. keep the most relevant features and discard noise in the data.
• Feature extraction: the process of transforming the data from a high-dimensional space into a space of lower dimensionality.
• In feature extraction the number of features in a dataset is reduced by creating new features from the existing ones. The new feature set summarizes most of the information in the original feature set.
Auto-encoder
• An auto-encoder learns a compressed, distributed representation of the data for the purpose of dimensionality reduction.
• It learns abstract features in an unsupervised way, so you can then apply them to a supervised task.
• It consists of two components: an encoder function h = f(y) and a decoder that produces a reconstruction r = g(h).

y --f--> h --g--> r
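As a concrete illustration of f and g, here is a minimal sketch in PyTorch (the framework, the layer types, and the sizes input_dim = 784 and code_dim = 32 are assumptions for illustration, not part of the slides):

```python
import torch
import torch.nn as nn

input_dim, code_dim = 784, 32   # illustrative sizes (e.g. a flattened 28x28 image)

# Encoder f: maps the input y to the code h = f(y)
f = nn.Sequential(nn.Linear(input_dim, code_dim), nn.ReLU())

# Decoder g: maps the code h to the reconstruction r = g(h)
g = nn.Sequential(nn.Linear(code_dim, input_dim), nn.Sigmoid())

y = torch.rand(16, input_dim)   # a dummy batch of 16 inputs
h = f(y)                        # compressed representation
r = g(h)                        # reconstruction of y
```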
Architecture

(Figure: autoencoder architecture: encoder, bottleneck, decoder)
Architecture

• The encoder maps the input to the latent space and the decoder reconstructs the input from it.
• The latent space has a lower dimensionality than the input, yet it is capable of reconstructing it.
• The encoder encodes the input images as a compressed representation in a reduced dimension.
Architecture

• The next component is the bottleneck. The latent space is the hidden space in which the data lies at the bottleneck; the compressed version of the data is called the code.
• The third component is the decoder. It decodes the encoded image back to the original dimension by reconstructing the input from the latent-space representation. The decoded image is a lossy reconstruction of the original image.
• Information is lost because the data passes through the smaller bottleneck dimensionality before being expanded back to the larger one. The loss function measures how much information is lost; it tells us how effectively the decoder has learned to reconstruct the input image Y given its latent representation Z. The output of the decoder must have the same size as the original input.
Hyperparameters in an autoencoder

• Code size
• Number of layers
• Number of nodes per layer
• Loss function
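These four hyperparameters can be made explicit as the arguments of a small builder function. A minimal sketch, assuming PyTorch and purely illustrative defaults (784 inputs, code size 32, 2 layers, 128 nodes per layer, MSE loss):

```python
import torch.nn as nn

def build_autoencoder(input_dim=784, code_size=32, n_layers=2, nodes_per_layer=128):
    """Build an encoder/decoder pair from the four hyperparameters (illustrative sketch)."""
    dims = [input_dim] + [nodes_per_layer] * (n_layers - 1) + [code_size]
    enc, dec = [], []
    for d_in, d_out in zip(dims[:-1], dims[1:]):      # encoder: input -> ... -> code
        enc += [nn.Linear(d_in, d_out), nn.ReLU()]
    rev = dims[::-1]
    for d_in, d_out in zip(rev[:-1], rev[1:]):        # decoder mirrors the encoder
        dec += [nn.Linear(d_in, d_out), nn.ReLU()]
    return nn.Sequential(*enc), nn.Sequential(*dec), nn.MSELoss()  # loss is itself a hyperparameter
```

For example, `build_autoencoder(code_size=16)` would give a narrower bottleneck with the other choices unchanged.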
Linear Autoencoder
• A linear autoencoder has only a single-layer encoder and a single-layer decoder: one hidden layer, a linear activation function, and a squared-error loss.
• L(Y, Y') = ||Y - Y'||^2
Autoencoder vs PCA
• Autoencoders are unsupervised machine learning algorithms; PCA is also an unsupervised machine learning algorithm.
• Autoencoders can learn non-linear transformations; PCA is a linear technique for dimensionality reduction.
• Autoencoders can use multiple layers, e.g. convolutional layers, which are better for learning from image, video, and series data; with PCA one single linear transformation is applied.
• Autoencoders are more flexible than PCA, as they can represent both linear and non-linear transformations; PCA is restricted to linear transformations.
Representational power, layer size and depth
• An autoencoder is a feed-forward network, so the advantages of depth in feed-forward networks also apply to autoencoders.
• Advantages of depth:
1. Autoencoders are often trained with a single-layer encoder and a single-layer decoder, but using many-layered (deep) encoders and decoders offers many advantages.
2. Depth can exponentially reduce the computational cost of representing some functions.
3. Depth can exponentially decrease the amount of training data needed to learn some functions.
4. Experimentally, deep autoencoders yield better compression compared to shallow or linear autoencoders.
• If the mapping from input to code is shallow, we are not able to enforce arbitrary constraints, such as that the code should be sparse.
• A deep autoencoder, with at least one additional hidden layer inside the encoder itself, can approximate any mapping from input to code arbitrarily well, given enough hidden units.
Types of autoencoder
1. Undercomplete autoencoder
2. Denoising autoencoder
3. Sparse autoencoder
4. Deep autoencoder
5. Contractive autoencoder
6. Convolutional autoencoder
7. Variational autoencoder
8. Stacked autoencoder
Undercomplete autoencoder

• The objective of an undercomplete autoencoder is to capture the most important features present in the data. Undercomplete autoencoders have a smaller dimension for the hidden layer compared to the input layer. This helps to obtain important features from the data. It minimizes the loss function by penalizing g(f(y)) for being different from the input y.
• L(y, g(f(y)))
• L is the loss function penalizing g(f(y)) for being different from y, such as the mean squared error (MSE).
• f is the nonlinear encoder function.
• g is the nonlinear decoder function.
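A short training-loop sketch of this objective, assuming PyTorch and illustrative dimensions: the code dimension is kept smaller than the input dimension (which is what makes the autoencoder undercomplete), and L(y, g(f(y))) is taken to be the MSE.

```python
import torch
import torch.nn as nn

input_dim, code_dim = 784, 32              # code_dim < input_dim => undercomplete
f = nn.Sequential(nn.Linear(input_dim, code_dim), nn.ReLU())     # encoder
g = nn.Sequential(nn.Linear(code_dim, input_dim), nn.Sigmoid())  # decoder
loss_fn = nn.MSELoss()
opt = torch.optim.Adam(list(f.parameters()) + list(g.parameters()), lr=1e-3)

data = torch.rand(256, input_dim)          # dummy data standing in for a real dataset
for epoch in range(5):
    opt.zero_grad()
    y = data
    loss = loss_fn(g(f(y)), y)             # L(y, g(f(y))): penalize reconstruction error
    loss.backward()
    opt.step()
```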
Undercomplete autoencoder

Advantages-
• Undercomplete autoencoders do not need any regularization, as they maximize the probability of the data rather than copying the input to the output.
Drawbacks-
• Using an overparameterized model due to a lack of sufficient training data can lead to overfitting.
Denoising autoencoder

• Denoising autoencoders create a corrupted copy of the input by introducing some noise. This prevents the autoencoder from simply copying the input to the output without learning features of the data. These autoencoders take a partially corrupted input while training and learn to recover the original, undistorted input. The model learns a vector field for mapping the input data towards a lower-dimensional manifold which describes the natural data, cancelling out the added noise.
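A sketch of the corruption step, assuming PyTorch and an illustrative noise level: the input is corrupted (additive Gaussian noise here; random zero-masking is an alternative), passed through the autoencoder, and the loss is computed against the clean input.

```python
import torch
import torch.nn as nn

input_dim, code_dim = 784, 64
encoder = nn.Sequential(nn.Linear(input_dim, code_dim), nn.ReLU())
decoder = nn.Sequential(nn.Linear(code_dim, input_dim), nn.Sigmoid())
loss_fn = nn.MSELoss()

y = torch.rand(32, input_dim)                  # clean input batch
y_noisy = y + 0.2 * torch.randn_like(y)        # corrupted copy (additive Gaussian noise)
# alternative corruption: randomly zero out some of the inputs
# y_noisy = y * (torch.rand_like(y) > 0.3).float()

r = decoder(encoder(y_noisy))                  # reconstruct from the corrupted input
loss = loss_fn(r, y)                           # compare against the clean, undistorted input
```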
Denoising autoencoder

• Advantages-
• It was introduced to achieve a good representation: one that can be obtained robustly from a corrupted input and that is useful for recovering the corresponding clean input.
• Corruption of the input can be done randomly by setting some of the inputs to zero; the remaining nodes copy the input to the noised input.
• The loss function is minimized between the output and the original, uncorrupted input.
• Setting up a single-thread denoising autoencoder is easy.
Denoising autoencoder

• Drawbacks-
• To train an autoencoder to denoise data, it is necessary to perform a preliminary stochastic mapping in order to corrupt the data and use it as input.
• This model is not able to develop a mapping which memorizes the training data, because the input and the target output are no longer the same.
Denoising autoencoder

(Figure: denoising autoencoder)
Sparse autoencoder

• The hidden layer may have a larger number of nodes than the input layer.
• Sparse autoencoders can still discover important features from the data. In a generic sparse autoencoder the obscurity of a node corresponds with its level of activation. A sparsity constraint is introduced on the hidden layer to prevent the output layer from simply copying the input data.
• Sparsity may be obtained by additional terms in the loss function during the training process, either by comparing the probability distribution of the hidden-unit activations with some low desired value, or by manually zeroing all but the strongest hidden-unit activations.
• L(y, g(f(y))) + Ω(h), where Ω(h) is the penalty function and h = f(y) is the encoder output.
• The penalty acts simply as a regularizer term added to a feed-forward network whose primary task is to copy the input to the output (unsupervised), which can also perform some supervised task that depends on these sparse features.
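One common choice for Ω(h) is an L1 penalty on the hidden activations (a KL-divergence penalty on the average activation is another). A minimal sketch, assuming PyTorch and an illustrative penalty weight lam:

```python
import torch
import torch.nn as nn

input_dim, hidden_dim = 784, 1024                  # hidden layer may be larger than the input
encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
decoder = nn.Sequential(nn.Linear(hidden_dim, input_dim), nn.Sigmoid())
lam = 1e-4                                         # strength of the sparsity penalty

y = torch.rand(32, input_dim)
h = encoder(y)                                     # h = f(y)
recon = nn.functional.mse_loss(decoder(h), y)      # L(y, g(f(y)))
sparsity = h.abs().mean()                          # Omega(h): L1 penalty pushes activations toward 0
loss = recon + lam * sparsity
```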
Sparse autoencoder
Advantages-
● Sparse autoencoders have a sparsity penalty that keeps hidden activations close to zero but not exactly zero. The sparsity penalty is applied on the hidden layer in addition to the reconstruction error. This prevents overfitting.
● They take the highest activation values in the hidden layer and zero out the rest of the hidden nodes. This prevents the autoencoder from using all of the hidden nodes at once and forces only a reduced number of hidden nodes to be used.
Sparse autoencoder
Drawbacks-
● For it to work, it is essential that the individual nodes of a trained model which activate are data-dependent, and that different inputs result in activations of different nodes through the network.
Deep autoencoder

• Deep autoencoders consist of two identical deep belief networks: one network for encoding and another for decoding.
• Typically, deep autoencoders have 4 to 5 layers for encoding and the next 4 to 5 layers for decoding.
• We use unsupervised layer-by-layer pre-training for this model. The layers are Restricted Boltzmann Machines (RBMs), which are the building blocks of deep belief networks.
• Processing the benchmark dataset MNIST, a deep autoencoder would use binary transformations after each RBM. Deep autoencoders are useful in topic modeling, i.e. statistically modeling abstract topics that are distributed across a collection of documents. They are also capable of compressing images into 30-number vectors.
Deep autoencoder

Advantages-
● Deep autoencoders can also be used for other types of datasets with real-valued data, in which case Gaussian rectified transformations are used for the RBMs instead.
● The final encoding layer is compact and fast.
Drawbacks-
● There is a chance of overfitting, since there are more parameters than input data.
● Training the model may be a nuisance, since at the stage of the decoder's backpropagation the learning rate should be lowered or made slower depending on whether binary or continuous data is being handled.
Contractive autoencoder

Contractive autoencoder (CAE)


• The objective of a contractive autoencoder is to have a robust learned representation which is less sensitive to small variations in the data.
• Robustness of the representation is achieved by applying a penalty term to the loss function:
L(y, g(f(y))) + Ω(h, y)
• The contractive autoencoder is another regularization technique, just like sparse and denoising autoencoders. However, this regularizer corresponds to the Frobenius norm of the Jacobian matrix of the encoder activations with respect to the input. The Frobenius norm of the Jacobian matrix of the hidden layer is calculated with respect to the input; it is basically the sum of squares of all its elements.
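For a single sigmoid hidden layer h = sigmoid(Wy + b), the squared Frobenius norm of the Jacobian has the closed form sum_j (h_j(1 - h_j))^2 * sum_i W_ji^2, which makes the penalty cheap to compute. A sketch under that assumption (PyTorch and the weight lam are illustrative choices):

```python
import torch
import torch.nn as nn

input_dim, hidden_dim = 784, 128
W = nn.Linear(input_dim, hidden_dim)               # encoder weights and bias
decoder = nn.Linear(hidden_dim, input_dim)
lam = 1e-3

y = torch.rand(32, input_dim)
h = torch.sigmoid(W(y))                            # h = sigmoid(W y + b)
recon = nn.functional.mse_loss(decoder(h), y)

# ||J||_F^2 for a sigmoid layer: sum_j (h_j (1 - h_j))^2 * sum_i W_ji^2
dh = (h * (1 - h)) ** 2                            # shape (batch, hidden_dim)
w_sq = (W.weight ** 2).sum(dim=1)                  # shape (hidden_dim,)
contractive = (dh * w_sq).sum(dim=1).mean()        # averaged over the batch

loss = recon + lam * contractive
```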
Contractive autoencoder

Advantages-
● A contractive autoencoder is a better choice than a denoising autoencoder for learning useful feature extraction.
● This model learns an encoding in which similar inputs have similar encodings. Hence, we are forcing the model to learn how to contract a neighborhood of inputs into a smaller neighborhood of outputs.
Contractive autoencoder
(Figure: contractive autoencoder)
Convolutional Autoencoder

Advantages-
● Due to their convolutional nature, they scale well to realistic-sized, high-dimensional images.
● They can remove noise from a picture or reconstruct missing parts.

Drawbacks-
● The reconstruction of the input image is often blurry and of lower quality, due to the compression during which information is lost.
Convolutional Autoencoder
(Figure: convolutional autoencoder)
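A small convolutional autoencoder sketch for 28x28 grayscale images, assuming PyTorch; the channel counts and kernel sizes are illustrative. Conv2d layers downsample the image to the code, and ConvTranspose2d layers upsample it back to the original resolution.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(                                # 1x28x28 -> 32x7x7 code
    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),   # -> 16x14x14
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),  # -> 32x7x7
)
decoder = nn.Sequential(                                # 32x7x7 -> 1x28x28 reconstruction
    nn.ConvTranspose2d(32, 16, kernel_size=3, stride=2, padding=1, output_padding=1), nn.ReLU(),
    nn.ConvTranspose2d(16, 1, kernel_size=3, stride=2, padding=1, output_padding=1), nn.Sigmoid(),
)

images = torch.rand(8, 1, 28, 28)                       # dummy batch of grayscale images
recon = decoder(encoder(images))                        # same shape as the input
loss = nn.functional.mse_loss(recon, images)
```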
Variational autoencoder

Variational autoencoder models make strong assumptions about the distribution of the latent variables. They use a variational approach for latent representation learning, which results in an additional loss component and a specific estimator for the training algorithm called the Stochastic Gradient Variational Bayes (SGVB) estimator. It is assumed that the data are generated by a directed graphical model p_θ(y|z) and that the encoder learns an approximation q_Φ(z|y) to the posterior distribution p_θ(z|y), where Φ and θ denote the parameters of the encoder (recognition model) and decoder (generative model) respectively. The probability distribution of the latent vector of a variational autoencoder typically matches that of the training data much more closely than a standard autoencoder's does.
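A sketch of the variational loss, assuming PyTorch and illustrative layer sizes: the encoder outputs a mean and log-variance for q_Φ(z|y), a latent z is drawn with the reparameterization trick so that sampling stays differentiable, and the loss is the reconstruction term plus the KL divergence of q_Φ(z|y) from a standard normal prior.

```python
import torch
import torch.nn as nn

input_dim, latent_dim = 784, 20
enc = nn.Linear(input_dim, 400)
to_mu, to_logvar = nn.Linear(400, latent_dim), nn.Linear(400, latent_dim)
dec = nn.Sequential(nn.Linear(latent_dim, 400), nn.ReLU(),
                    nn.Linear(400, input_dim), nn.Sigmoid())

y = torch.rand(32, input_dim)
h = torch.relu(enc(y))
mu, logvar = to_mu(h), to_logvar(h)                    # parameters of q_phi(z|y)

eps = torch.randn_like(mu)                             # reparameterization trick:
z = mu + torch.exp(0.5 * logvar) * eps                 # z = mu + sigma * eps stays differentiable

recon = nn.functional.mse_loss(dec(z), y, reduction='sum')
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())   # KL(q_phi(z|y) || N(0, I))
loss = recon + kl
```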
Variational autoencoder

Advantages-
● It gives significant control over how we want to model the latent distribution, unlike the other models.
● After training, you can simply sample from the latent distribution and then decode it to generate new data.

Drawbacks-
● When training the model, we need to calculate the relationship of each parameter in the network with respect to the final output loss using a technique known as backpropagation. Hence, the sampling process requires some extra attention.
Variational autoencoder
(Figure: variational autoencoder)
Stacked autoencoder
Some datasets have a complex relationship within their features, so using only one autoencoder is not sufficient: a single autoencoder might be unable to reduce the dimensionality of the input features. For such use cases we use stacked autoencoders. Stacked autoencoders are, as the name suggests, multiple autoencoders stacked on top of one another. A stacked autoencoder with three encoders stacked on top of each other is shown in the following figure.
Stacked autoencoder
(Figure: three autoencoders stacked on top of one another)
Stacked autoencoder
According to the architecture shown in the figure above, the input data is first given to autoencoder 1. The output of autoencoder 1 and the input of autoencoder 1 are then given together as the input to autoencoder 2. Similarly, the output of autoencoder 2 and the input of autoencoder 2 are given as the input to autoencoder 3. Thus, the length of the input vector for autoencoder 3 is double the length of the input to autoencoder 2. This technique also helps to solve the problem of insufficient data to some extent.
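A sketch of this stacking scheme, assuming PyTorch and illustrative dimensions: the output of autoencoder 1 is concatenated with its own input to form the input of autoencoder 2, and likewise for autoencoder 3, so each stage receives a vector twice as long as the previous stage's input.

```python
import torch
import torch.nn as nn

def make_ae(dim, code):
    """One autoencoder: encoder down to `code` units, decoder back to `dim` units."""
    return nn.Sequential(nn.Linear(dim, code), nn.ReLU(),
                         nn.Linear(code, dim), nn.Sigmoid())

d = 64
ae1 = make_ae(d, 16)
ae2 = make_ae(2 * d, 32)        # input of AE2 = [output of AE1, input of AE1]
ae3 = make_ae(4 * d, 64)        # input of AE3 = [output of AE2, input of AE2]

x1 = torch.rand(8, d)                     # original input features
x2 = torch.cat([ae1(x1), x1], dim=1)      # length 2d
x3 = torch.cat([ae2(x2), x2], dim=1)      # length 4d: double the input of AE2
out = ae3(x3)
```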
Application of Autoencoder
• Dimensionality reduction and information retrieval tasks: lower-dimensional representations can improve performance on many tasks, such as classification, and a model operating in a smaller space consumes less memory and runtime.
• After dimensionality reduction, we can store all database entries in a hash table mapping binary code vectors to entries. This hash table allows us to perform information retrieval by returning all database entries that have the same binary code as the query.
• Data denoising, and dimensionality reduction for data visualization.
• With an appropriate dimensionality reduction method and sparsity constraints, autoencoders can learn data projections that are better than PCA or other basic techniques.
• Watermark removal, feature variation, image colouring, building recommendation systems, and encoding features in massive datasets.
Application of Autoencoder: image compression

• Conversion of a 2-D image into a 1-D form. The number of elements in the 1-D vector varies based on the task being solved: the fewer the elements in the vector, the more complex it is to reproduce the original image.
• An encoder generates the 1-D vector from an input image. The layers included can be dense, convolutional, dropout, etc. The job of the decoder is to reconstruct the original image with the highest possible quality. The decoder is just a reflection of the encoder. The loss is calculated by comparing the original and the reconstructed image.
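A sketch of this 2-D to 1-D conversion, assuming PyTorch; the 28x28 image size, the 64-element code, and the layer sizes are illustrative. The image is flattened, the encoder produces the 1-D vector, and the decoder, a reflection of the encoder, rebuilds the image so the reconstruction loss can be computed.

```python
import torch
import torch.nn as nn

h, w, code = 28, 28, 64                                  # image size and 1-D code length (illustrative)
encoder = nn.Sequential(nn.Flatten(),                    # 2-D image -> 1-D vector of h*w elements
                        nn.Linear(h * w, 256), nn.ReLU(),
                        nn.Linear(256, code))
decoder = nn.Sequential(nn.Linear(code, 256), nn.ReLU(), # mirror of the encoder
                        nn.Linear(256, h * w), nn.Sigmoid())

img = torch.rand(1, 1, h, w)                             # one grayscale image
vec = encoder(img)                                       # 1-D representation with 64 elements
recon = decoder(vec).view(1, 1, h, w)                    # reshape back to the original image
loss = nn.functional.mse_loss(recon, img)                # compare original and reconstruction
```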
Thank you
