Variational Autoencoders
A Deep Generative Model
Ayush Raina, Anushka Dassi, Arnav Bhatt
Indian Institute of Science
April 12, 2024
What are Autoencoders?
1. Autoencoders are neural networks that aim to learn a compact representation of the input data, one whose dimension is much smaller than that of the input.
2. An autoencoder consists of two parts: an encoder (ϕ) and a decoder (θ), both of which are neural networks.
Goal
The goal of the encoder network is to learn a hidden representation of the input data, and the goal of the decoder network is to reconstruct the input data from that hidden representation.
Training Autoencoders
• Autoencoders are trained by minimizing the reconstruction error between the input data and the reconstructed data.
• The loss function used for training autoencoders is the Mean Squared Error (MSE).
• The loss function is given by:

\[
\mathcal{L}(\phi, \theta) = \frac{1}{N} \sum_{i=1}^{N} \left\| x_i - \mathrm{Decoder}_\theta\big(\mathrm{Encoder}_\phi(x_i)\big) \right\|^2 \tag{1}
\]

where x_i is the input data, Encoder_ϕ(x_i) is the hidden representation of x_i, and Decoder_θ(Encoder_ϕ(x_i)) is the reconstructed data.
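To make this concrete, here is a minimal sketch of such an autoencoder and one training step. It assumes PyTorch and flattened 28×28 (MNIST-style) inputs; the layer sizes are illustrative, not the ones used in our experiments.

    import torch
    import torch.nn as nn

    class Autoencoder(nn.Module):
        def __init__(self, input_dim=784, hidden_dim=32):
            super().__init__()
            # Encoder (phi): maps the input to a much smaller hidden representation.
            self.encoder = nn.Sequential(
                nn.Linear(input_dim, 256), nn.ReLU(),
                nn.Linear(256, hidden_dim),
            )
            # Decoder (theta): reconstructs the input from the hidden representation.
            self.decoder = nn.Sequential(
                nn.Linear(hidden_dim, 256), nn.ReLU(),
                nn.Linear(256, input_dim), nn.Sigmoid(),
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))

    # One training step minimizing the MSE reconstruction error of Equation (1).
    model = Autoencoder()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    x = torch.rand(64, 784)   # stand-in batch; use a real DataLoader in practice
    loss = nn.functional.mse_loss(model(x), x)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()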
Reconstructions of Autoencoders
Figure 1: Reconstruction of Autoencoders without CNN
Figure 2: Reconstruction of Autoencoders with CNN
In both cases we trained for over 20 epochs.
Reconstructions of Autoencoders
Figure 3: Reconstruction of Autoencoders on Animal Face Dataset
This reconstruction was done on the grayscale version; we obtained similar output on the colored version.
Generation
Can we generate new data with Autoencoders?
After training an autoencoder, we can try to generate new data by feeding a random hidden representation to the decoder network, but we may simply get noise as output.
This is because the valid hidden representations lie in a very small subspace of the latent space.
Generation
Figure 4: Generation of new handwritten digits using Autoencoders
This is the generation we obtained when we fed the mean of the hidden representations produced by the encoder network; otherwise the output was simply noise.
Generation
What should be done?
If we could somehow feed a highly likely hidden representation z, we could expect meaningful output.
Generation
What is a highly likely hidden representation?
In fact, we want to sample from P(z^(i) | x^(i)), where z^(i) is the hidden representation of x^(i).
Here both the encoder and decoder networks are deterministic, which means that for every input x^(i) the hidden representation z^(i) is fixed, and vice versa.
Introduction to VAE
Variational Autoencoders (VAEs) have the same structure as autoencoders, but instead of learning a fixed hidden representation, we learn the distribution of the hidden representation.
Two things to care about
1. We want to learn the distribution P(z^(i) | x^(i)) so that we can sample a highly likely z^(i) for a given x^(i).
2. We also want to learn the distribution P(x^(i) | z^(i)) so that we can generate new data by sampling z^(i) from P(z^(i) | x^(i)).
With these choices, neither the encoder nor the decoder is deterministic.
Modelling Assumptions
Assumption 1
z^(i) ∼ N(0, I_{k×k})
Assumption 2
The posterior distribution P(z^(i) | x^(i)) ∼ N(µ(x^(i)), Σ(x^(i))), where µ(x^(i)) and Σ(x^(i)) are functions of x^(i).
Assumption 3
P(x^(i) | z^(i)) ∼ N(µ(z^(i)), Σ(z^(i))), where µ(z^(i)) and Σ(z^(i)) are functions of z^(i).
Achieving the goals
1. Since we assumed that P(z^(i) | x^(i)) ∼ N(µ(x^(i)), Σ(x^(i))), the goal of the encoder network is to learn the mean and variance of this distribution.
2. Since we assumed that P(x^(i) | z^(i)) ∼ N(µ(z^(i)), Σ(z^(i))), the goal of the decoder network is to learn the mean and variance of this distribution.
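The slides do not spell out how to sample z while keeping the network trainable; the usual way is the standard reparameterization trick z = µ + σ · ε with ε ∼ N(0, I), which makes the sample differentiable in µ and σ. A minimal sketch of the encoder side under that assumption, with illustrative names and sizes:

    import torch
    import torch.nn as nn

    class VAEEncoder(nn.Module):
        def __init__(self, input_dim=784, latent_dim=32):
            super().__init__()
            self.backbone = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
            self.mu = nn.Linear(256, latent_dim)      # mean of P(z|x)
            self.logvar = nn.Linear(256, latent_dim)  # log-variance (diagonal covariance)

        def forward(self, x):
            h = self.backbone(x)
            mu, logvar = self.mu(h), self.logvar(h)
            eps = torch.randn_like(mu)                # eps ~ N(0, I)
            z = mu + torch.exp(0.5 * logvar) * eps    # z ~ N(mu, sigma^2), differentiable
            return z, mu, logvar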
Visualizing the VAE
Figure 5: Variational Autoencoder
Loss Function in VAE
We optimize MSE Loss + KL Divergence Loss, where the KL Divergence Loss is given by:

\[
\mathrm{KL}\big(P(z^{(i)} \mid x^{(i)}) \,\big\|\, P(z)\big) = \frac{1}{2} \sum_{j=1}^{k} \left( \mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1 \right) \tag{2}
\]

where µ_j and σ_j² are the mean and variance of the j-th coordinate of the latent distribution P(z^(i) | x^(i)). In practice we assume the covariance matrix is diagonal. This can be derived from the expression for the KL divergence between two Gaussian distributions.
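A minimal sketch of this combined loss, assuming the encoder outputs log σ² as in the encoder sketch above; the kl_weight knob is our addition, anticipating the KL-weight takeaway later in the deck.

    import torch
    import torch.nn.functional as F

    def vae_loss(x_hat, x, mu, logvar, kl_weight=1.0):
        """MSE reconstruction loss plus the closed-form KL term of Equation (2)."""
        recon = F.mse_loss(x_hat, x, reduction="sum")
        # KL(N(mu, sigma^2) || N(0, I)) = 1/2 * sum(mu^2 + sigma^2 - log(sigma^2) - 1)
        kl = 0.5 * torch.sum(mu.pow(2) + logvar.exp() - logvar - 1.0)
        return recon + kl_weight * kl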
VAEs on MNIST dataset
Figure 6: Generation of new handwritten digits using VAE
We can clearly see that the VAE generates better handwritten digits than the autoencoder did.
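Generating digits like these amounts to sampling z from the prior N(0, I) of Assumption 1 and passing it through the trained decoder. A sketch, where the small decoder below stands in for the trained network:

    import torch
    import torch.nn as nn

    latent_dim = 32
    # Stand-in for the trained decoder; in practice load the trained weights.
    decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                            nn.Linear(256, 784), nn.Sigmoid())

    z = torch.randn(16, latent_dim)   # z ~ N(0, I): highly likely under the prior
    with torch.no_grad():
        samples = decoder(z)          # 16 generated images (flattened 28x28)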
VAEs on MNIST dataset
Figure 7: Reconstructions of VAE on MNIST dataset
The reconstructions are also close to the original data.
Key observations
1. Autoencoders are deterministic, while VAEs are probabilistic.
2. VAEs are better at generating new data than autoencoders.
3. Autoencoders are better at reconstruction than VAEs.
About Dataset
Dataset: Animal Faces
Image size: 512×512
Training images: 14630
Validation images: 1500
Number of classes: 3
For all our experiments we resized the images to 128×128.
Training for cats only
We trained the VAE on 5153 cat images and used 500 images for validation, training for 40 epochs. Here are the results:
Figure 8: Reconstructions of VAE on Cat dataset
Generation
Figure 9: Generation of new cat images using VAE
Loss Curves
Figure 10: Loss Curves for VAE on Cat dataset
Training for wild animals only
We trained the VAE on 4738 images of wild animals and used 500 images for validation, training for 70 epochs.
Figure 11: Reconstructions of VAE on Wild Animals
Generation
Figure 12: Generation of new wild animal images using VAE
Loss Curves
Figure 13: Loss Curves for VAE on Wild Animals
Training on full dataset
We trained the VAE on all 14630 animal images and used 1500 images for validation, training for 100 epochs.
Figure 14: Reconstruction of VAE on Animal Face Dataset
Generation
Figure 15: Generation of new animal face images using VAE
Loss Curve 1
Here are the plots of the training and validation loss for the VAE on the Animal Face Dataset.
Figure 16: Loss Curves for VAE on Animal Face Dataset
Loss Curve 2
Here are the plots of the KL Divergence loss during training and validation for the VAE on the Animal Face Dataset.
Figure 17: KL Divergence Loss for VAE on Animal Face Dataset
Frechet Inception Distance
We randomly chose 2000 images from the training set, resized them to 128×128, and generated 2000 images using the VAE. Our calculated FID score is 269.98.
Figure 18: Frechet Inception Distance for VAE on Animal Face Dataset
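As a sketch of how such a score can be computed, assuming the torchmetrics implementation of FID (our actual evaluation script may differ); the random tensors below stand in for the 2000 real and 2000 generated images:

    import torch
    from torchmetrics.image.fid import FrechetInceptionDistance

    fid = FrechetInceptionDistance(feature=2048)
    # Stand-ins for the real and generated images (uint8, shape [N, 3, 128, 128]).
    real_images = torch.randint(0, 256, (2000, 3, 128, 128), dtype=torch.uint8)
    fake_images = torch.randint(0, 256, (2000, 3, 128, 128), dtype=torch.uint8)
    fid.update(real_images, real=True)
    fid.update(fake_images, real=False)
    print(fid.compute())   # the deck reports 269.98 on the Animal Face dataset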
Key Takeaways
• Importance of the learning rate.
• Importance of the batch size when training with GPUs.
• The higher the KL weight, the better the generation, but reconstruction may suffer.
• The number of epochs also matters; training too long can sometimes lead to overfitting.
• Experimenting with the architecture can also give better results.
Some more results
These are results we obtained during our experiments, but we lost the parameters with which they were generated.
Figure 19: Reconstruction
Some more results
Figure 20: Generation
Variational Inference
In a VAE, our first goal is to learn the distribution P(z^(i) | x^(i)). We cannot calculate this distribution directly because the terms involved are neural networks, so we use Variational Inference to approximate it.
Variational Inference
Log likelihood

\[
\log P(x^{(i)}) = \mathrm{ELBO}(x; Q) + D_{\mathrm{KL}}\big(Q \,\big\|\, P_{z \mid x}\big) \tag{3}
\]

We want Q to be as close as possible to P_{z|x}, which means D_KL(Q || P_{z|x}) → 0. Since log P(x^(i)) does not depend on Q, maximising the ELBO with respect to Q minimises D_KL(Q || P_{z|x}).
So our problem reduces to max_{q ∈ Q} ELBO(x; q). We assume all q belong to the same family Q, which is Gaussian.
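For completeness, here is the standard decomposition behind Equation (3); it is a textbook identity that holds for any distribution q over z, not something specific to our model:

\[
\log P(x)
= \mathbb{E}_{q(z)}\!\left[\log \frac{P(x, z)}{q(z)}\right]
+ \mathbb{E}_{q(z)}\!\left[\log \frac{q(z)}{P(z \mid x)}\right]
= \underbrace{\mathbb{E}_{q(z)}\big[\log P(x \mid z)\big]
- D_{\mathrm{KL}}\big(q \,\big\|\, P_z\big)}_{\mathrm{ELBO}(x;\,q)}
+ D_{\mathrm{KL}}\big(q \,\big\|\, P_{z \mid x}\big)
\]

The ELBO itself splits into a reconstruction term and a KL term against the prior, which is exactly the loss we trained with; and since log P(x) is fixed with respect to q, raising the ELBO must shrink the gap D_KL(q || P_{z|x}).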
Thank You!
Here are some references:
1. ELBO Surgery: Yet Another Way to Carve Up the Variational Evidence Lower Bound, https://approximateinference.org/2016/accepted/HoffmanJohnson2016.pdf
2. CS229 Lecture Notes for VAE, https://cs229.stanford.edu/summer2019/cs229-notes8.pdf
3. Tutorial on Variational Autoencoders, https://arxiv.org/pdf/1606.05908.pdf