Variational Autoencoders
A Deep Generative Model

Ayush Raina, Anushka Dassi, Arnav Bhatt


Indian Institute of Science

April 12, 2024

What are Autoencoders?

1. Autoencoders are neural networks that aim to learn a compact
representation of the input data, one whose dimension is much
smaller than that of the input.
2. An autoencoder consists of two parts: an encoder (ϕ) and a
decoder (θ), both of which are neural networks.
Goal

The goal of the encoder network is to learn a hidden representation
of the input data, and the goal of the decoder network is to
reconstruct the input data from that hidden representation.
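
To make this structure concrete, here is a minimal sketch of such an
encoder/decoder pair in PyTorch; the layer sizes (784-dimensional
input, 32-dimensional hidden representation) are illustrative
assumptions, not the exact architecture used in our experiments.

import torch.nn as nn

# Encoder phi: maps a flattened 28x28 image to a 32-dim hidden vector.
encoder = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 32),   # hidden dimension k = 32, much smaller than 784
)

# Decoder theta: reconstructs the image from the hidden vector.
decoder = nn.Sequential(
    nn.Linear(32, 256),
    nn.ReLU(),
    nn.Linear(256, 784),
    nn.Sigmoid(),         # pixel intensities in [0, 1]
)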

Training Autoencoders

• Autoencoders are trained by minimizing the reconstruction error
between the input data and the reconstructed data.
• The loss function used for training autoencoders is the Mean
Squared Error (MSE).
• The loss function is given by:

1X
N
L(ϕ, θ) = ||xi − Decoderθ (Encoderϕ (xi ))||2 (1)
N
i=1

where xi is the input data, Encoderϕ (xi ) is the hidden


representation of xi and Decoderθ (Encoderϕ (xi )) is the
reconstructed data.
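
A sketch of the corresponding training loop, minimizing equation (1)
with Adam; the optimizer choice and the assumed train_loader yielding
(image, label) batches are illustrative, not a record of our exact setup.

import torch
import torch.nn.functional as F

params = list(encoder.parameters()) + list(decoder.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

for epoch in range(20):
    for x, _ in train_loader:        # assumed DataLoader of (image, label)
        x = x.view(x.size(0), -1)    # flatten each image to a vector
        x_hat = decoder(encoder(x))  # reconstruction
        loss = F.mse_loss(x_hat, x)  # equation (1)
        opt.zero_grad()
        loss.backward()
        opt.step()
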
Reconstructions of Autoencoders

Figure 1: Reconstruction of Autoencoders without CNN

Figure 2: Reconstruction of Autoencoders with CNN

In both cases we trained for over 20 epochs.


Reconstructions of Autoencoders

Figure 3: Reconstruction of Autoencoders on Animal Face Dataset

This reconstruction was done on the grayscale version; we obtained
similar output on the colored version.


Generation

Can we generate new data with Autoencoders?

After the training of an autoencoder, we can generate new data by
feeding a random hidden representation to the decoder network. But
we may simply get noise in the output.
This is because the learned hidden representations lie in a very
small subspace of the hidden space.
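
This failure is easy to reproduce; a sketch, reusing the illustrative
32-dimensional decoder from above:

import torch

z = torch.randn(16, 32)       # random "hidden representations"
with torch.no_grad():
    samples = decoder(z)      # typically decodes to noise: a random z
                              # rarely falls in the small region that
                              # real encodings occupy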

Generation

Figure 4: Generation of new handwritten digits using Autoencoders

This is the generation we obtained when we fed the mean of the
hidden representations produced by the encoder network; otherwise
the output was simply noise.


Generation

What should be done?

If we could somehow feed a highly likely hidden representation z,
then we could expect meaningful output.


Generation

What is a highly likely hidden representation?

In fact we want to sample from P(z^(i) | x^(i)), where z^(i) is the
hidden representation of x^(i).
Here both the encoder and decoder networks are deterministic, which
means that for every input x^(i) the hidden representation z^(i) is
fixed, and vice versa.

Introduction to VAE

Variational Autoencoders (VAEs) have the same structure as
autoencoders, but here we learn the distribution of the hidden
representation rather than a fixed hidden representation.

Two things to care about

1. We are interested in learning the distribution P(z^(i) | x^(i)) so
that we can sample a highly likely z^(i) for a given x^(i).
2. We are also interested in learning the distribution P(x^(i) | z^(i))
so that we can generate new data by sampling z^(i) from
P(z^(i) | x^(i)).

With the above choice, neither the encoder nor the decoder is
deterministic.


Modelling Assumptions

Assumption 1

z^(i) ∼ N(0, I_{k×k})

Assumption 2

Posterior distribution: P(z^(i) | x^(i)) = N(µ(x^(i)), Σ(x^(i))), where
µ(x^(i)) and Σ(x^(i)) are functions of x^(i).

Assumption 3

P(x^(i) | z^(i)) = N(µ(z^(i)), Σ(z^(i))), where µ(z^(i)) and Σ(z^(i))
are functions of z^(i).
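
Assumption 1 is what makes generation straightforward once training
succeeds: new data come from decoding draws from the standard normal
prior. A sketch, with k = 32 as an illustrative latent dimension:

import torch
from torch.distributions import Normal

k = 32
prior = Normal(torch.zeros(k), torch.ones(k))  # Assumption 1: z ~ N(0, I)
z = prior.sample((16,))                        # 16 latent samples
x_new = decoder(z)                             # decode into new data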


Achieving the goals

1. Since we assumed that P(z^(i) | x^(i)) = N(µ(x^(i)), Σ(x^(i))), the
goal of the encoder network is to learn the mean and variance of
this distribution.

2. Since we also assumed that P(x^(i) | z^(i)) = N(µ(z^(i)), Σ(z^(i))),
the goal of the decoder network is to learn the mean and variance of
this distribution.
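
In code, the encoder typically outputs µ(x) and the log of the
diagonal of Σ(x), and a differentiable sample of z is drawn with the
standard reparameterization trick. A minimal sketch (the trick is
standard VAE practice; the layer sizes are assumptions):

import torch
import torch.nn as nn

class VAEEncoder(nn.Module):
    def __init__(self, in_dim=784, k=32):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, k)       # mean of P(z | x)
        self.logvar = nn.Linear(256, k)   # log of the diagonal of Sigma(x)

    def forward(self, x):
        h = self.body(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization: z = mu + sigma * eps, eps ~ N(0, I), so
        # sampling stays differentiable with respect to mu and logvar.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return z, mu, logvar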


Visualizing the VAE

Figure 5: Variational Autoencoder


Loss Function in VAE

We optimize the following loss: MSE Loss + KL Divergence Loss, where
the KL Divergence Loss is given by:

KL(P(z^(i) | x^(i)) ‖ P(z^(i))) = (1/2) ∑_{j=1}^{k} (µ_j² + σ_j² − log(σ_j²) − 1)   (2)

where µ_j and σ_j² are the mean and variance of the j-th coordinate
of the latent distribution P(z^(i) | x^(i)). In practice we assume
the covariance matrix is diagonal. This can be derived from the
expression for the KL Divergence between two Gaussian distributions.
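
With the diagonal assumption, equation (2) translates directly into
code once the encoder outputs µ and log σ²; a sketch:

def kl_loss(mu, logvar):
    # mu, logvar: torch tensors of shape (batch, k).
    # 0.5 * sum_j (mu_j^2 + sigma_j^2 - log(sigma_j^2) - 1), equation (2),
    # summed over latent dimensions and averaged over the batch.
    return (0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(dim=1)).mean()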


VAEs on MNIST dataset

Figure 6: Generation of new handwritten digits using VAE

We can clearly see that the VAE has generated better handwritten
digits than the autoencoder.

VAEs on MNIST dataset

Figure 7: Reconstructions of VAE on MNIST dataset

Reconstructions are also similar to the original data.


Key observations

1. Autoencoders are deterministic, while VAEs are probabilistic.
2. VAEs are better at generating new data than autoencoders.
3. Autoencoders are better at reconstruction than VAEs.


About Dataset

Dataset             Animal Faces
Size                512x512
Training Images     14630
Validation Images   1500
Number of Classes   3

For all our experiments we resized the images to 128x128.
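
The resizing can be expressed as a torchvision preprocessing
pipeline; a sketch of one plausible setup, not necessarily the exact
transforms we used:

from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((128, 128)),  # downsample the 512x512 originals
    transforms.ToTensor(),          # PIL image -> float tensor in [0, 1]
])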


Training for cats only

We trained a VAE on 5153 cat images and used 500 images for
validation, training for 40 epochs. Here are the results:

Figure 8: Reconstructions of VAE on Cat dataset


Generation

Figure 9: Generation of new cat images using VAE


Loss Curves

Figure 10: Loss Curves for VAE on Cat dataset


Training for wild animals only

We trained a VAE on 4738 images of wild animals and used 500 images
for validation, training for 70 epochs.

Figure 11: Reconstructions of VAE on Wild Animals



Generation

Figure 12: Generation of new wild animal images using VAE


Loss Curves

Figure 13: Loss Curves for VAE on Wild Animals


Training on full dataset

We trained a VAE on 14630 images of animals and used 1500 images
for validation, training for 100 epochs.

Figure 14: Reconstruction of VAE on Animal Face Dataset


Generation

Figure 15: Generation of new animal face images using VAE


Loss Curve 1

Here is the plot of Train Loss and Validation Loss for the VAE on
the Animal Face Dataset.

Figure 16: Loss Curves for VAE on Animal Face Dataset


Loss Curve 2

Here is the plot of the KL Divergence Loss during training and
validation for the VAE on the Animal Face Dataset.

Figure 17: KL Divergence Loss for VAE on Animal Face Dataset


Fréchet Inception Distance

We took 2000 randomly chosen images from the training set, resized
to 128x128, and generated 2000 images using the VAE. Our calculated
FID score is 269.98.

Figure 18: Fréchet Inception Distance for VAE on Animal Face Dataset
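
For reference, an FID score like this can be computed with the
torchmetrics package; a sketch under the assumption that real_images
and fake_images are uint8 tensors of shape (N, 3, 128, 128):

from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)   # 2048-dim Inception features
fid.update(real_images, real=True)             # the 2000 training images
fid.update(fake_images, real=False)            # the 2000 VAE samples
score = fid.compute()                          # scalar FID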


Key Takeaways

• Importance of the learning rate.
• Importance of the batch size when training with GPUs.
• The higher the KL weight, the better the generation, but the
reconstruction may suffer (see the sketch after this list).
• The number of epochs is also important, to prevent overfitting.
• Playing with the architecture can also give better results.
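
A sketch of how the KL weight enters the total loss; kl_weight is an
illustrative knob, not a parameter name from our code:

import torch.nn.functional as F

def vae_loss(x, x_hat, mu, logvar, kl_weight=1.0):
    # x, x_hat, mu, logvar: torch tensors; mu, logvar of shape (batch, k).
    recon = F.mse_loss(x_hat, x)  # reconstruction (MSE) term
    kl = (0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(dim=1)).mean()
    # A larger kl_weight pushes the posterior toward the prior:
    # generation improves, but reconstruction may suffer.
    return recon + kl_weight * kl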


Some more results

These are results we obtained during our experiments, but we lost
the parameters with which they were generated.

Figure 19: Reconstruction


Some more results

Figure 20: Generation



Variational Inference

In a VAE, our first goal is to learn the distribution P(z^(i) | x^(i)).
We cannot calculate this posterior directly, because the terms
involved are parameterized by neural networks, so we use Variational
Inference to approximate it.


Variational Inference

Log likelihood

log P(x^(i)) = ELBO(x; Q) + D_KL(Q ‖ P_{z|x})   (3)

We want Q to be as close as possible to P_{z|x}, which means
D_KL(Q ‖ P_{z|x}) → 0. If we maximise the ELBO with respect to Q,
then we are minimising D_KL(Q ‖ P_{z|x}).
So our problem reduces to max_{q∈Q} ELBO(x; q). We assume all q
belong to the same family Q, which is Gaussian.
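
For completeness, the decomposition in (3) can be written out, with
q(z) shorthand for the variational distribution (a standard
derivation, added here for reference):

\begin{aligned}
\log P(x) &= \mathbb{E}_{q(z)}\!\left[\log \tfrac{P(x,z)}{q(z)}\right]
           + \mathbb{E}_{q(z)}\!\left[\log \tfrac{q(z)}{P(z\mid x)}\right] \\
          &= \underbrace{\mathbb{E}_{q(z)}[\log P(x\mid z)]
             - D_{\mathrm{KL}}\big(q(z)\,\|\,P(z)\big)}_{\mathrm{ELBO}(x;\,q)}
           + D_{\mathrm{KL}}\big(q(z)\,\|\,P(z\mid x)\big)
\end{aligned}

Since D_KL ≥ 0, the ELBO is a lower bound on log P(x), and maximising
it over q simultaneously tightens the bound and drives q toward the
true posterior P(z | x).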


Thank You!

Here are some references:
1. ELBO Surgery: Yet Another Way to Carve Up the Variational
   Evidence Lower Bound,
   https://approximateinference.org/2016/accepted/HoffmanJohnson2016.pdf
2. CS229 Lecture Notes for VAE,
   https://cs229.stanford.edu/summer2019/cs229-notes8.pdf
3. Tutorial on Variational Autoencoders,
   https://arxiv.org/pdf/1606.05908.pdf
