Variational Autoencoders
A Deep Generative Model
Ayush Raina, Anushka Dassi, Arnav Bhatt
Indian Institute of Science
April 12, 2024
What are Autoencoders?
1. Autoencoders are neural networks that aim to learn a compact representation of the input data, one whose dimension is much smaller than that of the input.
2. An autoencoder consists of two parts: an encoder (ϕ) and a decoder (θ), both of which are neural networks.
Goal
The goal of the encoder network is to learn a hidden representation of the input data, and the goal of the decoder network is to reconstruct the input data from that hidden representation.
Training Autoencoders
• Autoencoders are trained by minimizing the reconstruction error between the input data and the reconstructed data.
• The loss function used for training autoencoders is the Mean Squared Error (MSE).
• The loss function is given by:

\[
\mathcal{L}(\phi, \theta) = \frac{1}{N} \sum_{i=1}^{N} \left\| x_i - \mathrm{Decoder}_\theta\big(\mathrm{Encoder}_\phi(x_i)\big) \right\|^2 \tag{1}
\]

where x_i is the input data, Encoder_ϕ(x_i) is the hidden representation of x_i, and Decoder_θ(Encoder_ϕ(x_i)) is the reconstructed data.
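To make this concrete, here is a minimal sketch of such an autoencoder and one training step. It assumes PyTorch and flattened 28×28 (MNIST-style) inputs; the layer sizes are illustrative, not the ones used in our experiments.

    import torch
    import torch.nn as nn

    class Autoencoder(nn.Module):
        def __init__(self, input_dim=784, hidden_dim=32):
            super().__init__()
            # Encoder (phi): maps the input to a much smaller hidden representation.
            self.encoder = nn.Sequential(
                nn.Linear(input_dim, 256), nn.ReLU(),
                nn.Linear(256, hidden_dim),
            )
            # Decoder (theta): reconstructs the input from the hidden representation.
            self.decoder = nn.Sequential(
                nn.Linear(hidden_dim, 256), nn.ReLU(),
                nn.Linear(256, input_dim), nn.Sigmoid(),
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))

    # One training step minimizing the MSE reconstruction error of Equation (1).
    model = Autoencoder()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    x = torch.rand(64, 784)   # stand-in batch; use a real DataLoader in practice
    loss = nn.functional.mse_loss(model(x), x)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()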
Reconstructions of Autoencoders
Figure 1: Reconstruction of Autoencoders without CNN
Figure 2: Reconstruction of Autoencoders with CNN
In both cases we trained for over 20 epochs.
Reconstructions of Autoencoders
Figure 3: Reconstruction of Autoencoders on Animal Face Dataset
This reconstruction was done on the grayscale version; we obtained similar output on the colored version.
Generation
Can we generate new data with Autoencoders?
After training an autoencoder, we can try to generate new data by feeding a random hidden representation to the decoder network, but we may simply get noise as output.
This is because the valid hidden representations lie in a very small subspace of the latent space.
Generation
Figure 4: Generation of new handwritten digits using Autoencoders
This is the generation we obtained when we fed the mean of the hidden representations produced by the encoder network; otherwise the output was simply noise.
Generation
What should be done?
If we could somehow feed a highly likely hidden representation z, we could expect meaningful output.
Generation
What is a highly likely hidden representation?
In fact, we want to sample from P(z^(i) | x^(i)), where z^(i) is the hidden representation of x^(i).
Here both the encoder and decoder networks are deterministic, which means that for every input x^(i) the hidden representation z^(i) is fixed, and vice versa.
Introduction to VAE
Variational Autoencoders (VAEs) have the same structure as autoencoders, but instead of learning a fixed hidden representation, we learn the distribution of the hidden representation.
Two things to care about
1. We want to learn the distribution P(z^(i) | x^(i)) so that we can sample a highly likely z^(i) for a given x^(i).
2. We also want to learn the distribution P(x^(i) | z^(i)) so that we can generate new data by sampling z^(i) from P(z^(i) | x^(i)).
With these choices, neither the encoder nor the decoder is deterministic.
Modelling Assumptions
Assumption 1
z^(i) ∼ N(0, I_{k×k})
Assumption 2
The posterior distribution P(z^(i) | x^(i)) ∼ N(µ(x^(i)), Σ(x^(i))), where µ(x^(i)) and Σ(x^(i)) are functions of x^(i).
Assumption 3
P(x^(i) | z^(i)) ∼ N(µ(z^(i)), Σ(z^(i))), where µ(z^(i)) and Σ(z^(i)) are functions of z^(i).
Achieving the goals
1. Since we assumed that P(z^(i) | x^(i)) ∼ N(µ(x^(i)), Σ(x^(i))), the goal of the encoder network is to learn the mean and variance of this distribution.
2. Since we assumed that P(x^(i) | z^(i)) ∼ N(µ(z^(i)), Σ(z^(i))), the goal of the decoder network is to learn the mean and variance of this distribution.
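The slides do not spell out how to sample z while keeping the network trainable; the usual way is the standard reparameterization trick z = µ + σ · ε with ε ∼ N(0, I), which makes the sample differentiable in µ and σ. A minimal sketch of the encoder side under that assumption, with illustrative names and sizes:

    import torch
    import torch.nn as nn

    class VAEEncoder(nn.Module):
        def __init__(self, input_dim=784, latent_dim=32):
            super().__init__()
            self.backbone = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
            self.mu = nn.Linear(256, latent_dim)      # mean of P(z|x)
            self.logvar = nn.Linear(256, latent_dim)  # log-variance (diagonal covariance)

        def forward(self, x):
            h = self.backbone(x)
            mu, logvar = self.mu(h), self.logvar(h)
            eps = torch.randn_like(mu)                # eps ~ N(0, I)
            z = mu + torch.exp(0.5 * logvar) * eps    # z ~ N(mu, sigma^2), differentiable
            return z, mu, logvar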
Visualizing the VAE
Figure 5: Variational Autoencoder
Loss Function in VAE
We optimize MSE Loss + KL Divergence Loss, where the KL Divergence Loss is given by:

\[
\mathrm{KL}\big(P(z^{(i)} \mid x^{(i)}) \,\big\|\, P(z)\big) = \frac{1}{2} \sum_{j=1}^{k} \left( \mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1 \right) \tag{2}
\]

where µ_j and σ_j² are the mean and variance of the j-th coordinate of the latent distribution P(z^(i) | x^(i)). In practice we assume the covariance matrix is diagonal. This can be derived from the expression for the KL divergence between two Gaussian distributions.
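A minimal sketch of this combined loss, assuming the encoder outputs log σ² as in the encoder sketch above; the kl_weight knob is our addition, anticipating the KL-weight takeaway later in the deck.

    import torch
    import torch.nn.functional as F

    def vae_loss(x_hat, x, mu, logvar, kl_weight=1.0):
        """MSE reconstruction loss plus the closed-form KL term of Equation (2)."""
        recon = F.mse_loss(x_hat, x, reduction="sum")
        # KL(N(mu, sigma^2) || N(0, I)) = 1/2 * sum(mu^2 + sigma^2 - log(sigma^2) - 1)
        kl = 0.5 * torch.sum(mu.pow(2) + logvar.exp() - logvar - 1.0)
        return recon + kl_weight * kl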
VAEs on MNIST dataset
Figure 6: Generation of new handwritten digits using VAE
We can clearly see that the VAE generates better handwritten digits than the autoencoder did.
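Generating digits like these amounts to sampling z from the prior N(0, I) of Assumption 1 and passing it through the trained decoder. A sketch, where the small decoder below stands in for the trained network:

    import torch
    import torch.nn as nn

    latent_dim = 32
    # Stand-in for the trained decoder; in practice load the trained weights.
    decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                            nn.Linear(256, 784), nn.Sigmoid())

    z = torch.randn(16, latent_dim)   # z ~ N(0, I): highly likely under the prior
    with torch.no_grad():
        samples = decoder(z)          # 16 generated images (flattened 28x28)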
VAEs on MNIST dataset
Figure 7: Reconstructions of VAE on MNIST dataset
The reconstructions are also close to the original data.
Key observations
1. Autoencoders are deterministic, while VAEs are probabilistic.
2. VAEs are better at generating new data than autoencoders.
3. Autoencoders are better at reconstruction than VAEs.
About Dataset
Dataset: Animal Faces
Image size: 512×512
Training images: 14630
Validation images: 1500
Number of classes: 3
For all our experiments we resized the images to 128×128.
Training for cats only
We trained the VAE on 5153 cat images and used 500 images for validation, training for 40 epochs. Here are the results:
Figure 8: Reconstructions of VAE on Cat dataset
Generation
Figure 9: Generation of new cat images using VAE
Loss Curves
Figure 10: Loss Curves for VAE on Cat dataset
Training for wild animals only
We trained the VAE on 4738 images of wild animals and used 500 images for validation, training for 70 epochs.
Figure 11: Reconstructions of VAE on Wild Animals
Generation
Figure 12: Generation of new wild animal images using VAE
Loss Curves
Figure 13: Loss Curves for VAE on Wild Animals
Training on full dataset
We trained the VAE on all 14630 animal images and used 1500 images for validation, training for 100 epochs.
Figure 14: Reconstruction of VAE on Animal Face Dataset
Generation
Figure 15: Generation of new animal face images using VAE
Loss Curve 1
Here are the plots of the training and validation loss for the VAE on the Animal Face Dataset.
Figure 16: Loss Curves for VAE on Animal Face Dataset
Loss Curve 2
Here are the plots of the KL Divergence loss during training and validation for the VAE on the Animal Face Dataset.
Figure 17: KL Divergence Loss for VAE on Animal Face Dataset
Frechet Inception Distance
We randomly chose 2000 images from the training set, resized them to 128×128, and generated 2000 images using the VAE. Our calculated FID score is 269.98.
Figure 18: Frechet Inception Distance for VAE on Animal Face Dataset
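As a sketch of how such a score can be computed, assuming the torchmetrics implementation of FID (our actual evaluation script may differ); the random tensors below stand in for the 2000 real and 2000 generated images:

    import torch
    from torchmetrics.image.fid import FrechetInceptionDistance

    fid = FrechetInceptionDistance(feature=2048)
    # Stand-ins for the real and generated images (uint8, shape [N, 3, 128, 128]).
    real_images = torch.randint(0, 256, (2000, 3, 128, 128), dtype=torch.uint8)
    fake_images = torch.randint(0, 256, (2000, 3, 128, 128), dtype=torch.uint8)
    fid.update(real_images, real=True)
    fid.update(fake_images, real=False)
    print(fid.compute())   # the deck reports 269.98 on the Animal Face dataset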
Key Takeaways
• Importance of the learning rate.
• Importance of the batch size when training with GPUs.
• The higher the KL weight, the better the generation, but reconstruction may suffer.
• The number of epochs also matters; training too long can sometimes lead to overfitting.
• Experimenting with the architecture can also give better results.
Some more results
These are results we obtained during our experiments, but we lost the parameters with which they were generated.
Figure 19: Reconstruction
Some more results
Figure 20: Generation
Variational Inference
In a VAE, our first goal is to learn the distribution P(z^(i) | x^(i)). We cannot calculate this distribution directly because the terms involved are neural networks, so we use Variational Inference to approximate it.
Variational Inference
Log likelihood

\[
\log P(x^{(i)}) = \mathrm{ELBO}(x; Q) + D_{\mathrm{KL}}\big(Q \,\big\|\, P_{z \mid x}\big) \tag{3}
\]

We want Q to be as close as possible to P_{z|x}, which means D_KL(Q || P_{z|x}) → 0. Since log P(x^(i)) does not depend on Q, maximising the ELBO with respect to Q minimises D_KL(Q || P_{z|x}).
So our problem reduces to max_{q ∈ Q} ELBO(x; q). We assume all q belong to the same family Q, which is Gaussian.
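For completeness, here is the standard decomposition behind Equation (3); it is a textbook identity that holds for any distribution q over z, not something specific to our model:

\[
\log P(x)
= \mathbb{E}_{q(z)}\!\left[\log \frac{P(x, z)}{q(z)}\right]
+ \mathbb{E}_{q(z)}\!\left[\log \frac{q(z)}{P(z \mid x)}\right]
= \underbrace{\mathbb{E}_{q(z)}\big[\log P(x \mid z)\big]
- D_{\mathrm{KL}}\big(q \,\big\|\, P_z\big)}_{\mathrm{ELBO}(x;\,q)}
+ D_{\mathrm{KL}}\big(q \,\big\|\, P_{z \mid x}\big)
\]

The ELBO itself splits into a reconstruction term and a KL term against the prior, which is exactly the loss we trained with; and since log P(x) is fixed with respect to q, raising the ELBO must shrink the gap D_KL(q || P_{z|x}).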
Thank You!
Here are some references:
1. ELBO Surgery: Yet Another Way to Carve Up the Variational Evidence Lower Bound, https://approximateinference.org/2016/accepted/HoffmanJohnson2016.pdf
2. CS229 Lecture Notes for VAE, https://cs229.stanford.edu/summer2019/cs229-notes8.pdf
3. Tutorial on Variational Autoencoders, https://arxiv.org/pdf/1606.05908.pdf