Feature: experiment with a z which is a mix of normals, censored normals, binomials, and categoricals.
Typically, in almost all GANs, the z random noise is just a bunch of Gaussian variables (N(0,1)). This is chosen purely as a default (when in doubt, a random variable is a Gaussian), but there is no reason z couldn't be something totally different. StyleGAN, for example, finds it important to transform z into w using no fewer than 8 layers (#26), which suggests that the standard z is... not great, and affects the results quite a bit, both the quality of generated samples and how controllable the model is for editing: everyone finds that edits on w work much better than edits on z in StyleGAN. The w layers are learning to transform the Gaussians into some more appropriate distribution - potentially a much rougher, spikier, more discrete one?
The intuition here is that using just normals forces the GAN to find 128 or 512 or whatever 'factors' which can vary smoothly; but we want a GAN to learn things like 'wearing glasses', which are inherently discrete: a face either is or is not wearing glasses, there's no such thing as wearing +0.712σ glasses, or wearing −0.12σ of glasses-ness, even though that's more or less what using only normals forces the GAN to do. So you get weird things where face interpolations look like raccoons halfway through, as you force the GAN to generate partial-glass-ness. If the GAN had access to a binary variable, it could assign that variable to 'glasses' and simply snap between glasses/no-glasses while interpolating on the latent space, without any nonsensical intermediates being forced by smoothness. (Since G isn't being forced to generate artifactual images which D can easily detect & penalize, it may also have an easier time learning.)
mooch's BigGAN paper experiments with the latent space to offer something better than a giant w (which we've thus far found a little hard to stabilize inside BigGAN training). In particular, he finds that binary 0/1 (Bernoulli) variables and rectified (censored) normals perform surprisingly better than the default Gaussians.
A mix of Gaussians and binary variables, inspired by InfoGAN (where the z is the usual Gaussian provided for randomness & modeling nuisance variables, and c is a vector of binary variables which are the human-meaningful latent variables one hopes the GAN will automatically learn), also performs well.
mooch did not thoroughly explore this space because he wanted to use truncation, so there's potential for improvements here. (We could, for example, use a mix of Bernoullis and censored normals, and apply the truncation trick only to the censored normals, as sketched below.)
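A minimal sampling-time sketch of that idea, assuming a z laid out as a censored-normal half followed by a Bernoulli half (the dimensions and the truncation value are illustrative, not from the paper): the truncation trick is applied only to the continuous half, while the Bernoulli half stays as hard 0/1 bits.

```python
import tensorflow as tf

def sample_truncated_mixed_z(batch_size, z_dim, truncation=0.5):
    # Hypothetical sampling-time helper (not from compare_gan or the BigGAN code).
    half = z_dim // 2
    # Truncation trick on the continuous half only: draw from a truncated N(0,1)
    # (values beyond 2 sigma are resampled), scale by the truncation factor,
    # then rectify negatives to 0 to get a censored normal.
    censored = tf.maximum(truncation * tf.random.truncated_normal([batch_size, half]), 0.0)
    # The Bernoulli(0.5) half is untouched by truncation: hard 0/1 values by construction.
    bernoulli = tf.cast(tf.random.uniform([batch_size, z_dim - half]) < 0.5, tf.float32)
    return tf.concat([censored, bernoulli], axis=-1)
```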
One suggestion would be to expand the latent z a lot by tacking a few hundred censored normals or binomials onto the existing normals, feeding a mix into each BigGAN block; a sketch of what that might look like follows.
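Something like this (all dimensions are made-up illustrations; the per-block split mimics BigGAN's hierarchical latent, where z is chopped into one chunk per generator block):

```python
import tensorflow as tf

batch_size = 64
# Keep the existing 128 Gaussian dims and tack on 128 censored normals plus
# 128 Bernoullis (all sizes here are illustrative).
gaussian  = tf.random.normal([batch_size, 128])
censored  = tf.maximum(tf.random.normal([batch_size, 128]), 0.0)
bernoulli = tf.cast(tf.random.uniform([batch_size, 128]) < 0.5, tf.float32)
z = tf.concat([gaussian, censored, bernoulli], axis=-1)      # [64, 384]
# BigGAN-style hierarchical latent: one chunk of the mixed z per generator block.
per_block_chunks = tf.split(z, num_or_size_splits=6, axis=-1)  # six [64, 64] chunks
```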
Simply modifying the z code should be easy. compare_gan already provides a z_generator abstraction which lets us pass in distribution_fn=tf.random.uniform or distribution_fn=tf.random.normal etc. tf.random supports categorical/Bernoulli and normals (but not, it seems, censored normals, only truncated normals, which are different; censoring is easy, though: if x < 0, set it to 0, i.e. max(x, 0)). So we could define a distribution which, say, returns a first half of censored normals and a second half of Bernoullis.
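A minimal sketch of such a distribution_fn, assuming z_generator calls it with a [batch_size, z_dim] shape and that any extra keyword arguments it passes (e.g. stddev or minval/maxval) can be safely ignored:

```python
import tensorflow as tf

def censored_bernoulli_z(shape, dtype=tf.float32, name=None, **unused_kwargs):
    """Hypothetical distribution_fn for compare_gan's z_generator:
    first half censored normals, second half Bernoulli(0.5) bits."""
    batch_size, z_dim = shape
    half = z_dim // 2
    # Censored normal: N(0,1) with negative draws clipped to exactly 0
    # (unlike tf.random.truncated_normal, which resamples rather than clips,
    # so it puts no point mass at 0).
    censored = tf.maximum(tf.random.normal([batch_size, half], dtype=dtype), 0.0)
    # Bernoulli(0.5) via thresholded uniform noise: hard 0/1 latent bits.
    bernoulli = tf.cast(tf.random.uniform([batch_size, z_dim - half]) < 0.5, dtype)
    return tf.concat([censored, bernoulli], axis=-1, name=name)

# Hypothetical usage: z = z_generator([batch_size, z_dim], distribution_fn=censored_bernoulli_z)
```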
