BigGAN/StyleGAN: make conditional training work #10

@gwern

BigGAN/StyleGAN work well for unconditional generation and for categorical class labels, but we do not have working GANs conditioned on tags or text embeddings. Arfa has established with StyleGAN experiments that doc2vec embeddings of Danbooru faces (n>300k), while encoding the relevant information, yield very poor generalization: StyleGAN seems to learn only a few basic dimensions such as hair color, despite the rich tag information. He also experimented with one-hot encoding of the variables used to construct the cartoonfaces dataset, where the variables are known by construction, and found that even with excellent visual quality and the simplest possible one-hot encoding of the variables, StyleGAN drops modes and levels of variables.

This is consistent with the StackGAN papers, which note that feeding in text embeddings of descriptive sentences resulted in StackGAN memorizing or not learning well. They ascribe this to the high-dimensional inputs being nearly unique for each image and extremely sparse. Their solution was essentially to add Gaussian noise (in a way that respects the correlation between elements of the embedding, since they are not independent). The papers' own name for this is 'Conditioning Augmentation'; I will call it 'noise augmentation' here. They explain it as 'smoothing' the embeddings: jittering them makes each embedding more of a cloud than a single exact high-dimensional point.
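The jittering idea described above could be sketched roughly as follows. This is not StackGAN's actual implementation; it is a minimal illustration that assumes the noise covariance is estimated empirically from the dataset's embeddings with NumPy. The function name `jitter_embedding`, the `scale` parameter, and the toy data are hypothetical.

```python
import numpy as np

def jitter_embedding(embedding, cov, scale=0.1, rng=None):
    """Add correlated Gaussian noise to one embedding vector.

    `cov` is a covariance matrix over embedding dimensions (assumed here to
    be estimated from the dataset); `scale` is a hypothetical knob that
    controls how far the jittered point strays from the original.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Zero-mean noise whose dimensions co-vary the same way the data does.
    noise = rng.multivariate_normal(np.zeros(len(embedding)), cov)
    return embedding + scale * noise

# Toy usage: 50 fake 4-dimensional embeddings standing in for real ones.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(50, 4))
cov = np.cov(embeddings, rowvar=False)  # dimensions are columns
jittered = jitter_embedding(embeddings[0], cov, scale=0.1, rng=rng)
```

A fresh jitter would be drawn each time an image is sampled during training, so the generator never sees exactly the same conditioning vector twice.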

Since StackGAN is the only relatively recent and successful text-to-image GAN I know of, there are not a lot of other suggestions as to what to try. We have a ton of data, but the problem is still occurring. I suggest noise augmentation.

A simple way to implement noise augmentation without messing around with multivariate Gaussians would be to take a nonparametric approach: the correlation structure is surely reflected in the other datapoints' embeddings as well. So to 'noise augment' a given embedding, grab a random embedding from the dataset and take a weighted average of the current embedding and the random one.
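A minimal sketch of that nonparametric version, assuming the embeddings are held in a NumPy array with one row per image. The function name `blend_with_random` and the default `weight` of 0.9 are illustrative assumptions, not values from any experiment.

```python
import numpy as np

def blend_with_random(embeddings, idx, weight=0.9, rng=None):
    """Weighted average of embedding `idx` with a randomly chosen embedding.

    `weight` is the share kept from the original embedding (a hypothetical
    default); the remainder comes from the random partner.
    """
    rng = np.random.default_rng() if rng is None else rng
    partner = rng.integers(len(embeddings))  # may occasionally pick idx itself
    return weight * embeddings[idx] + (1.0 - weight) * embeddings[partner]

# Toy usage: 20 fake 8-dimensional embeddings standing in for real ones.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(20, 8))
augmented = blend_with_random(embeddings, idx=3, weight=0.9, rng=rng)
```

Because the result is a convex combination of two real embeddings, it stays inside the range the data already covers in every dimension. How much weight to give the random partner would need to be tuned.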
