Description
BigGAN/StyleGAN work well unconditionally and with categorical class conditioning, but we do not have working GANs conditioned on tags or text embeddings. Arfa has established with StyleGAN experiments that doc2vec embeddings of Danbooru faces (n>300k), while encoding the relevant information, yield very poor generalization: StyleGAN seems to learn only a few basic dimensions like hair color despite the rich tag information. He also experimented with one-hot encoding of the variables used to construct the cartoonfaces dataset, where the variables are known by construction, and found that despite the excellent visual quality and the simplest possible one-hot encoding of the variables, StyleGAN drops modes and levels of variables.
This is consistent with the StackGAN papers, which note that feeding in text embeddings of descriptive sentences resulted in StackGAN memorizing or not learning well. They ascribe this to the high-dimensional inputs being nearly unique to each image and extremely sparse. Their solution was essentially to add Gaussian noise (in a way respecting the correlation of each element of the embedding, since they are not independent), which they called 'noise augmentation' and explain as 'smoothing' the embeddings: jittering them makes each embedding more of a cloud than a single exact high-dimensional point.
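To make the 'cloud rather than a point' idea concrete, here is a minimal NumPy sketch of correlated Gaussian jitter. It is not the StackGAN implementation: it assumes the correlation structure can be approximated by the empirical covariance of the dataset's embeddings, and the function name `gaussian_noise_augment` and the `scale` parameter are made up for illustration.

```python
import numpy as np

def gaussian_noise_augment(embeddings, idx, rng, scale=0.1):
    """Jitter embedding `idx` with Gaussian noise that follows the dataset's covariance.

    embeddings: (n, d) array of conditioning vectors (hypothetical layout).
    scale: fraction of the empirical covariance used for the noise (assumed knob).
    """
    # Empirical covariance across the dataset; rows are observations.
    cov = np.cov(embeddings, rowvar=False)
    # Zero-mean noise whose elements are correlated like the embeddings themselves.
    noise = rng.multivariate_normal(np.zeros(cov.shape[0]), scale * cov)
    return embeddings[idx] + noise
```

In practice the covariance would be computed once and cached rather than recomputed per sample, and a fresh jitter would be drawn every time an image is shown to the GAN.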
Since StackGAN is the only relatively recent and successful text-to-image GAN I know of, there are not a lot of other suggestions as to what to try. We have a ton of data, but the problem is still occurring. I suggest noise augmentation.
A simple way to implement noise augmentation without messing around with multivariate Gaussians would be to take a nonparametric approach: the correlation structure is surely reflected in the other datapoints' embeddings as well. So to 'noise augment' a given embedding, grab a random embedding from the dataset, and take a weighted average of the current embedding and the random embedding.
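A rough sketch of that nonparametric version, assuming the embeddings sit in an `(n, d)` NumPy array. The name `mix_augment`, the `max_weight` cap, and the choice to draw the mixing weight uniformly are all illustrative assumptions, not something tested here:

```python
import numpy as np

def mix_augment(embeddings, idx, rng, max_weight=0.2):
    """Blend embedding `idx` with one randomly chosen embedding from the dataset.

    max_weight caps how much of the random embedding is mixed in (assumed knob);
    keeping it small leaves the result close to the original conditioning vector.
    """
    # Pick any datapoint uniformly at random (it may be `idx` itself).
    other = rng.integers(embeddings.shape[0])
    # Mixing weight for the random embedding, drawn fresh each call.
    lam = rng.uniform(0.0, max_weight)
    # Weighted average of the current embedding and the random one.
    return (1.0 - lam) * embeddings[idx] + lam * embeddings[other]
```

Because the result is a convex combination of two real embeddings, it always stays inside the range the dataset already covers, with no covariance estimate needed. How large `max_weight` should be is an open question and would need to be tuned.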