
Visual-GPT

[example_generation.gif — animated visualization of the autoregressive decoding process]

This repository provides a PyTorch implementation of Vector-Quantization (VQ)-based autoregressive vision generative models. Built upon the foundational VQ-GAN architecture, it integrates several improved techniques for training visual tokenizers, with the aim of facilitating replication and research in autoregressive vision generative models.

Results and pretrained model checkpoints of class-conditioned image synthesizers on StanfordDogs, a dataset consisting of 120 dog breeds, are provided below.

Usage

Setup

conda create -n vgpt python=3.9
conda activate vgpt
cd visual-gpt
pip install -r requirements.txt

Synthesize Your Dog Images!

To use the pretrained model, download directly from this link, and run:

# --cls_name: dog classes to condition on
# --accept_n: keep the best 1/accept_n samples via classifier rejection
# --temperature / --top_k / --top_p: generation parameters
bash scripts/sample_vgpt.sh \
    --from_pretrained path/to/vqgan-stfdogs \
    --cls_name maltese samoyed australian_terrier \
    --accept_n 10 \
    --temperature 1.3 --top_k 100 --top_p 0.9

This will generate a GIF visualizing the autoregressive decoding process, similar to the one shown at the top of this page.
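The script handles classifier rejection internally; the sketch below only illustrates the best-of-N idea behind --accept_n. The `sampler` and `classifier` callables are hypothetical stand-ins, not names from this repo:

import torch

def sample_with_rejection(sampler, classifier, cls_id, accept_n=10):
    """Draw accept_n candidates and keep the one the classifier scores
    highest for the target class (best 1-of-N rejection sampling)."""
    images = torch.stack([sampler(cls_id) for _ in range(accept_n)])  # (N, C, H, W)
    with torch.no_grad():
        scores = classifier(images).softmax(dim=-1)[:, cls_id]       # P(class | image)
    return images[scores.argmax()]                                    # best of accept_n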

Training a Visual Tokenizer

The provided visual tokenizers are convolutional autoencoders. Implementations include the basic structure from the original VQ-VAE paper and a more advanced VQ-GAN architecture adapted from taming-transformers. All configuration parameters are managed in conf/exp.yaml.

To train a visual tokenizer:

python train_tokenizer.py  --exp_name vqgan-stfdogs  --output_path ../outputs  --conf conf/stfdogs.yaml  --wandb visual-gpt
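For orientation, a single tokenizer optimization step in the plain VQ-VAE variant roughly follows the standard recipe below. This is a minimal sketch assuming `encoder`, `decoder`, and `vq_layer` modules and omitting the GAN and perceptual losses; the loss weighting is illustrative, not taken from the repo's configs:

import torch.nn.functional as F

def tokenizer_step(encoder, decoder, vq_layer, optimizer, images, beta=0.25):
    z_e = encoder(images)                 # continuous latents
    code = vq_layer.quantize(z_e)         # nearest-codeword indices
    z_q = vq_layer.dequantize(code)       # quantized latents

    # Straight-through estimator: copy gradients from z_q back to z_e.
    z_q_st = z_e + (z_q - z_e).detach()
    recon = decoder(z_q_st)

    # Reconstruction + codebook + commitment losses (standard VQ-VAE objective).
    loss = (F.mse_loss(recon, images)
            + F.mse_loss(z_q, z_e.detach())          # pull codewords toward encodings
            + beta * F.mse_loss(z_e, z_q.detach()))  # commit encoder to codewords
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()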

The implemented VectorQuantize layer supports several enhanced dictionary-learning techniques. Example programmatic usage:

from vgpt import VectorQuantize

vq_layer = VectorQuantize(
    num_codewords = 1024,          # dictionary size K
    embedding_dim = 256,           # codeword dim d
    cos_dist = False,              # use cosine distance for codeword search
    proj_dim = None,               # low-dimensional factorization
    random_proj = False,           # random projection search
    penalty_weight = 0.25,         # penalize non-uniform code distributions
    pretrain_steps = 5000,         # warm-start autoencoders
    init_method = "latent_random"  # initialize codebook with pre-trained latents
)
# tokenizing & decoding (z_e: continuous latents from the encoder)
code = vq_layer.quantize(z_e)
z_q = vq_layer.dequantize(code)
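As a rough illustration of what the cos_dist flag toggles, nearest-codeword search differs only in the distance metric used. This is a self-contained sketch of the general technique, not the layer's actual internals:

import torch
import torch.nn.functional as F

def nearest_codeword(z_e, codebook, cos_dist=False):
    """z_e: (N, d) flattened latents; codebook: (K, d) codewords.
    Returns (N,) indices of the nearest codeword per latent."""
    if cos_dist:
        # Cosine distance: compare directions on the unit sphere.
        sim = F.normalize(z_e, dim=-1) @ F.normalize(codebook, dim=-1).T
        return sim.argmax(dim=-1)
    # Squared Euclidean distance.
    return torch.cdist(z_e, codebook).argmin(dim=-1)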

Training a Visual GPT

After obtaining a fine-grained visual tokenizer, a visual language model can be trained for a variety of generative tasks. To train a class-conditioned image synthesizer on flattened image tokens:

python train_gpt.py --exp_name vqgan-stfdogs  --output_path ../outputs  --tokenizer_path ../outputs/vqgan-stfdogs  --conf conf/stfdogs.yaml  --wandb visual-gpt

The visual autoregressive model is implemented as CondVisualGPT, which relies heavily on HuggingFace Transformers🤗 and makes language-model training and sampling very convenient.
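Conceptually, class conditioning amounts to prepending a class token to the flattened image-token sequence and training a decoder-only LM on next-token prediction. The sketch below uses a vanilla GPT-2 from HuggingFace Transformers as a stand-in for CondVisualGPT; the vocabulary layout (class tokens appended after the K codewords) and all sizes are illustrative assumptions:

import torch
from transformers import GPT2Config, GPT2LMHeadModel

K, NUM_CLASSES, SEQ_LEN = 1024, 120, 256  # codebook size, dog breeds, 16x16 tokens

# Vocabulary = image codewords + class tokens; class id c maps to token K + c.
config = GPT2Config(vocab_size=K + NUM_CLASSES, n_positions=SEQ_LEN + 1)
model = GPT2LMHeadModel(config)

image_tokens = torch.randint(0, K, (8, SEQ_LEN))        # from the visual tokenizer
cls_tokens = torch.randint(0, NUM_CLASSES, (8, 1)) + K  # class-condition prefix
seq = torch.cat([cls_tokens, image_tokens], dim=1)

out = model(input_ids=seq, labels=seq)  # next-token prediction over the sequence
out.loss.backward()

# Sampling: condition on a class token and decode SEQ_LEN image tokens.
gen = model.generate(cls_tokens[:1], max_new_tokens=SEQ_LEN,
                     do_sample=True, temperature=1.3, top_k=100, top_p=0.9)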

Results

Here are additional generated samples: [samples grid image]

Acknowledgements

This repo is heavily inspired by these papers:

@inproceedings{esser2021taming,
  title={Taming transformers for high-resolution image synthesis},
  author={Esser, Patrick and Rombach, Robin and Ommer, Bjorn},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={12873--12883},
  year={2021}
}
@inproceedings{huh2023straightening,
  title={Straightening out the straight-through estimator: Overcoming optimization challenges in vector quantized networks},
  author={Huh, Minyoung and Cheung, Brian and Agrawal, Pulkit and Isola, Phillip},
  booktitle={International Conference on Machine Learning},
  pages={14096--14113},
  year={2023}
}
