12/15/24, 3:04 PM about:blank
Guide to Choosing a Generative AI Model Type
Types of generative AI models
Generative adversarial networks (GANs)
Key features:
1. Two competing neural networks: generator and discriminator.
2. The generator learns to create realistic data, while the discriminator learns to distinguish real from fake.
3. The adversarial training process continuously improves both networks.
4. Can be challenging to train and achieve stable results.
Applications:
1. Image generation: faces, landscapes, objects
2. Text generation: poems, code, scripts
3. Video generation: realistic videos, animation
4. Drug discovery: generate molecules with intended properties
5. Music generation: composing new songs
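The generator/discriminator competition described above is usually expressed as two binary cross-entropy losses that are minimized in alternation. The following is a minimal Python sketch, assuming the discriminator already outputs probabilities in (0, 1); the networks themselves are not shown, the function names are illustrative, and the generator loss is the common "non-saturating" variant rather than the literal minimax form.

```python
import math


def bce(p, target, eps=1e-12):
    """Binary cross-entropy of a predicted probability p against a 0/1 target."""
    p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def discriminator_loss(d_real, d_fake):
    """Reward D for outputting ~1 on real samples and ~0 on generated ones.

    d_real: discriminator probabilities D(x) on a batch of real samples.
    d_fake: discriminator probabilities D(G(z)) on a batch of generated samples.
    """
    real_term = sum(bce(p, 1) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss: reward G when D labels its samples as real."""
    return sum(bce(p, 1) for p in d_fake) / len(d_fake)


# In training, the two losses are minimized in alternation:
# one update of the discriminator's parameters, then one of the generator's.
print(discriminator_loss([0.5, 0.5], [0.5, 0.5]))  # ~1.386 (2 ln 2)
print(generator_loss([0.5, 0.5]))                   # ~0.693 (ln 2)
```

When the discriminator can no longer tell real from generated samples and outputs 0.5 everywhere, the losses settle at 2 ln 2 and ln 2. Keeping the two networks balanced near that point is exactly what makes GAN training hard to stabilize.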

Variational autoencoders (VAEs)
Key features:
1. Encode input data into a lower-dimensional latent space
2. Learn a probability distribution over the latent space
3. Decode samples from the latent space to generate new data points
4. Focus on learning a meaningful representation of the data
Applications:
1. Image compression: efficiently store and transmit images
2. Anomaly detection: identify unusual data points
3. Dimensionality reduction: compress high-dimensional data
4. Text summarization: generate concise summaries of text documents
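The encode, sample, decode loop above rests on two pieces: the reparameterization trick for drawing a latent sample, and a KL-divergence term that keeps the learned latent distribution close to a standard normal prior. This minimal Python sketch assumes a diagonal Gaussian latent whose mean `mu` and log-variance `log_var` come from some encoder network (not shown); the function names and the example numbers are hypothetical.

```python
import math
import random


def reparameterize(mu, log_var, rng=random):
    """Sample z ~ N(mu, sigma^2) as z = mu + sigma * eps with eps ~ N(0, 1).

    Writing the sample this way moves the randomness into eps, so gradients
    can flow back through mu and log_var during training.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


# Hypothetical encoder output for a 2-D latent space.
mu, log_var = [0.2, -0.4], [-1.0, -0.5]
z = reparameterize(mu, log_var, rng=random.Random(0))
# A decoder (not shown) would map z back to data space; the training loss
# is a reconstruction term plus this KL term.
print(z, kl_to_standard_normal(mu, log_var))
```

The KL term is zero only when the encoder outputs exactly the prior (zero mean, unit variance), so it acts as a regularizer that trades reconstruction accuracy for a smooth, sampleable latent space.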

Autoregressive models
Key features:
1. Generate data one element at a time, each conditioned on the previously generated elements
2. Use recurrent neural networks (RNNs) or transformers to capture long-term dependencies
3. Can be computationally expensive for long sequences
Applications:
1. Text generation: realistic and coherent text sequences
2. Music generation: generating music that follows a given genre and style
3. Time series forecasting: predicting future values of a time series
4. Image inpainting: filling in missing parts of an image
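Element-by-element generation follows the chain rule p(x_1, ..., x_T) = p(x_1) p(x_2 | x_1) ... p(x_T | x_1, ..., x_{T-1}). The minimal Python sketch below uses a count-based bigram model, which truncates the context to the single previous token; an actual autoregressive model would replace the counts with an RNN or transformer that conditions on the whole prefix. Greedy decoding (always picking the most frequent continuation) is an assumption made to keep the output deterministic.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def generate_greedy(counts, start, max_len):
    """Generate one token at a time, each conditioned on the token before it."""
    out = [start]
    while len(out) < max_len:
        options = counts.get(out[-1])  # .get avoids inserting empty entries
        if not options:
            break  # this context was never followed by anything in training
        # Greedy choice: most frequent successor (ties go to the first one seen).
        out.append(options.most_common(1)[0][0])
    return out


counts = train_bigram("the cat sat on the mat".split())
print(generate_greedy(counts, "the", 5))  # ['the', 'cat', 'sat', 'on', 'the']
```

The loop also shows why long sequences are costly: every new element needs another model evaluation, and with a neural model each evaluation processes an ever-longer prefix.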

Diffusion models
Key features:
1. Start with pure noise and gradually "de-noise" it into realistic data
2. Typically use a U-Net architecture with skip connections to preserve information
3. Can be more stable and easier to train than GANs, but often slower at generation time
Applications:
1. Image generation: high-quality and diverse images
2. Text generation: coherent and grammatically correct text
3. Audio generation: realistic and musical audio
4. Inpainting and denoising: improving the quality of images or audio
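The noising and de-noising described above can be sketched with a DDPM-style formulation. The forward process mixes clean data with Gaussian noise, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, where alpha_bar_t is the running product of (1 - beta_s). Each reverse step then subtracts the noise a network predicts. This is a minimal Python sketch under stated assumptions: a linear beta schedule, 0-indexed steps, and `eps_pred` as a stand-in for the output of a trained noise predictor (normally the U-Net), which is not implemented here.

```python
import math


def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Noise variance beta_t for each of T steps, increasing linearly (assumed schedule)."""
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]


def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, eps, betas):
    """Forward (noising) process: jump straight from clean x0 to step t."""
    ab = alpha_bars(betas)[t]
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, eps)]


def p_mean(xt, t, eps_pred, betas):
    """Mean of the reverse (de-noising) step from t to t-1.

    eps_pred stands in for a trained noise-prediction network's output.
    """
    beta = betas[t]
    ab = alpha_bars(betas)[t]
    coef = beta / math.sqrt(1.0 - ab)
    return [(x - coef * e) / math.sqrt(1.0 - beta) for x, e in zip(xt, eps_pred)]


betas = linear_beta_schedule()
print(alpha_bars(betas)[-1])  # close to 0: the last step is almost pure noise
```

A full sampler would start from pure noise at the last step and apply `p_mean` repeatedly, adding fresh Gaussian noise at every step except the final one. That loop of many network calls is why diffusion models tend to be slower than GANs when generating.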

Flow-based models
Key features:
1. Transform a simple distribution (such as a Gaussian) into a complex one using invertible transformations
2. Learn the parameters of these transformations from the data
3. Can be efficient and accurate for high-dimensional data, but training can be challenging
Applications:
1. Image generation: realistic and diverse images
2. Density estimation: modeling the probability distribution of data
3. Dimensionality reduction: compress high-dimensional data
4. Anomaly detection: identify unusual data points
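The invertible-transformation idea rests on the change-of-variables formula: if x = f_K(...f_1(z)) with z drawn from a standard normal, then log p(x) = log p_z(z) - sum over k of log|det J_k|, where z is recovered by inverting the layers in reverse order. The minimal one-dimensional Python sketch below uses the simplest invertible map, an affine transform x = scale * z + shift. The `AffineFlow` class is a hypothetical illustration; practical flows use richer learned layers, with their parameters fit by maximizing this log-likelihood on data.

```python
import math

LOG_2PI = math.log(2.0 * math.pi)


def std_normal_logpdf(z):
    """Log-density of the standard normal base distribution."""
    return -0.5 * (z * z + LOG_2PI)


class AffineFlow:
    """x = scale * z + shift; invertible whenever scale != 0."""

    def __init__(self, scale, shift):
        if scale == 0:
            raise ValueError("scale must be non-zero for the map to be invertible")
        self.scale, self.shift = scale, shift

    def forward(self, z):
        return self.scale * z + self.shift

    def inverse(self, x):
        return (x - self.shift) / self.scale

    def log_abs_det_jacobian(self):
        # For a 1-D affine map the Jacobian is just the constant `scale`.
        return math.log(abs(self.scale))


def flow_log_prob(x, layers):
    """Exact log p(x) for x = layers[-1](...layers[0](z)), z ~ N(0, 1)."""
    log_det = 0.0
    for layer in reversed(layers):  # undo the last layer first
        x = layer.inverse(x)
        log_det += layer.log_abs_det_jacobian()
    return std_normal_logpdf(x) - log_det


layers = [AffineFlow(2.0, 1.0), AffineFlow(3.0, -1.0)]  # overall x = 6z + 2
print(flow_log_prob(5.0, layers))
```

Because the density is exact, a trained flow can score any point directly: points with unusually low log-density are natural anomaly candidates, which ties the density-estimation and anomaly-detection applications together.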
Comparison of models on different considerations
Data type
GANs: Images, text, audio
VAEs: Images, text, continuous data
Autoregressive models: Images, text, sequences
Diffusion models: Images, text
Flow-based models: Images, continuous data

Task objective
GANs: High-fidelity generation, data augmentation
VAEs: Encoding/decoding, representation learning
Autoregressive models: Sequence generation, text-to-image translation
Diffusion models: Image generation, editing, inpainting
Flow-based models: Image generation, conditional generation

Quality of samples
GANs: High-fidelity, diverse
VAEs: Often blurry, less realistic
Autoregressive models: Sharp, high-resolution
Diffusion models: High-fidelity, diverse
Flow-based models: High-fidelity, controllable

Control over generation
GANs: Limited
VAEs: Moderate
Autoregressive models: High
Diffusion models: Moderate
Flow-based models: High

Training complexity
GANs: High
VAEs: Moderate
Autoregressive models: High
Diffusion models: Moderate
Flow-based models: High

Interpretability
GANs: Low
VAEs: Moderate
Autoregressive models: High
Diffusion models: Moderate
Flow-based models: Low
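Since the point of the comparison is choosing a model, its three ordinal rows (control over generation, training complexity, interpretability) can be encoded as data and filtered by requirements. This Python sketch copies those ratings from the comparison above. The numeric ordering and the treatment of "Limited" as equal to "Low" are assumptions for illustration, and `shortlist` is a hypothetical helper, not part of any library.

```python
# Ordinal scale for the ratings (assumption: "Limited" ranks with "Low").
LEVEL = {"Low": 0, "Limited": 0, "Moderate": 1, "High": 2}

# Ratings copied from the comparison of models.
RATINGS = {
    "GANs": {"control": "Limited", "training_complexity": "High", "interpretability": "Low"},
    "VAEs": {"control": "Moderate", "training_complexity": "Moderate", "interpretability": "Moderate"},
    "Autoregressive models": {"control": "High", "training_complexity": "High", "interpretability": "High"},
    "Diffusion models": {"control": "Moderate", "training_complexity": "Moderate", "interpretability": "Moderate"},
    "Flow-based models": {"control": "High", "training_complexity": "High", "interpretability": "Low"},
}


def shortlist(min_control="Limited", max_training_complexity="High", min_interpretability="Low"):
    """Return the model types that meet every requirement, in table order."""
    result = []
    for model, r in RATINGS.items():
        if (LEVEL[r["control"]] >= LEVEL[min_control]
                and LEVEL[r["training_complexity"]] <= LEVEL[max_training_complexity]
                and LEVEL[r["interpretability"]] >= LEVEL[min_interpretability]):
            result.append(model)
    return result


# Example: high control over generation and high interpretability.
print(shortlist(min_control="High", min_interpretability="High"))  # ['Autoregressive models']
```

The qualitative rows (data type, task objective, sample quality) are left out because they do not reduce to a single scale; they are better read directly when narrowing down the shortlist.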
Author(s)
Abhishek Gagneja