
CAP6412

Advanced Computer Vision


Mubarak Shah
[email protected]
HEC-245
Lecture 4: Diffusion Models



Diffusion models in vision: A survey
https://arxiv.org/pdf/2209.04747.pdf

Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu (University of Bucharest, Romania)
Mubarak Shah (University of Central Florida, US)
Agenda

1. Motivation
2. High-level overview
3. Denoising diffusion probabilistic models
4. Noise Conditioned Score Network
5. Conditional Generation
6. Stochastic Differential Equations
7. Applications
8. Research directions
Outline

1. Motivation
2. High-level overview
3. Denoising diffusion probabilistic models
4. Noise Conditioned Score Network
5. Conditional Generation
6. Stochastic Differential Equations
7. Applications
8. Research directions
Motivation

[Figure: images generated by text-to-image diffusion models, from the following prompts:]

• "A hedgehog using a calculator."
• "A corgi wearing a red bowtie and a purple party hat."
• "A transparent sculpture of a duck made out of glass."
• "A photo of a Corgi dog riding a bike in Times Square. It is wearing sunglasses and a beach hat."
• "Pomeranian king with tiger soldiers."
• "Zebras roaming in the field."
Outline

1. Motivation
2. High-level overview
3. Denoising diffusion probabilistic models
4. Noise Conditioned Score Network
5. Conditional Generation
6. Stochastic Differential Equations
7. Applications
8. Research directions
High-level overview

• Diffusion models are probabilistic models used for image generation
• They involve reversing the process of gradually degrading the data
• They consist of two processes:
  – The forward process: data is progressively destroyed by adding noise across multiple time steps
  – The reverse process: using a neural network, noise is sequentially removed to recover the original data

[Diagram: the forward process maps the data distribution to a standard Gaussian; the reverse process maps it back.]
High-level overview

• Three categories:
  – Denoising Diffusion Probabilistic Models (DDPM)
  – Noise Conditioned Score Networks (NCSN)
  – Stochastic Differential Equations (SDE)


Outline

1. Motivation
2. High-level overview
3. Denoising diffusion probabilistic models
4. Noise Conditioned Score Network
5. Conditional Generation
6. Stochastic Differential Equations
7. Applications
8. Research directions
Notations

$p(x)$ – the data distribution

$\mathcal{N}(x;\, \mu,\, \sigma^2 \cdot I)$ – Gaussian distribution over the random variable (image) $x$, with mean vector $\mu$ and covariance matrix $\sigma^2 I$, where $I$ is the identity matrix

$x = \mu + \sigma \cdot z, \quad z \sim \mathcal{N}(0, I)$ – a sample from this distribution (the reparameterization trick)
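As a quick illustration of this reparameterization, here is a minimal NumPy sketch (function and variable names are our own, not from the lecture):

    import numpy as np

    def sample_gaussian(mu, sigma):
        # x = mu + sigma * z, with z ~ N(0, I)
        z = np.random.randn(*mu.shape)
        return mu + sigma * z

    # e.g., a 3x3 "image" with mean 0.5 and standard deviation 0.1
    x = sample_gaussian(np.full((3, 3), 0.5), 0.1)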


Denoising Diffusion Probabilistic Models (DDPMs)

Forward process: $x_0 \sim p(x_0) \;\to\; x_1 \;\to\; \dots \;\to\; x_T \sim \mathcal{N}(0, I)$
Denoising Diffusion Probabilistic Models (DDPMs)

Reverse process: $x_T \sim \mathcal{N}(0, I) \;\to\; x_{T-1} \;\to\; \dots \;\to\; x_0 \sim p(x_0)$


Denoising Diffusion Probabilistic Models (DDPMs)

Forward process (iterative): the image is gradually replaced with noise.

$x_0 \sim p(x_0), \qquad x_t \sim p(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\, \sqrt{1-\beta_t}\, x_{t-1},\, \beta_t I\big), \qquad \beta_t \ll 1,\ t = \overline{1, T}$

$x_0 \to x_1 \to \dots \to x_{T-1} \to x_T$
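A minimal sketch of one iterative forward step, assuming the image is a NumPy array and beta_t a small scalar (all names are ours):

    import numpy as np

    def forward_step(x_prev, beta_t):
        # p(x_t | x_{t-1}) = N(x_t; sqrt(1 - beta_t) * x_{t-1}, beta_t * I)
        z = np.random.randn(*x_prev.shape)
        return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * z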
Denoising Diffusion Probabilistic Models (DDPMs)

Forward process, ancestral sampling (one shot):

$x_t \sim p(x_t \mid x_0) = \mathcal{N}\big(x_t;\, \sqrt{\hat{\beta}_t} \cdot x_0,\, (1-\hat{\beta}_t)\, I\big)$

Notations: $\alpha_t = 1 - \beta_t, \qquad \hat{\beta}_t = \prod_{i=1}^{t} \alpha_i$
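The same corruption can be applied in one shot via the cumulative products above; a sketch, assuming a linear beta schedule (the schedule choice is an assumption, not from these slides):

    import numpy as np

    T = 1000
    betas = np.linspace(1e-4, 0.02, T)        # assumed linear schedule
    alphas = 1.0 - betas                      # alpha_t = 1 - beta_t
    beta_hat = np.cumprod(alphas)             # beta_hat[t] = product of alpha_i up to t (0-indexed)

    def forward_oneshot(x0, t):
        # p(x_t | x_0) = N(x_t; sqrt(beta_hat_t) * x_0, (1 - beta_hat_t) * I)
        z = np.random.randn(*x0.shape)
        return np.sqrt(beta_hat[t]) * x0 + np.sqrt(1.0 - beta_hat[t]) * z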
DDPMs. Properties of the forward process

1. $\beta_t \ll 1,\ t = \overline{1, T}$

$x_t \sim p(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\, \sqrt{1-\beta_t}\, x_{t-1},\, \beta_t I\big)$

$x_t$ is created from $x_{t-1}$ with a small step whose size is modeled by $\beta_t$. Hence $x_{t-1}$ comes from a region close to $x_t$, and therefore we can model the reverse step with a Gaussian as well:

$x_{t-1} \sim p(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\, \mu(x_t, t),\, \Sigma(x_t, t)\big)$
DDPMs. Properties of the forward process

1. $\beta_t \ll 1,\ t = \overline{1, T}$

$x_t \sim p(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\, \sqrt{1-\beta_t}\, x_{t-1},\, \beta_t I\big)$

Without the small-step assumption, we are less certain where $x_{t-1}$ was, because we could have reached $x_t$ from many more regions.
DDPMs. Properties of the forward process

1. $\beta_t \ll 1,\ t = \overline{1, T}$ $\;\Longrightarrow\;$ 2. $T$ is large

After $T$ iterations, $x_T$ is pure noise: $x_0 \;\xrightarrow{\ T\ \text{iterations}\ }\; x_T$
DDPMs. Training objective

Remember that in the reverse process ($x_T \to \dots \to x_0$), each step is approximated by a neural network with weights $\theta$:

$p(x_{t-1} \mid x_t) \approx p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\, \mu_\theta(x_t, t),\, \Sigma_\theta(x_t, t)\big)$
DDPMs. Training objective

Simplification: fix the variance instead of learning it, and predict/learn only the mean:

$p(x_{t-1} \mid x_t) \approx p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\, \mu_\theta(x_t, t),\, \sigma_t^2\, I\big)$
DDPMs. Training objective

A U-Net-like neural network takes the noisy image $x_t$ and the time step $t$, and outputs $\mu_\theta(x_t, t)$; the next image is then sampled as

$x_{t-1} \sim \mathcal{N}\big(x_{t-1};\, \mu_\theta(x_t, t),\, \sigma_t^2\, I\big)$

Slide from: Denoising Diffusion-based Generative Modeling: Foundations and Applications – Karsten Kreis, Ruiqi Gao, Arash Vahdat
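The lecture uses a U-Net; as a stand-in, here is a deliberately tiny PyTorch sketch of a time-conditioned denoiser (a toy convolutional network of our own, not the actual architecture) mapping $(x_t, t)$ to an output with the same shape as $x_t$:

    import torch
    import torch.nn as nn

    class TinyDenoiser(nn.Module):
        """Toy stand-in for the U-Net: predicts a tensor shaped like x_t from (x_t, t)."""
        def __init__(self, channels=1, hidden=64, T=1000):
            super().__init__()
            self.t_embed = nn.Embedding(T, hidden)       # simple learned time-step embedding
            self.inp = nn.Conv2d(channels, hidden, 3, padding=1)
            self.mid = nn.Conv2d(hidden, hidden, 3, padding=1)
            self.out = nn.Conv2d(hidden, channels, 3, padding=1)

        def forward(self, x_t, t):
            h = torch.relu(self.inp(x_t))
            h = h + self.t_embed(t)[:, :, None, None]    # inject the time step
            h = torch.relu(self.mid(h))
            return self.out(h)

    model = TinyDenoiser()
    x_t = torch.randn(8, 1, 28, 28)                      # a batch of noisy "images"
    t = torch.randint(0, 1000, (8,))
    out = model(x_t, t)                                  # same shape as x_t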
DDPMs. Training Objective

Cross entropy and KL (Kullback-Leibler) divergence:

• Entropy: $E(P) = -\sum_i P(i)\,\log P(i)$
• Cross entropy: $C(P, Q) = -\sum_i P(i)\,\log Q(i)$
• KL divergence: $D_{KL}(P \,\|\, Q) = \sum_i P(i)\,\log\frac{P(i)}{Q(i)} = \sum_i P(i)\,[\log P(i) - \log Q(i)]$

Slides from Ming Li, University of Waterloo, CS 886 Deep Learning and NLP
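A small NumPy check (with example distributions of our own) that these quantities relate as expected, i.e. $D_{KL}(P \| Q) = C(P, Q) - E(P)$:

    import numpy as np

    P = np.array([0.7, 0.2, 0.1])               # example discrete distributions
    Q = np.array([0.5, 0.3, 0.2])

    entropy = -np.sum(P * np.log(P))            # E(P)
    cross_entropy = -np.sum(P * np.log(Q))      # C(P, Q)
    kl = np.sum(P * np.log(P / Q))              # D_KL(P || Q)

    assert np.isclose(kl, cross_entropy - entropy)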
DDPMs. Training Objective

$\min_\theta\ \mathbb{E}_{x_0 \sim p(x_0)}\Big[-\log p_\theta(x_0 \mid x_1) \;+\; KL\big(p(x_T \mid x_0)\,\|\,p(x_T)\big) \;+\; \sum_{t=2}^{T} KL\big(p(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\big)\Big]$

• The middle term can be ignored because $p(x_T)$ is $\mathcal{N}(0, I)$ and does not depend on $\theta$.
• The last term pushes $p_\theta(x_{t-1} \mid x_t)$, at each time step $t$, to be as close as possible to the true posterior of the forward process when conditioned on the original image.
DDPMs. Training Objective. Simplifications

$\min_\theta\ \mathbb{E}_{x_0 \sim p(x_0)}\Big[-\log p_\theta(x_0 \mid x_1) \;+\; \sum_{t=2}^{T} KL\big(p(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\big)\Big]$

• The KL divergence between two Gaussians with fixed variances reduces to the L2 distance between their means.
• The first term measures the reconstruction error and can be addressed with an independent decoder.
• The DDPM paper introduced two simplifications that lead to a much simpler objective, based on the noise in the image.
DDPMs. Training Objective. Simplifications

Tractable posterior (the target of the KL term above):

$p(x_{t-1} \mid x_t, x_0) = \mathcal{N}\big(x_{t-1};\, \tilde{\mu}(x_t, x_0),\, \tilde{\beta}_t\, I\big)$

$\tilde{\mu}(x_t, x_0) = \frac{1}{\sqrt{\alpha_t}}\Big(x_t - \frac{1-\alpha_t}{\sqrt{1-\hat{\beta}_t}}\, z_t\Big), \qquad z_t \sim \mathcal{N}(0, I)$

Notations: $\alpha_t = 1 - \beta_t, \qquad \hat{\beta}_t = \prod_{i=1}^{t} \alpha_i, \qquad \tilde{\beta}_t = \frac{1-\hat{\beta}_{t-1}}{1-\hat{\beta}_t} \cdot \beta_t$
DDPMs. Training Objective. Simplifications

First simplification: give $\mu_\theta$ the same form as $\tilde{\mu}$, with a network $z_\theta(x_t, t)$ that predicts the noise:

$\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\Big(x_t - \frac{1-\alpha_t}{\sqrt{1-\hat{\beta}_t}}\, z_\theta(x_t, t)\Big)$

$\Rightarrow\ KL\big(p(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\big) = \mathbb{E}_{z_t \sim \mathcal{N}(0, I)}\Big[\frac{\beta_t^2}{2\sigma_t^2\, \alpha_t\, (1-\hat{\beta}_t)}\, \big\|z_t - z_\theta(x_t, t)\big\|_2^2\Big]$
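To make the schedule notations above concrete, a short NumPy sketch computing $\alpha_t$, $\hat{\beta}_t$ and $\tilde{\beta}_t$ from an assumed linear $\beta_t$ schedule:

    import numpy as np

    T = 1000
    betas = np.linspace(1e-4, 0.02, T)                      # assumed linear schedule
    alphas = 1.0 - betas                                    # alpha_t = 1 - beta_t
    beta_hat = np.cumprod(alphas)                           # beta_hat_t = product of alpha_i up to t

    # beta_tilde_t = (1 - beta_hat_{t-1}) / (1 - beta_hat_t) * beta_t,
    # with the empty product beta_hat_0 := 1
    beta_hat_prev = np.concatenate(([1.0], beta_hat[:-1]))
    beta_tilde = (1.0 - beta_hat_prev) / (1.0 - beta_hat) * betas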
DDPMs. Training Objective. Simplifications

Second simplification: the weighting factor $\frac{\beta_t^2}{2\sigma_t^2\, \alpha_t\, (1-\hat{\beta}_t)}$ in front of the squared error is ignored, leaving a plain noise-prediction loss:

$KL\big(p(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\big)\ \propto\ \mathbb{E}_{z_t \sim \mathcal{N}(0, I)}\big[\,\|z_t - z_\theta(x_t, t)\|_2^2\,\big]$
DDPMs. Training Algorithm

$\min_\theta\ \mathbb{E}_{x_0 \sim p(x_0),\ t \sim \mathcal{U}(\{1,\dots,T\}),\ z_t \sim \mathcal{N}(0, I)}\big[\,\|z_t - z_\theta(x_t, t)\|_2^2\,\big]$

Training algorithm (with $\hat{\beta}_t = \prod_{i=1}^{t} \alpha_i$):

Repeat
    $x_0 \sim p(x_0)$
    $t \sim \mathcal{U}(\{1, \dots, T\})$
    $z_t \sim \mathcal{N}(0, I)$
    $x_t = \sqrt{\hat{\beta}_t} \cdot x_0 + \sqrt{1-\hat{\beta}_t}\, z_t$
    $\theta = \theta - lr \cdot \nabla_\theta \mathcal{L}$
Until convergence
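A hedged PyTorch sketch of one step of this loop; the model is any noise-prediction network such as the toy one above, and the schedule, optimizer and names are illustrative assumptions:

    import torch

    def train_step(model, opt, x0, betas, beta_hat):
        # One gradient step on ||z - z_theta(x_t, t)||^2 (t is 0-indexed here)
        T = betas.shape[0]
        t = torch.randint(0, T, (x0.shape[0],))              # t ~ U{1, ..., T}
        z = torch.randn_like(x0)                             # z ~ N(0, I)
        bh = beta_hat[t][:, None, None, None]                # broadcast over image dims
        x_t = bh.sqrt() * x0 + (1.0 - bh).sqrt() * z         # one-shot forward sample
        loss = ((z - model(x_t, t)) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()

    # assumed usage:
    # betas = torch.linspace(1e-4, 0.02, 1000)
    # beta_hat = torch.cumprod(1.0 - betas, dim=0)
    # opt = torch.optim.Adam(model.parameters(), lr=2e-4)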
DDPMs. Sampling

• Pass the current noisy image $x_t$, along with $t$, to the neural network to obtain $z_\theta(x_t, t)$
• From the predicted noise, compute the mean of the Gaussian distribution
DDPMs. Sampling

Sample the image for the next iteration:

$x_{t-1} \sim \mathcal{N}\Big(x_{t-1};\ \underbrace{\frac{1}{\sqrt{\alpha_t}}\Big(x_t - \frac{1-\alpha_t}{\sqrt{1-\hat{\beta}_t}}\, z_\theta(x_t, t)\Big)}_{\mu_\theta(x_t,\, t)},\ \sigma_t^2\, I\Big)$
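Finally, a matching sketch of the full sampling loop, with the assumed choice $\sigma_t^2 = \beta_t$ (one common option; the lecture only fixes the variance without specifying it):

    import torch

    @torch.no_grad()
    def sample(model, betas, shape=(1, 1, 28, 28)):
        # betas: 1-D tensor, e.g. torch.linspace(1e-4, 0.02, T)
        T = betas.shape[0]
        beta_hat = torch.cumprod(1.0 - betas, dim=0)
        x = torch.randn(shape)                                    # x_T ~ N(0, I)
        for t in reversed(range(T)):
            t_batch = torch.full((shape[0],), t, dtype=torch.long)
            z_pred = model(x, t_batch)                            # z_theta(x_t, t)
            coef = betas[t] / (1.0 - beta_hat[t]).sqrt()          # (1 - alpha_t) / sqrt(1 - beta_hat_t)
            mean = (x - coef * z_pred) / (1.0 - betas[t]).sqrt()  # mu_theta(x_t, t)
            noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
            x = mean + betas[t].sqrt() * noise                    # sigma_t^2 = beta_t (assumption)
        return x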
Thank You
