

Diffusion model
for machine learning

Tran Trong Khiem

AI lab training

2024/08/01


1 Introduction

2 NCSN

3 DDPM

4 References


Introduction

Math for machine learning:

• Complete the Foundations chapter in Probabilistic Machine Learning [1].
• Probability and Statistics.
• Linear Algebra.
• Optimization.

Generative AI:
• GAN
• VAE
• Flow-based models
• Diffusion models


Generative AI

Existing generative modeling techniques can largely be grouped into two
categories based on how they represent probability distributions.

1 Likelihood-based models: directly learn the distribution's probability
  density (or mass) function via (approximate) maximum likelihood
  (VAEs, EBMs, ...).
  • Cons: require strong restrictions on the model architecture to ensure
    a tractable normalizing constant for likelihood computation.
2 Implicit generative models: the probability distribution is implicitly
  represented by a model of its sampling process (GANs, ...).
  • Cons: training is unstable and can lead to mode collapse.

Diffusion models introduce another way to represent probability
distributions that circumvents several of these limitations.


Diffusion model

The key idea is to model the gradient of the log probability density
function, the score function.
• Score-based models are not required to have a tractable normalizing
  constant, can be learned directly by score matching, and achieve better
  image generation quality than GANs.
Denote:
• The dataset consists of i.i.d. samples {x_i ∈ R^D}_{i=1}^N from an
  unknown data distribution p_data(x).
• The score of a probability density p(x) is defined as ∇_x log p(x).
• The score network s_θ : R^D → R^D will be trained to approximate the
  score of p_data(x).
The framework of score-based generative modeling:
1 score matching
2 Langevin dynamics.

Framework of score-based generative modeling


Score matching:
• train a score network s_θ(x) to estimate ∇_x log p_data(x) without
  training a model to estimate p_data(x) itself.
Langevin dynamics:
• produce samples from a probability density p(x) using only its score
  function ∇_x log p(x).

Figure 1: Score-based generative modeling with score matching + Langevin dynamics.

Score matching for score estimation

Goal: train a score network s_θ(x) to estimate ∇_x log p_data(x).

The objective to minimize is:

    E_{p_data} [ ‖s_θ(x) − ∇_x log p_data(x)‖_2^2 ]

which can be shown to be equivalent, up to a constant, to:

    E_{p_data(x)} [ tr(∇_x s_θ(x)) + (1/2) ‖s_θ(x)‖_2^2 ]

Problem: score matching is not scalable to deep networks and
high-dimensional data due to the computation of tr(∇_x s_θ(x)).
Solution: there are two popular methods for large-scale score matching:
1 Denoising score matching
2 Sliced score matching


Score matching for score estimation (cont.)

Denoising score matching:
• completely circumvents tr(∇_x s_θ(x)).
• perturbs the data point x with a prespecified noise distribution q_σ(x̃ | x).
• employs score matching to estimate the score of the perturbed data:

    E_{q_σ(x̃|x) p_data(x)} [ ‖s_θ(x̃) − ∇_x̃ log q_σ(x̃ | x)‖_2^2 ]

• However, the optimal s*_θ(x) = ∇_x log q_σ(x) ≈ ∇_x log p_data(x) only
  when the noise is small enough that q_σ(x) ≈ p_data(x).

Sliced score matching:
• uses random projections to approximate tr(∇_x s_θ(x)).
• The objective is (a sketch follows below):

    E_{p_v} E_{p_data} [ vᵀ ∇_x s_θ(x) v + (1/2) ‖s_θ(x)‖_2^2 ]

• p_v is a simple distribution over random vectors v.
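As a concrete illustration of the sliced objective, here is a minimal sketch
assuming PyTorch, a batch of flattened inputs x of shape (batch, D), and a
hypothetical score network score_net mapping such a batch to scores of the
same shape. The term vᵀ ∇_x s_θ(x) v is obtained with a single autograd call
instead of a full Jacobian; the sketch also assumes each sample's score
depends only on its own input (e.g. no batch normalization).

```python
import torch

def sliced_score_matching_loss(score_net, x, n_projections=1):
    # x: (batch, D); gradients w.r.t. x are needed for the Jacobian-vector product
    x = x.detach().requires_grad_(True)
    total = 0.0
    for _ in range(n_projections):
        v = torch.randn_like(x)                   # v ~ p_v, here a standard Gaussian
        s = score_net(x)                          # s_theta(x), shape (batch, D)
        # grad of sum(s * v) w.r.t. x gives (nabla_x s_theta(x))^T v for each sample
        jv = torch.autograd.grad((s * v).sum(), x, create_graph=True)[0]
        jvp_term = (jv * v).sum(dim=1)            # v^T (nabla_x s_theta(x)) v
        norm_term = 0.5 * (s ** 2).sum(dim=1)     # (1/2) ||s_theta(x)||_2^2
        total = total + (jvp_term + norm_term).mean()
    return total / n_projections
```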

Sampling with Langevin dynamics

Goal: produce samples from a probability density p(x) using only the
score function ∇_x log p(x).
• Given a fixed step size ε > 0 and an initial value x̃_0 ∼ π(x),
  where π is a prior distribution,
• the Langevin method recursively computes:

    x̃_t = x̃_{t−1} + (ε/2) ∇_x log p(x̃_{t−1}) + √ε z_t,   z_t ∼ N(0, I)

• The distribution of x̃_T converges to p(x) as ε → 0 and T → ∞.
• In practice, ε is small and T is large.
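For illustration, a minimal sketch of this recursion in PyTorch, assuming a
callable score_fn (a hypothetical name) that returns ∇_x log p(x) for a batch:

```python
import torch

def langevin_sample(score_fn, x_init, eps=1e-4, T=1000):
    """Unadjusted Langevin dynamics driven only by the score function."""
    x = x_init.clone()
    for _ in range(T):
        z = torch.randn_like(x)                              # z_t ~ N(0, I)
        x = x + 0.5 * eps * score_fn(x) + (eps ** 0.5) * z   # one Langevin update
    return x                                                 # ~ p(x) for small eps, large T
```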


Challenges of score-based generative modeling


Inaccurate score estimation with score matching:
• In score matching, we minimize:

    E_{p_data} [ ‖s_θ(x) − ∇_x log p_data(x)‖_2^2 ]
      = ∫ p_data(x) ‖s_θ(x) − ∇_x log p_data(x)‖_2^2 dx

• Since the squared errors are weighted by p_data(x), they are largely
  ignored in low-density regions where p_data(x) is small.

Figure 2: Estimated scores are only accurate in high density regions


How to bypass inaccurate score estimation in regions of low data density?

Observation: perturbing the data with random Gaussian noise makes the data
distribution more amenable to score-based generative modeling.
• Large Gaussian noise has the effect of filling low-density regions of
  the original distribution.
This intuition is the key idea behind Noise Conditional Score Networks (NCSN):
1 perturb the data using various levels of noise;
2 simultaneously estimate the scores corresponding to all noise levels by
  training a single conditional score network.


1 Introduction

2 NCSN

3 DDPM

4 References


Noise Conditional Score Networks

Problem: how to choose an appropriate noise scale for the perturbation
process?
• Larger noise over-corrupts the data and alters it significantly from
  the original distribution.
• Smaller noise causes less corruption, but does not fill the low-density
  regions as well.
Solution: use multiple scales of noise perturbation simultaneously.
Denote:
• {σ_i}_{i=1}^L a positive, geometrically decreasing sequence
  (σ_1 > σ_2 > ... > σ_L); a sketch of such a schedule follows below.
• q_σ(x) = ∫ p_data(t) N(x | t, σ^2 I) dt, the perturbed data distribution.
• s_θ(x, σ), a Noise Conditional Score Network (NCSN).
• Train the model to jointly estimate the scores of all perturbed data
  distributions:

    ∀σ ∈ {σ_i}_{i=1}^L : s_θ(x, σ) ≈ ∇_x log q_σ(x)
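For illustration, a minimal sketch of such a geometrically decreasing noise
schedule; the endpoints sigma_max and sigma_min are placeholder values, not
prescribed by the slide:

```python
import math
import torch

def geometric_sigmas(sigma_max=1.0, sigma_min=0.01, L=10):
    # sigma_1 > sigma_2 > ... > sigma_L, constant ratio between consecutive levels
    return torch.exp(torch.linspace(math.log(sigma_max), math.log(sigma_min), L))
```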

Learning NCSNs via score matching

Adapt denoising score matching to learn NCSNs.
• Choose the noise distribution to be q_σ(x̃ | x) = N(x̃ | x, σ^2 I),
• therefore ∇_x̃ log q_σ(x̃ | x) = −(x̃ − x)/σ^2.
• For a given σ, the denoising score matching objective is:

    L(θ; σ) = (1/2) E_{p_data(x)} E_{x̃ ∼ N(x, σ^2 I)} [ ‖s_θ(x̃, σ) + (x̃ − x)/σ^2‖_2^2 ]

• We combine all σ ∈ {σ_i}_{i=1}^L into one unified objective:

    L(θ; {σ_i}_{i=1}^L) = (1/L) Σ_{i=1}^L λ(σ_i) L(θ; σ_i)

  where λ(σ_i) > 0 is a weighting coefficient (a common choice is λ(σ) = σ^2).
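A minimal sketch of a Monte-Carlo estimate of this unified objective, assuming
PyTorch, a 1-D tensor sigmas of noise levels, a conditional score network with
the hypothetical signature score_net(x, sigma), and the weighting λ(σ) = σ^2,
under which the weighted per-level loss simplifies to (1/2)‖σ s_θ(x̃, σ) + ε‖^2:

```python
import torch

def ncsn_loss(score_net, x, sigmas):
    """One stochastic estimate of the unified objective with lambda(sigma) = sigma^2."""
    # draw one noise level per example instead of averaging over all L levels
    idx = torch.randint(0, len(sigmas), (x.shape[0],), device=x.device)
    sigma = sigmas[idx].view(-1, *([1] * (x.dim() - 1)))
    eps = torch.randn_like(x)
    x_tilde = x + sigma * eps                      # x_tilde ~ N(x, sigma^2 I)
    score = score_net(x_tilde, sigma)              # s_theta(x_tilde, sigma)
    # lambda(sigma) * L(theta; sigma) = 1/2 || sigma * s_theta(x_tilde, sigma) + eps ||_2^2
    per_example = 0.5 * ((sigma * score + eps) ** 2).flatten(1).sum(dim=1)
    return per_example.mean()
```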


NCSN inference via annealed Langevin dynamics

• Sampling uses a proposed approach, annealed Langevin dynamics: run
  Langevin dynamics at each noise level in turn, from the largest σ_1
  down to the smallest σ_L, using the samples from one level to
  initialize the next.

Figure 3: Annealed Langevin dynamics.
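A minimal sketch of this sampler, assuming PyTorch, the conditional score
network score_net(x, sigma) from the previous slide, a 1-D tensor sigmas sorted
from largest to smallest, and the per-level step size α_i = ε σ_i^2 / σ_L^2 used
in the NCSN paper:

```python
import torch

def annealed_langevin_sample(score_net, x_init, sigmas, eps=2e-5, T=100):
    """Run T Langevin steps at each noise level, from the largest sigma to the smallest."""
    x = x_init.clone()
    for sigma in sigmas:                              # sigma_1 > ... > sigma_L
        alpha = eps * (sigma / sigmas[-1]) ** 2       # per-level step size
        for _ in range(T):
            z = torch.randn_like(x)
            x = x + 0.5 * alpha * score_net(x, sigma) + alpha.sqrt() * z
    return x
```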



1 Introduction

2 NCSN

3 DDPM

4 References


Denoising Diffusion Probabilistic Models

Figure 4: Diffusion model

Forward diffusion process:
• adds a small amount of Gaussian noise to the sample at each of T steps,
• producing a sequence of noisy samples x_1, x_2, ..., x_T,
• and converts any complex data distribution into a simple, tractable
  distribution.
Reverse diffusion process:
• Learn a reversal of the forward diffusion process.

Forward process

Gradually add Gaussian noise to the data according to a variance schedule
β_1, ..., β_T:

    q(x_{1:T} | x_0) := Π_{t=1}^T q(x_t | x_{t−1}),
    q(x_t | x_{t−1}) := N(x_t; √(1 − β_t) x_{t−1}, β_t I)

Nice property: we can sample x_t at any timestep t in closed form:

    x_t = √α_t x_{t−1} + √(1 − α_t) ε_t = √ᾱ_t x_0 + √(1 − ᾱ_t) ε

• ε_t, ε ∼ N(0, I)
• α_t = 1 − β_t and ᾱ_t = Π_{s=1}^t α_s
• Thus we have: q(x_t | x_0) = N(x_t; √ᾱ_t x_0, (1 − ᾱ_t) I)
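A minimal sketch of this closed-form sampling in PyTorch; the linear β schedule
shown in the comment is an illustrative choice, not the only option:

```python
import torch

def q_sample(x0, t, alphas_bar, noise=None):
    """Draw x_t ~ q(x_t | x_0) in one step using the closed form above."""
    if noise is None:
        noise = torch.randn_like(x0)                            # epsilon ~ N(0, I)
    a_bar = alphas_bar[t].view(-1, *([1] * (x0.dim() - 1)))     # broadcast over data dims
    return a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise

# e.g. betas = torch.linspace(1e-4, 0.02, 1000)
#      alphas_bar = torch.cumprod(1.0 - betas, dim=0)
```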


Reverse diffusion process

Goal: learn to reverse the forward process and sample from q(x_{t−1} | x_t).
• Use p_θ(x_{t−1} | x_t) to approximate q(x_{t−1} | x_t).
• The reverse conditional probability is tractable when conditioned on x_0:

    q(x_{t−1} | x_t, x_0) = N(x_{t−1}; μ̃_t(x_t, x_0), β̃_t I)

• where

    β̃_t = ((1 − ᾱ_{t−1}) / (1 − ᾱ_t)) β_t
    μ̃_t(x_t, x_0) = (√α_t (1 − ᾱ_{t−1}) / (1 − ᾱ_t)) x_t + (√ᾱ_{t−1} β_t / (1 − ᾱ_t)) x_0
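For illustration, a minimal sketch that evaluates these posterior parameters
for a batch of integer timesteps, assuming the betas / alphas_bar tensors from
the forward-process sketch above and indexing conventions where entry t of
those tensors corresponds to timestep t+1:

```python
import torch

def posterior_mean_variance(x0, xt, t, betas, alphas_bar):
    """Mean and variance of q(x_{t-1} | x_t, x_0) for timestep indices t >= 1."""
    shape = (-1, *([1] * (x0.dim() - 1)))
    beta_t = betas[t].view(*shape)
    alpha_t = 1.0 - beta_t
    a_bar_t = alphas_bar[t].view(*shape)
    a_bar_prev = alphas_bar[t - 1].view(*shape)      # valid because every entry of t is >= 1
    beta_tilde = (1 - a_bar_prev) / (1 - a_bar_t) * beta_t
    mean = (alpha_t.sqrt() * (1 - a_bar_prev) / (1 - a_bar_t)) * xt \
         + (a_bar_prev.sqrt() * beta_t / (1 - a_bar_t)) * x0
    return mean, beta_tilde
```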


Reverse diffusion process (cont.)

Training is performed by optimizing the usual variational bound on the
negative log-likelihood:

    E_q[−log p_θ(x_0)] ≤ E_q[ −log ( p_θ(x_{0:T}) / q(x_{1:T} | x_0) ) ]
      = E_q[ −log p(x_T) − Σ_{t=1}^T log ( p_θ(x_{t−1} | x_t) / q(x_t | x_{t−1}) ) ] =: L

The loss can be rewritten as:

    L = E_q[ D_KL(q(x_T | x_0) ‖ p(x_T))
        + Σ_{t>1} D_KL(q(x_{t−1} | x_t, x_0) ‖ p_θ(x_{t−1} | x_t))
        − log p_θ(x_0 | x_1) ]

Label each component of the variational lower bound separately:
• L_VLB = Σ_{t=0}^T L_t


Reverse diffusion process (cont.)

The loss term L_t is parameterized and simplified so that training minimizes:

    L_t^simple = E_{t∼[1,T], x_0, ε_t} [ ‖ε_t − ε_θ(√ᾱ_t x_0 + √(1 − ᾱ_t) ε_t, t)‖^2 ]

Figure 5: Training process.
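A minimal sketch of one training step with this simplified loss, assuming
PyTorch, the alphas_bar tensor from the forward-process sketch, and a
noise-prediction network with the hypothetical signature eps_net(x_t, t):

```python
import torch
import torch.nn.functional as F

def ddpm_simple_loss(eps_net, x0, alphas_bar):
    """L_simple: sample a random timestep, noise x_0 in closed form, regress the noise."""
    T = alphas_bar.shape[0]
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)   # uniform random timesteps
    eps = torch.randn_like(x0)
    a_bar = alphas_bar[t].view(-1, *([1] * (x0.dim() - 1)))
    xt = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps           # x_t from the nice property
    return F.mse_loss(eps_net(xt, t), eps)
```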


1 Introduction

2 NCSN

3 DDPM

4 References


References

1 Weng, Lilian (Jul 2021). "What are diffusion models?" Lil'Log.
2 Ho, Jonathan; Jain, Ajay; Abbeel, Pieter. "Denoising Diffusion
  Probabilistic Models."
3 Song, Yang; Ermon, Stefano. "Generative Modeling by Estimating
  Gradients of the Data Distribution."
4 "Generative Modeling by Estimating Gradients of the Data Distribution."
