Region Segmentation
Readings: Chapter 10: 10.1
Additional Materials Provided
• K-means Clustering (text)
• EM Clustering (paper)
• Graph Partitioning (text)
• Mean-Shift Clustering (paper)
1
Image Segmentation
Image segmentation is the operation of partitioning an
image into a collection of connected sets of pixels.
1. into regions, which usually cover the image
2. into linear structures, such as
- line segments
- curve segments
3. into 2D shapes, such as
- circles
- ellipses
- ribbons (long, symmetric regions)
2
Example: Regions
3
Main Methods of Region
Segmentation
1. Region Growing
2. Split and Merge
3. Clustering
4
Clustering
• There are K clusters C1,…, CK with means m1,…, mK.
• The least-squares error is defined as

  D = Σk=1..K Σxi∈Ck || xi − mk ||².
• Out of all possible partitions into K clusters,
choose the one that minimizes D.
Why don’t we just do this?
If we could, would we get meaningful objects?
5
K-Means Clustering
Form K-means clusters from a set of n-dimensional vectors
1. Set ic (iteration count) to 1
2. Choose randomly a set of K means m1(1), …, mK(1).
3. For each vector xi compute D(xi , mk(ic)), k=1,…K
and assign xi to the cluster Cj with nearest mean.
4. Increment ic by 1, update the means to get m 1(ic),…,mK(ic).
5. Repeat steps 3 and 4 until Ck(ic) = Ck(ic+1) for all k.
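A minimal sketch of this loop in Python/NumPy follows; the function name kmeans, the seed argument, and initializing the means from randomly selected data vectors are illustrative assumptions, not prescribed by the slides.

import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """K-means (steps 1-5 above) on an (n_points, n_dims) array."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 2: choose K of the input vectors at random as the initial means.
    means = points[rng.choice(len(points), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Step 3: assign each vector to the cluster with the nearest mean.
        dists = np.linalg.norm(points[:, None, :] - means[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 5: stop when no assignment changes between iterations.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: recompute each mean from its currently assigned vectors.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                means[j] = members.mean(axis=0)
    return means, labels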
6
K-Means Example 1
7
K-Means Example 2
8
K-Means Example 3
9
K-means Variants
• Different ways to initialize the means
• Different stopping criteria
• Dynamic methods for determining the right
number of clusters (K) for a given image
• The EM Algorithm: a probabilistic
formulation of K-means
10
K-Means
• Boot Step:
– Initialize K clusters: C1, …, CK
Each cluster is represented by its mean mj
• Iteration Step:
– Estimate the cluster for each data point
xi → C(xi)
– Re-estimate the cluster parameters
11
K-Means Example
12
K-Means Example
Where do the red points belong?
13
K-means vs. EM
                         K-means                      EM
Cluster representation   mean                         mean, variance, and weight
Cluster initialization   randomly select K means      initialize K Gaussian distributions
Expectation              assign each point to         soft-assign each point to
                         the closest mean             each distribution
Maximization             compute means of the         compute new parameters of
                         current clusters             each distribution
14
Notation
N(μ, σ) is a 1D normal (Gaussian) distribution with
mean μ and standard deviation σ (so the
variance is σ²).
15
N(μ, Σ) is a multivariate Gaussian distribution with
mean μ and covariance matrix Σ.
What is a covariance matrix?

        R      G      B
  R   σR²    σRG    σRB
  G   σGR    σG²    σGB
  B   σBR    σBG    σB²

variance(X):  σX² = (1/N) Σi (xi − μ)²
cov(X,Y):     σXY = (1/N) Σi (xi − μX)(yi − μY)
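As a quick illustration, the same (1/N)-normalized quantities can be computed for the R, G, B channels of an image with NumPy; the array name pixels and the synthetic data below are assumptions.

import numpy as np

# Synthetic stand-in for N image pixels, one (R, G, B) row per pixel.
pixels = np.random.default_rng(0).integers(0, 256, size=(1000, 3)).astype(float)

mean = pixels.mean(axis=0)                 # (mu_R, mu_G, mu_B)
centered = pixels - mean
cov = centered.T @ centered / len(pixels)  # 3x3 covariance matrix as in the table above

# cov[0, 0] is sigma_R^2, cov[0, 1] is sigma_RG, and so on;
# np.cov(pixels.T, bias=True) gives the same matrix.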
16
1. Suppose we have a set of clusters: C1, C2, ... , CK
over a set of data points X = {x1, x2, ... , xN}.
P(Cj) is the probability or weight of cluster Cj.
P(Cj | xi) is the probability of cluster Cj given point xi.
P(xi | Cj) is the probability of point xi given that it
belongs to cluster Cj.
2. Suppose that a cluster Cj is represented by a
Gaussian distribution N(μj, Σj). Then for any point xi:

   P(xi | Cj) = (2π)^(−d/2) |Σj|^(−1/2) exp(−½ (xi − μj)ᵀ Σj⁻¹ (xi − μj))
17
EM: Expectation-Maximization
• Boot Step:
  – Initialize K clusters: C1, …, CK
    (μj, Σj) and P(Cj) for each cluster j.
• Iteration Step:
  – Estimate the cluster of each data point
    p(Cj | xi)                         Expectation
  – Re-estimate the cluster parameters
    (μj, Σj), p(Cj)                    Maximization
    for each cluster j
18
1-D EM with Gaussian Distributions
• Each cluster Cj is represented by a
  Gaussian distribution N(μj, σj).
• Initialization: For each cluster Cj initialize
  its mean μj, variance σj², and weight πj.

  N(μ1, σ1)      N(μ2, σ2)      N(μ3, σ3)
  π1 = P(C1)     π2 = P(C2)     π3 = P(C3)
19
Expectation
• For each point xi and each cluster Cj
compute P(Cj | xi).
• P(Cj | xi) = P(xi | Cj) P(Cj) / P(xi)
• P(xi) = Σj P(xi | Cj) P(Cj)
• Where do we get P(xi | Cj) and P(Cj)?
20
1. Use the pdf for a normal distribution:

   P(xi | Cj) = 1/(√(2π) σj) exp(−(xi − μj)² / (2σj²))

2. Use πj = P(Cj) from the current
   parameters of cluster Cj.
21
Maximization
• Having computed P(Cj | xi) for each point xi and each
  cluster Cj, use them to compute the new mean, variance,
  and weight for each cluster:

  μj  = Σi P(Cj | xi) xi / Σi P(Cj | xi)

  σj² = Σi P(Cj | xi) (xi − μj)² / Σi P(Cj | xi)

  P(Cj) = Σi P(Cj | xi) / N
22
Multi-Dimensional Expectation Step
for Color Image Segmentation
Input (Known):       data points xi = {ri, gi, bi}, i = 1, …, N
Input (Estimation):  cluster parameters (μj, Σj) and p(Cj) for each cluster Cj
Output (Classification Results):  p(Cj | xi) for each point xi and each cluster Cj

  p(Cj | xi) = p(xi | Cj) p(Cj) / p(xi) = p(xi | Cj) p(Cj) / Σj p(xi | Cj) p(Cj)
23
Multi-dimensional Maximization Step
for Color Image Segmentation
Input (Known):       data points xi = {ri, gi, bi}
Input (Estimation):  classification results p(Cj | xi)
Output:              cluster parameters (μj, Σj) and p(Cj) for each cluster Cj

  μj = Σi p(Cj | xi) xi / Σi p(Cj | xi)

  Σj = Σi p(Cj | xi) (xi − μj)(xi − μj)ᵀ / Σi p(Cj | xi)

  p(Cj) = Σi p(Cj | xi) / N
24
Full EM Algorithm
Multi-Dimensional
• Boot Step:
  – Initialize K clusters: C1, …, CK
    (μj, Σj) and P(Cj) for each cluster j.
• Iteration Step:
  – Expectation Step
    p(Cj | xi) = p(xi | Cj) p(Cj) / p(xi) = p(xi | Cj) p(Cj) / Σj p(xi | Cj) p(Cj)
  – Maximization Step
    μj = Σi p(Cj | xi) xi / Σi p(Cj | xi)
    Σj = Σi p(Cj | xi) (xi − μj)(xi − μj)ᵀ / Σi p(Cj | xi)
    p(Cj) = Σi p(Cj | xi) / N
25
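A compact sketch of the full loop in Python (NumPy/SciPy); the function name em_gmm, the initialization from randomly chosen data points, and the small regularization added to each covariance are assumptions made here, not part of the slides.

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iter=50, seed=0):
    """EM for a mixture of K Gaussians; X is an (N, d) array, e.g. RGB pixels."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    N, d = X.shape
    # Boot step: means from random data points, shared covariance, equal weights.
    mu = X[rng.choice(N, size=k, replace=False)]
    sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(k)])
    p_c = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # Expectation step: p(Cj | xi) for every point and cluster (soft assignment).
        resp = np.empty((N, k))
        for j in range(k):
            resp[:, j] = p_c[j] * multivariate_normal.pdf(X, mean=mu[j], cov=sigma[j])
        resp /= resp.sum(axis=1, keepdims=True)
        # Maximization step: re-estimate mean, covariance, and weight of each cluster.
        for j in range(k):
            r = resp[:, j]
            mu[j] = (r[:, None] * X).sum(axis=0) / r.sum()
            diff = X - mu[j]
            sigma[j] = (r[:, None, None] * diff[:, :, None] * diff[:, None, :]).sum(axis=0) / r.sum()
            sigma[j] += 1e-6 * np.eye(d)   # keep the covariance well conditioned
            p_c[j] = r.sum() / N
    return mu, sigma, p_c, resp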
Visualizing EM Clusters
[figure: ellipses around each cluster mean showing one, two, and three
standard deviations]
26
EM Demo
• Demo
http://www.neurosci.aist.go.jp/~akaho/MixtureEM.html
• Example
http://www-2.cs.cmu.edu/~awm/tutorials/gmm13.pdf
27
EM Applications
• Blobworld: Image segmentation using
  Expectation-Maximization and its application
  to image querying
• Yi’s Generative/Discriminative Learning of
  object classes in color images
28
Blobworld: Sample Results
29
Jianbo Shi’s Graph-Partitioning
• An image is represented by a graph whose nodes
are pixels or small groups of pixels.
• The goal is to partition the vertices into disjoint sets so
that the similarity within each set is high and
across different sets is low.
30
Minimal Cuts
• Let G = (V,E) be a graph. Each edge (u,v) has a weight w(u,v)
that represents the similarity between u and v.
• Graph G can be broken into 2 disjoint graphs with node sets
A and B by removing edges that connect these sets.
• Let cut(A,B) = Σu∈A, v∈B w(u,v).
• One way to segment G is to find the minimal cut.
31
Cut(A,B)
cut(A,B) = Σu∈A, v∈B w(u,v)

[figure: graph split into node sets A and B; the cut edges w1 and w2
cross the partition]
32
Normalized Cut
Minimal cut favors cutting off small node groups,
so Shi proposed the normalized cut.
              cut(A,B)     cut(A,B)
Ncut(A,B) = ----------- + -----------        (the normalized cut)
             asso(A,V)     asso(B,V)

asso(A,V) = Σu∈A, t∈V w(u,t)     how much A is connected
                                 to the graph as a whole
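A few lines of Python/NumPy illustrating these definitions (the function name ncut and the boolean-mask encoding of the set A are assumptions):

import numpy as np

def ncut(W, A):
    """Normalized cut of the partition (A, B), where W is the symmetric
    affinity matrix and A is a boolean mask over the nodes (B = not A)."""
    B = ~A
    cut = W[np.ix_(A, B)].sum()   # cut(A,B): total weight crossing the partition
    asso_A = W[A, :].sum()        # asso(A,V): how much A connects to the whole graph
    asso_B = W[B, :].sum()        # asso(B,V)
    return cut / asso_A + cut / asso_B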
33
Example Normalized Cut
[figure: weighted graph partitioned into node sets A and B, edge weights 1 to 4;
asso(A,V) = 21 and asso(B,V) = 16]

            cut(A,B)   cut(A,B)
Ncut(A,B) = -------- + --------
               21         16
34
Shi turned graph cuts into an
eigenvector/eigenvalue problem.
• Set up a weighted graph G=(V,E)
– V is the set of (N) pixels
– E is a set of weighted edges (weight wij gives the
similarity between nodes i and j)
– Length N vector d: di is the sum of the weights from
node i to all other nodes
– N x N matrix D: D is a diagonal matrix with d on its
diagonal
– N x N symmetric matrix W: Wij = wij
35
• Let x be a characteristic vector of a set A of nodes
– xi = 1 if node i is in a set A
– xi = -1 otherwise
• Let y be a continuous approximation to x
• Solve the system of equations
(D – W) y = λ D y
for the eigenvectors y and eigenvalues λ
• Use the eigenvector y with second smallest
eigenvalue to bipartition the graph (y => x => A)
• If further subdivision is merited, repeat recursively
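A sketch of that computation with NumPy/SciPy; thresholding the second eigenvector at zero (rather than searching over splitting points) is a simplification assumed here.

import numpy as np
from scipy.linalg import eigh

def ncut_bipartition(W):
    """Approximate normalized-cut bipartition from an N x N affinity matrix W."""
    d = W.sum(axis=1)                # d_i: sum of weights from node i to all others
    D = np.diag(d)
    # Solve the generalized eigensystem (D - W) y = lambda * D y.
    eigvals, eigvecs = eigh(D - W, D)
    y = eigvecs[:, 1]                # eigenvector with the second smallest eigenvalue
    return y > 0                     # threshold the continuous y to recover the set A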
36
How Shi used the procedure
Shi defined the edge weights w(i,j) by

  w(i,j) = exp(−||F(i)−F(j)||² / σI) × exp(−||X(i)−X(j)||² / σX)   if ||X(i)−X(j)|| < r
  w(i,j) = 0                                                        otherwise

where X(i) is the spatial location of node i and
F(i) is the feature vector for node i,
which can be intensity, color, texture, motion…
The formula is set up so that w(i,j) is 0 for nodes that
are too far apart.
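The weight matrix can be built directly from this formula; the sketch below takes one feature vector F[i] and one position X[i] per node (the function name and the dense N x N computation are illustrative, and a real implementation would use a sparse matrix).

import numpy as np

def affinity(F, X, sigma_I, sigma_X, r):
    """w(i,j) as defined above, for (N, d_F) features F and (N, 2) positions X."""
    feat2 = np.sum((F[:, None, :] - F[None, :, :]) ** 2, axis=2)  # ||F(i)-F(j)||^2
    dist2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)  # ||X(i)-X(j)||^2
    W = np.exp(-feat2 / sigma_I) * np.exp(-dist2 / sigma_X)
    W[dist2 >= r ** 2] = 0.0   # weight is 0 for nodes that are too far apart
    return W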
37
Examples of Shi Clustering
See Shi’s Web Page: http://www.cis.upenn.edu/~jshi/
38
Problems with Graph Cuts
• Need to know when to stop
• Very Slooooow
Problems with EM
• Local minima
• Need to know number of segments
• Need to choose generative model
39
Mean-Shift Clustering
• Simple, like K-means
• But you don’t have to select K
• Statistical method
• Guaranteed to converge to a fixed number
of clusters.
40
Finding Modes in a Histogram
• How Many Modes Are There?
– Easy to see, hard to compute
41
Mean Shift [Comaniciu & Meer]
• Iterative Mode Search
1. Initialize random seed, and window W
2. Calculate the center of gravity (the “mean”) of W:
   mean = Σx∈W x N(x), where N is the normalized histogram (see the numeric example below)
3. Translate the search window to the mean
4. Repeat steps 2 and 3 until convergence
42
Numeric Example
Must Use Normalized Histogram!
window W centered at 12

  x      10    11    12    13    14
  H(x)    5     4     3     2     1
  N(x)  5/15  4/15  3/15  2/15  1/15

mean shift:  Σx x N(x) = 10(5/15) + 11(4/15) + 12(3/15) + 13(2/15) + 14(1/15)
                       = 11.33
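The same computation in a few lines of Python/NumPy (array names are illustrative):

import numpy as np

x = np.array([10, 11, 12, 13, 14], dtype=float)  # positions covered by window W
H = np.array([5, 4, 3, 2, 1], dtype=float)       # histogram counts inside W
N = H / H.sum()                                  # normalized histogram

mean = np.sum(x * N)    # center of gravity of the window
print(mean)             # 11.33..., so the window shifts toward the mode on the left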
43
Mean Shift Approach
– Initialize a window around each point
– See where it shifts—this determines which segment
it’s in
– Multiple points will shift to the same segment
44
Segmentation Algorithm
• First run the mean shift procedure for each
data point x and store its convergence point z.
• Link together all the z’s that are closer than .5
to each other to form clusters
• Assign each point to its cluster
• Eliminate small regions
45
Mean-shift for image segmentation
• Useful to take into account spatial information
– instead of (R, G, B), run in (R, G, B, x, y) space
46
References
– Shi and Malik, “Normalized Cuts and Image
Segmentation,” Proc. CVPR 1997.
– Carson, Belongie, Greenspan and Malik, “Blobworld:
Image Segmentation Using Expectation-Maximization
and its Application to Image Querying,” IEEE PAMI,
Vol 24, No. 8, Aug. 2002.
– Comaniciu and Meer, “Mean shift analysis and
applications,” Proc. ICCV 1999.
47