Deep Segmentation Networks
1. DeepLab v1, v2, v3
2. U-Nets
Introduction to DeepLab
What is semantic image segmentation?
Partitioning an image into regions of meaningful objects.
Assigning each region an object category label.
Introduction
DCNN and image segmentation
[Figure: a DCNN produces class prediction scores for each pixel; the class with the maximal score is selected per pixel.]
What happens in each standard DCNN layer?
Striding
Pooling
Introduction
DCNN and image segmentation
Pooling advantages:
Invariance to small translations of the input.
Helps avoid overfitting.
Computational efficiency.
Striding advantages:
Fewer applications of the filter.
Smaller output size.
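To make the resolution cost of pooling concrete, here is a minimal NumPy sketch (illustrative, not from the slides) showing how a 2×2 max pool with stride 2 halves each spatial dimension:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2: keep the max of each 2x2 block."""
    h, w = x.shape
    # Crop to even dimensions, group into 2x2 blocks, take the block maxima.
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
y = max_pool_2x2(x)
print(x.shape, '->', y.shape)  # (4, 4) -> (2, 2): three of every four values are discarded
```

Each pooling step keeps only one value per block, which is exactly the information loss the next slide discusses.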
Introduction
DCNN and image segmentation
What are the disadvantages for semantic
segmentation?
x Down-sampling causes loss of information.
x Invariance to input translations harms pixel-perfect accuracy.
DeepLab addresses these issues with:
Atrous convolution (the ‘holes’ algorithm).
CRFs (Conditional Random Fields).
Up-Sampling
Addressing the reduced-resolution problem
Possible solution:
‘Deconvolutional’ layers (backwards convolution).
x Additional memory and computation time.
x Additional parameters to learn.
Suggested solution:
Atrous (‘holes’) convolution.
DeepLab v2
Atrous (‘Holes’) Algorithm
Remove the down-sampling from the last pooling layers.
Up-sample the original filter by a factor of the stride: introduce zeros between filter values.
Atrous convolution for a 1-D signal with rate r:
y[i] = \sum_{k=1}^{K} x[i + r \cdot k] \, w[k]
Note: standard convolution is the special case r = 1.
Chen, Liang-Chieh, et al. "DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs." arXiv preprint arXiv:1606.00915 (2016).
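A minimal NumPy sketch of the 1-D definition above (function names are my own). It also checks the equivalence the slide describes: atrous convolution with rate r equals standard convolution with a filter that has r−1 zeros inserted between its values.

```python
import numpy as np

def atrous_conv1d(x, w, r):
    """Atrous 1-D convolution: y[i] = sum_k x[i + r*k] * w[k], valid positions only."""
    K = len(w)
    n_out = len(x) - r * (K - 1)
    return np.array([sum(x[i + r * k] * w[k] for k in range(K)) for i in range(n_out)])

def upsample_filter(w, r):
    """'Holes': insert r-1 zeros between consecutive filter values."""
    wu = np.zeros(r * (len(w) - 1) + 1)
    wu[::r] = w
    return wu

x = np.arange(10, dtype=float)
w = np.array([1.0, 2.0, 3.0])
# Rate-2 atrous convolution == standard (rate-1) convolution with the zero-upsampled filter.
assert np.allclose(atrous_conv1d(x, w, 2), atrous_conv1d(x, upsample_filter(w, 2), 1))
print(atrous_conv1d(x, w, 1))  # rate 1 recovers standard convolution
```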
Atrous (‘Holes’) Algorithm
[Figure: standard convolution vs. atrous convolution applied to a feature map.]
Atrous (‘Holes’) Algorithm
Filter field-of-view:
Small field-of-view → accurate localization.
Large field-of-view → context assimilation.
‘Holes’: introduce zeros between filter values.
The effective filter size increases (enlarging the filter's field-of-view):
k_{eff} = k + (k - 1)(r - 1)
However, only the non-zero filter values are taken into account:
The number of filter parameters is the same.
The number of operations per position is the same.
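The field-of-view growth is plain arithmetic; a tiny sketch (helper name is my own) makes the trade-off explicit:

```python
def effective_kernel_size(k, r):
    """Effective size of a k-tap filter dilated by rate r: k + (k - 1) * (r - 1)."""
    return k + (k - 1) * (r - 1)

# A 3x3 filter with rate 2 covers a 5x5 field of view, with rate 4 a 9x9 one,
# while still using only 9 parameters and 9 multiply-adds per position.
print(effective_kernel_size(3, 1), effective_kernel_size(3, 2), effective_kernel_size(3, 4))  # 3 5 9
```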
Atrous (‘Holes’) Algorithm
[Figure: original filter (standard convolution) vs. zero-padded filter (atrous convolution).]
Boundary recovery
DCNN trade-off:
Classification accuracy ↔ localization accuracy.
DCNN score maps successfully predict classification and rough position.
x Less effective for exact outlines.
Boundary recovery
Possible solution: super-pixel representation.
Suggested Solution: fully connected CRFs.
L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Semantic image segmentation with deep convolutional nets and fully connected CRFs,” in ICLR, 2015.
https://www.researchgate.net/figure/225069465_fig1_Fig-1-Images-segmented-using-SLIC-into-superpixels-of-size-64-256-and-1024-pixels
Conditional Random Fields
Problem statement
X – random field of input observations (images) of size N.
L = {l_1, ..., l_M} – set of labels.
Y – random field of pixel labels.
X_j – color vector of pixel j.
Y_j – label assigned to pixel j.
CRFs are usually used to model connections between different images.
Here we use them to model connections between image pixels!
P. Krähenbühl and V. Koltun, “Efficient inference in fully connected CRFs with Gaussian edge potentials,” in NIPS, 2011.
Probabilistic Graphical Models
Factorization – a distribution over many variables represented as a product of local functions, each depending on a smaller subset of variables:
p(x, y) = \frac{1}{Z} \prod_{a \in F} \Psi_a(x_a, y_a)
C. Sutton and A. McCallum, “An introduction to Conditional Random Fields”, Foundations and Trends in Machine Learning, vol. 4, No. 4 (2011) 267–373
Probabilistic Graphical Models
Undirected vs. Directed
G(V, F, E)
Undirected: p(y_1, y_2, y_3) \propto \Psi(y_1, y_2) \, \Psi(y_2, y_3) \, \Psi(y_1, y_3)
Directed (naive Bayes): p(y, x) = p(y) \prod_{k=1}^{K} p(x_k \mid y)
Conditional Random Fields
Fully connected CRFs
Definition:
P(Y \mid X) = \frac{1}{Z(X)} \prod_{a=1}^{A} \Psi_a(Y_a \mid X)
Z(X) is an input-dependent normalization factor.
Factorization (energy function):
E(y \mid X) = \sum_{i=1}^{N} \theta_i(y_i \mid X) + \sum_{i < j} \theta_{ij}(y_i, y_j \mid X)
y is the label assignment for the pixels.
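The energy function can be sketched directly in NumPy. This is a toy illustration, not the paper's inference code: the pixel count, label count, and the Potts-style pairwise term below are my own placeholder choices.

```python
import numpy as np

def crf_energy(y, unary, pairwise):
    """E(y) = sum_i theta_i(y_i) + sum_{i<j} theta_ij(y_i, y_j).

    unary:    (N, M) array, unary[i, l] = theta_i(l), e.g. -log p(l | X) from the DCNN.
    pairwise: function (i, j, y_i, y_j) -> float, over all pixel pairs (fully connected).
    """
    N = len(y)
    e = sum(unary[i, y[i]] for i in range(N))
    e += sum(pairwise(i, j, y[i], y[j]) for i in range(N) for j in range(i + 1, N))
    return e

# Toy example: 3 pixels, 2 labels, uniform Potts penalty of 0.5 per disagreeing pair.
unary = np.array([[0.1, 2.0], [0.2, 1.5], [1.8, 0.3]])
potts = lambda i, j, yi, yj: 0.5 if yi != yj else 0.0
print(crf_energy([0, 0, 1], unary, potts))  # 0.1 + 0.2 + 0.3 + 0.5 + 0.5 ≈ 1.6
```

Inference seeks the labeling y that minimizes this energy; here the unaries already favor [0, 0, 1], and the pairwise term only adds pressure toward agreement.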
Conditional Random Fields
Potential functions in our case
Unary potential:
\theta_i(y_i \mid X) = -\log p(y_i \mid X)
p(y_i \mid X) is the label assignment probability for pixel i computed by the DCNN.
Pairwise potential:
\theta_{ij}(y_i, y_j \mid X) = \mu(y_i, y_j) \left[ w_1 \exp\!\left( -\frac{\|s_i - s_j\|^2}{2\sigma_\alpha^2} - \frac{\|x_i - x_j\|^2}{2\sigma_\beta^2} \right) + w_2 \exp\!\left( -\frac{\|s_i - s_j\|^2}{2\sigma_\gamma^2} \right) \right]
The first term is the ‘bilateral’ kernel; the second is the smoothness kernel.
s_i – position of pixel i.
x_i – intensity (color) vector of pixel i.
w_1, w_2 – learned parameters (weights).
\sigma_\alpha, \sigma_\beta, \sigma_\gamma – hyperparameters (what is considered “near” / “similar”).
Chen, Liang-Chieh, et al. "DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs." arXiv preprint arXiv:1606.00915 (2016).
Conditional Random Fields
Potential functions in our case
The ‘bilateral’ kernel, w_1 \exp\!\left( -\frac{\|s_i - s_j\|^2}{2\sigma_\alpha^2} - \frac{\|x_i - x_j\|^2}{2\sigma_\beta^2} \right), combines pixel “nearness” with pixel color similarity:
nearby pixels with similar color are likely to be in the same class.
\sigma_\alpha, \sigma_\beta – what is considered “near” / “similar”.
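A small NumPy sketch of this pairwise potential; the weights and sigmas below are illustrative placeholders, not the learned/tuned values from the paper.

```python
import numpy as np

def pairwise_potential(s_i, s_j, x_i, x_j, y_i, y_j,
                       w1=1.0, w2=1.0, sa=10.0, sb=20.0, sg=3.0):
    """mu(y_i, y_j) * [w1 * bilateral kernel + w2 * smoothness kernel].
    Weights w1, w2 and sigmas sa, sb, sg are illustrative, not learned values."""
    if y_i == y_j:                    # Potts compatibility: penalize only disagreements
        return 0.0
    ds2 = np.sum((s_i - s_j) ** 2)    # squared positional distance
    dx2 = np.sum((x_i - x_j) ** 2)    # squared color distance
    bilateral = np.exp(-ds2 / (2 * sa**2) - dx2 / (2 * sb**2))
    smoothness = np.exp(-ds2 / (2 * sg**2))
    return w1 * bilateral + w2 * smoothness

# Nearby, similarly colored pixels with different labels receive the largest penalty.
near_similar = pairwise_potential(np.array([0., 0.]), np.array([1., 0.]),
                                  np.array([10., 10., 10.]), np.array([12., 10., 10.]), 0, 1)
far_different = pairwise_potential(np.array([0., 0.]), np.array([50., 0.]),
                                   np.array([10., 10., 10.]), np.array([200., 10., 10.]), 0, 1)
print(near_similar > far_different)  # True
```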
Conditional Random Fields
Potential functions in our case
The label compatibility term \mu(y_i, y_j) = 1 \text{ if } y_i \neq y_j (else 0) applies a uniform penalty to nearby pixels with different labels.
x Insensitive to the compatibility between particular label pairs!
Boundary recovery
[Figure: DCNN score map vs. belief map after fully connected CRF inference.]
DeepLab
Group:
CCVL (Center for Cognition, Vision, and Learning).
Base networks (pre-trained on ImageNet):
VGG-16 (Oxford Visual Geometry Group, ILSVRC 2014 localization winner).
ResNet-101 (Microsoft Research Asia, ILSVRC 2015 1st place).
Code: https://bitbucket.org/deeplab/deeplab-public/
U-Net
What does a U-Net do?
Learns segmentation: input image → output segmentation map.
U-Net Architecture
“Contraction” phase:
- Increases field of view.
- Loses spatial information.
Ronneberger et al. (2015), U-Net architecture.
U-Net Architecture
“Expansion” phase:
- Creates a high-resolution mapping.
U-Net Architecture
Concatenate with high-resolution feature maps from the Contraction phase.
U-Net Summary
• Contraction phase
– Reduces the spatial dimensions, but builds up the “what.”
• Expansion phase
– Recovers object details and the spatial dimensions: the “where.”
• Concatenating feature maps from the contraction phase helps the expansion phase recover the “where” information.
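The shape bookkeeping behind this summary can be sketched with plain NumPy (shapes only, no learned layers; the channel counts follow the common 64→128→256 convention but are my own choice here):

```python
import numpy as np

# Tensors are (channels, height, width); values are placeholders, shapes are the point.
def down(x):
    """Contraction step: halve spatial size, double channels."""
    c, h, w = x.shape
    return np.zeros((c * 2, h // 2, w // 2))

def up_and_concat(x, skip):
    """Expansion step: double spatial size, halve channels, then concatenate
    the matching high-resolution feature map from the contraction phase."""
    c, h, w = x.shape
    upsampled = np.zeros((c // 2, h * 2, w * 2))
    return np.concatenate([skip, upsampled], axis=0)

x0 = np.zeros((64, 128, 128))   # after the initial convolutions
x1 = down(x0)                   # (128, 64, 64)
x2 = down(x1)                   # (256, 32, 32) -- bottleneck: most "what", least "where"
u1 = up_and_concat(x2, x1)      # skip connection restores high-resolution "where"
print(u1.shape)                 # (256, 64, 64)
```

The concatenation is why channel counts double on the way back up: half the channels carry upsampled context, half carry the preserved high-resolution detail.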
Author Results
Ronneberger et al. (2015), ISBI cell tracking challenge.