
Computational Visual Media
Vol. 4, No. 4, December 2018, 385–397
https://doi.org/10.1007/s41095-018-0128-6

Research Article

DeepPrimitive: Image decomposition by layered primitive detection

Jiahui Huang1 (✉), Jun Gao2, Vignesh Ganapathi-Subramanian3, Hao Su4, Yin Liu5, Chengcheng Tang3, and Leonidas J. Guibas3

© The Author(s) 2018. This article is published with open access at Springerlink.com

Abstract  The perception of the visual world through basic building blocks, such as cubes, spheres, and cones, gives human beings a parsimonious understanding of the visual world. Thus, efforts to find primitive-based geometric interpretations of visual data date back to 1970s studies of visual media. However, due to the difficulty of primitive fitting in the pre-deep learning age, this research approach faded from the main stage, and the vision community turned primarily to semantic image understanding. In this paper, we revisit the classical problem of building geometric interpretations of images, using supervised deep learning tools. We build a framework to detect primitives from images in a layered manner by modifying the YOLO network; an RNN with a novel loss function is then used to equip this network with the capability to predict primitives with a variable number of parameters. We compare our pipeline to traditional and other baseline learning methods, demonstrating that our layered detection model has higher accuracy and performs better reconstruction.

Keywords  layered image decomposition; primitive detection; biologically inspired vision; deep learning

1 Tsinghua University, Beijing, 100084, China. E-mail: [email protected] (✉).
2 Computer Science Department, University of Toronto, Toronto, M5S2E4, Canada. E-mail: [email protected].
3 Stanford University, Stanford, 94305, United States. E-mail: V. Ganapathi-Subramanian, [email protected]; C. Tang, [email protected]; L. J. Guibas, [email protected].
4 University of California San Diego, La Jolla, 92093, United States. E-mail: [email protected].
5 University of Wisconsin-Madison, Madison, 53715, United States. E-mail: [email protected].
Manuscript received: 2018-11-30; accepted: 2018-12-03

1 Introduction

The computer vision community has been interested in performing detection tasks on images for a long time. The success of object detection techniques has been a shot in the arm for better image understanding. The potent combination of deep learning techniques with traditional techniques [1, 2] has yielded state-of-the-art techniques which focus on detecting objects in an image through bounding box proposals. While this works well for tasks that require strong object localization, other applications in robotics and autonomous systems require a more detailed understanding of the objects in the image. Thus, another well-studied task in visual media processing is that of instance segmentation, where a per-pixel class label is assigned to an input image. Such dense labeling schemes are too redundant, and an intermediate representation needs to be developed.

Understanding images or shapes in terms of basic primitives is a very natural human abstraction. The parsimonious nature of primitive-based descriptions, especially when the task at hand does not require fine-grained knowledge of the image, makes them easy to use and a good choice. This has been explored extensively in the realms of both computer vision and graphics. Various traditional approaches exist for modeling images and objects, such as blocks world [3], generalized cylinders [4], and geons [5]. While primitive-based modeling generally uses classical techniques, using machine learning techniques to extract these primitives can help us to attack more complex images, with multiple layers of information in them. Basic primitive elements such as rectangles, circles, triangles, and spline curves are usually the building blocks of objects in images, and in combination, provide simple, yet extremely informative
representations of complex images. Labeling image pixels with high-level primitive information also aids in vectorizing rasterized images.

Complex images have multiple layers of information embedded in them. It is shown in Ref. [6] that human analysis of an image is always performed in a top–down manner. For example, when given an image of a room, the biggest objects such as desks, beds, chairs, etc., are observed. Then the focus shifts to specific objects, e.g., objects on the desk such as books and monitor; this analysis is performed recursively. When analyzing an image of a window, humans tend to focus on the border of the window first; the inner structure within the window and decorations are considered later. However, original object detection networks neglect this layered search and treat objects from different information layers the same. Layered detection has added value when there are internal occlusions in the image, which make traditional object detection more difficult to perform. In this work, we attempt to generate a deep network that separates multiple information layers as in Fig. 1, and is able to detect the positions of the primitives in each layer as well as estimating their parameters (e.g., the width, height, and orientation of a rectangle or the number and positions of control points of a spline). The proposed method is shown to be more accurate than traditional methods and other learning-based approaches.

This paper is organized as follows. We consider related work in Section 2, and provide an analysis of the novelty of our work. Then, in Section 3, we propose a framework based on the traditional YOLOv2 network [2], to provide parameters that are fully interpretable and high-level. We also tackle the problem of regressing parameters for primitives with a variable number of unknowns. Then, we propose a layered architecture in Section 4, which can learn to separate different information layers of the image and regress parameters in each layer separately. In Section 6, we give experiments used to evaluate the performance of our network against existing traditional state-of-the-art techniques, and in Section 7, we show how this framework could be applied to image editing and recognition by components. We also discuss the limitations of our framework. Finally, in Section 9, we attempt to envisage how the framework provided in this work would help to solve the important problem of primitive-based representations, which has applications that lie at the intersection of vision, AI, and robotics.

To sum up, our contributions in this paper include:
• A framework based on the YOLOv2 network that enables class-wise parameter regression for different primitives.
• An RNN model to estimate a sequence of a variable number of control points representing a closed spline curve in a single 2D image.
• A layered primitive detection model to extract relationship information from an image.

Fig. 1 Motivation: given an image composed of abstract shapes, our framework can decompose overlapping primitives into multiple layers and estimate their parameters.

2 Related work

Our task of decomposing an input image into layers of correlated and possibly overlapping geometric primitives is inherently linked to three categories of problems, which have been treated and studied independently in the traditional setting. Object detection and high-level vision, regression and reconstruction of geometric components such as splines and primitives, and finally, understanding relationships and layout of objects and entities are problems that provide information at different scales, all of great importance to the computer vision and graphics communities. After considering these three categories of applications, we conclude the discussion of related work with relevant machine learning methodologies, with a focus on recurrent neural networks.

2.1 Object detection and high-level vision

Among the traditional model-driven approaches to object detection, the generalized Hough transform [7] is a classical technique applicable to detecting particular classes of shapes up to rigid
transformations. Variability of shapes as well as input nuances are tackled by deep-learning based techniques; faster-RCNN [8] utilizes region proposal networks (RPN) to locate objects and fast-RCNN to determine the semantic class of each object. Recent works like YOLO [1, 2] and SSD [9] formulate the task of detection as a regression problem and propose end-to-end trainable solutions. We use the detection framework of the efficient YOLOv2 [2] as the backbone of our framework. However, unlike YOLO or YOLOv2, as well as providing bounding boxes and class labels, our framework also regresses geometric parameters and handles the problem of occlusion, in layered fashion.

To construct high-level objects using simple primitives, Biederman [5] introduced the idea of visual composition. Recently, SCAN [10] tries to compose visual primitives in a hierarchical way and learn an implicit hierarchy of concepts as well as their logical relations using a β-VAE network. While they build their hierarchy over concepts, our work is based on visual containment relationships for different shapes. Lake et al. [11] proposed a probabilistic program induction scheme to parse hand-writing images into several strokes and sub-strokes using a few images as training data, but their method is limited to the specific domain of hand-written characters.

2.2 Spline fitting and vectorization

Primitives and splines are widely used for representing geometry or images due to their succinctness and precision. Thus, recovering them by fitting input data is a long-standing problem in graphics. The idea of iteratively minimizing a distance metric [12–14], serving as a foundation of many studies, has been improved by either more effective distance metrics [15] or more efficient optimization techniques [16]. However, most previous works fail due to lack of decent initialization, which is overcome by a learning-based algorithm in our case. It is worth noting that vectorizing rasterized images [17, 18] also aims to solve a related problem. However, since previous works do not decompose an image into assemblies of clean primitives, there is a loss of high-level information about shape and layering.

2.3 Layered object detection

Multiple works have of late attempted to introduce composable layers into the process of object detection. Liu et al. [9] attempt to use feature hierarchies and detect objects based on different feature maps. Lin et al. [19] further improve this elegant idea by adding top–down convolutional layers and skip connections. However, these works only focus on how to combine features at different scales regardless of the relationships between objects and the associated layers composing the original image. The work by Bellver et al. [6] formulates detection as a reinforcement learning problem and represents an image as a predefined hierarchical tree, leaving the agent to iteratively select subsequent parts to look at. The work most relevant to ours is CSGNet [20], a recursive neural network model which generates a structured program defining the relationships between a sparse set of primitives. However, the possible positions and sizes of the primitives are limited to the size of a finite action space. In contrast, our work allows more detailed transformations of primitives, and our layered representation is less prone to redundancy.

2.4 Recurrent neural networks

The recurrent neural network (RNN) (and its variants LSTM [21], GRU [22]) is a common model widely used in natural language processing which has recently been applied to computer vision tasks. One key inspiration for our work is polygon-RNN [23], in which a sequence of vertices forming a polygon is predicted in a recurrent manner. One of the key differences in our work is that we aim to abstract the simplest types of representation on different layers, based on general splines instead of polylines, or interpolating cubic Bézier curves as in the polygon-RNN.

The discussion above only samples the studies most relevant to our work. There are many other relevant areas such as image parsing, dense captioning, structure-aware geometry processing, and more. Despite the richness of relevant works across a wide range, which manifests the importance of the topic, we believe that the problem of understanding images as abstract compositions is underexplored.

3 Basic model

In this section, we propose a framework based on a standard modification of the YOLOv2 model [2], inspired by Ref. [24], to perform parameter regression. The parameters regressed by the model, as opposed to those in Ref. [24], are fully interpretable and high-level.
3.1 Adapting YOLO for parameter regression

The primary idea of this model is to extend the architecture of the state-of-the-art object detector YOLOv2 to detect primitives in an image, and in addition, to estimate the parameters of each primitive. The deep neural network architecture is capable of extracting more detailed descriptors of detected objects, as well as the bounding box location. Providing additional structural information about the object to the YOLOv2 architecture aids in augmenting the learned features.

The YOLOv2 network in the original paper consumes an entire image and segments it into a grid of size S × S. Each square in the grid can contain multiple primitives. The network models this multiplicity by containing up to B possible anchors (primitives in this case). Thus, traditional YOLOv2 networks learn S × S × B × (K + 5) different parameters; the K + 5 term arises since, in addition to the class labels for the K different primitive classes, the network also predicts 1 object probability value and 4 bounding-box related values [2]. While regressing parameters for the bounding boxes, the regressor needs to predict M extra variables for each bounding box being predicted. The M variables are the total number of possible parameters from all different primitive categories. This increases the number of parameters predicted by the network to S × S × B × (5 + K + M).
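To make this bookkeeping concrete, the following is a minimal PyTorch sketch of such an enlarged prediction head. It is our illustration, not the authors' released code, and the dimension values are placeholders rather than the paper's actual configuration.

```python
import torch
import torch.nn as nn

# Placeholder dimensions: grid size, anchors per cell, primitive classes,
# and the pooled number of primitive-parameter slots (paper: S, B, K, M).
S, B, K, M = 13, 5, 4, 16

# Each anchor predicts 4 box values + 1 objectness + K class scores + M parameters.
head = nn.Conv2d(in_channels=1024, out_channels=B * (5 + K + M), kernel_size=1)

features = torch.randn(1, 1024, S, S)       # assumed backbone output for one image
pred = head(features)                       # (1, B * (5 + K + M), S, S)
pred = pred.view(1, B, 5 + K + M, S, S)     # split out the per-anchor fields
```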
To achieve this end, a new loss term is added to the loss function previously proposed in Ref. [24]. The new term, Lp, feeds information about the primitive parameters into the network. This term is defined as

Lp = Σ_{i=0}^{S} Σ_{j=0}^{S} Σ_{k=0}^{B} Σ_{l=0}^{K} Σ_{m∈X(l)} 1_{i,j}^{(k)} 1_{(i,j),k}^{(l)} L(t_{(i,j),k}^{(m)}, t̂_{(i,j),k}^{(m)})    (1)

where 1_{i,j}^{(k)} is an indicator function that determines if grid square (i, j) is assigned a positive object label for bounding box k. The indicator 1_{(i,j),k}^{(l)} is a function that determines if bounding box k of grid square (i, j) belongs to the primitive defined by l. The purpose of introducing this term is to include a weighting for a primitive in the loss only when the primitive is plausible for the image. X(l) is the set of parameters for primitive l. The terms t and t̂ denote the target and predicted parameters respectively.
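As an illustration of how the two indicators and X(l) gate the regression targets, here is a hedged PyTorch sketch of Eq. (1). The tensor layout, the helper name, and the use of a squared error for L are our assumptions, not the paper's implementation.

```python
import torch

def primitive_param_loss(t_hat, t, obj_mask, class_mask, param_mask):
    """Sketch of Eq. (1). Shapes (all assumed):
    t_hat, t:    (S, S, B, M) predicted / target primitive parameters
    obj_mask:    (S, S, B)    1 where anchor k of cell (i, j) holds an object
    class_mask:  (S, S, B, K) 1 where that anchor is assigned primitive class l
    param_mask:  (K, M)       1 where parameter slot m belongs to X(l)
    """
    # Fold the class assignment and X(l) membership into one per-slot weight.
    weight = torch.einsum('ijbk,km->ijbm', class_mask, param_mask)
    weight = weight * obj_mask.unsqueeze(-1)
    # A squared error stands in for the per-parameter loss L of Eq. (1).
    return (weight * (t_hat - t) ** 2).sum() / weight.sum().clamp(min=1)
```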
3.2 Definition of primitive parameters

Primitives with fixed number of parameters. Simple primitives like rectangles or circles have fixed numbers of parameters, and so the values of these parameters can be used directly as ground truth for training. For parameters lying within [0, 1], we can further increase the network training stability by applying a sigmoid function to the network output to constrain the estimated parameters. Readers are referred to Section S1 in the Electronic Supplementary Material (ESM) for detailed definitions of primitive parameters.

Primitives with variable number of parameters. Some of the primitives discussed in this paper, including closed B-spline curves, have a variable number of control points. This permits primitives to represent different kinds of shapes, but it is not compatible with the previously defined model. This incompatibility is solved by learning a fixed-length embedding of the control point positions. In addition, a recurrent neural network (RNN) is appended to the model, to serve as a decoder to output the control points in a sequential manner. At time step i, the model predicts the position of the ith control point ci, and a stop probability pi ∈ [0, 1] that indicates the end of the curve. We apply a cross-entropy loss to the stop probability while training the RNN.
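A minimal sketch of such a decoder, assuming a GRU cell; the layer names, sizes, and fixed point budget are our own choices (the paper's exact decoder configuration is given in its ESM).

```python
import torch
import torch.nn as nn

class SplineDecoder(nn.Module):
    """Emits control point c_i and stop probability p_i at each step."""
    def __init__(self, embed_dim=128, hidden_dim=256, max_points=7):
        super().__init__()
        self.init_h = nn.Linear(embed_dim, hidden_dim)  # embedding -> initial state
        self.rnn = nn.GRUCell(2, hidden_dim)            # consumes the previous point
        self.point_head = nn.Linear(hidden_dim, 2)      # (x, y) of the next point
        self.stop_head = nn.Linear(hidden_dim, 1)       # p_i in [0, 1]
        self.max_points = max_points

    def forward(self, embedding):
        h = torch.tanh(self.init_h(embedding))
        prev = torch.zeros(embedding.size(0), 2, device=embedding.device)
        points, stops = [], []
        for _ in range(self.max_points):
            h = self.rnn(prev, h)
            prev = torch.sigmoid(self.point_head(h))    # coordinates in [0, 1]
            points.append(prev)
            stops.append(torch.sigmoid(self.stop_head(h)))
        return torch.stack(points, 1), torch.stack(stops, 1)
```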
The loss functions for the RNN-based model must be designed with care. Naively, one can use a simple mean-squared error (MSE) loss for control point position prediction and a cross-entropy loss for probability prediction. However, this only handles the situation where the sequence of control points is fixed and well-defined. Note that every point in the control point sequence C = (c1, ..., cN) of a closed spline curve can be viewed as the starting point of the sequence. Thus, in order to predict a control point sequence invariant to the position of the starting point, a circular loss similar to that used in Ref. [23] is defined as follows:

Lcirc = min_{k∈[1,N]} min( L(C, Gk), L(C, Ḡk) )    (2)

where L is the MSE loss, Gk is the ground truth control point sequence rotated by k places, i.e., if gi denotes the ith control point in the ground truth, then Gk is the sequence (gk, ..., gN, g1, ..., gk−1), and Ḡk is the reversed (inverse) sequence of Gk. In this way, the ground truth sequence that leads to minimum MSE loss is considered to be the target sequence, making the loss function rotation-invariant. Also note that the introduction of Ḡk guarantees the loss to be invariant to clockwise and anti-clockwise sequencing.
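Since N is small (5–7 control points in our data), Eq. (2) can be evaluated by brute force over every rotation and both orientations of the ground truth; a minimal PyTorch sketch (ours) for a single curve:

```python
import torch

def circular_loss(pred, gt):
    """Sketch of Eq. (2). pred, gt: (N, 2) control point sequences.
    Returns the smallest MSE over all rotations G_k of the ground truth
    and their reversals, making the loss invariant to the starting point
    and to clockwise/anti-clockwise ordering."""
    n = gt.size(0)
    candidates = []
    for k in range(n):
        g_k = torch.roll(gt, shifts=-k, dims=0)   # rotate by k places
        candidates.append(((pred - g_k) ** 2).mean())
        g_k_rev = torch.flip(g_k, dims=[0])       # reversed sequence
        candidates.append(((pred - g_k_rev) ** 2).mean())
    return torch.stack(candidates).min()
```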
4 Layered detection model

4.1 Layered detection

We use a layered model to capture the nested structure of primitives in an image. The idea is inspired by two observations. Our first observation is from how multiple layers in design tools, such as Adobe Photoshop and Illustrator, can help create a vector graphics image. With layers, artists can plan the arrangement of items in the space in a top–down manner. The fact that all vector icon images can be decomposed into multiple layers, as shown in Fig. 1, serves as inspiration to extend the model proposed in Section 3 to include layered detection. Secondly, for the detection of each layer, it allows one to focus on a specific part of the image, instead of working on the entire image. For example, in Fig. 1, the white rectangle in the lower-right of the image is completely inside the black disk: one can focus on the interior of the disk, where the only accessible primitive is the rectangle.

However, training separate networks for different levels of detection is a redundant and time-consuming process, since intuitively, the parameters regressed by these networks are likely to be related. Therefore, we propose a layered detection model to perform this regression task, thereby making the training process both faster and cognizant of previous learning. We perform region of interest (RoI) pooling [25] on the intermediate output of our network. This enables us to extract regions in the image to focus on, to perform detection at the next level.

4.2 Architecture

After an image is forwarded through the backbone network, simple post-processing steps including thresholding and non-maximal suppression are performed to obtain the final prediction results. The backbone network is the previously discussed YOLO network with modified loss; the difference lies in that the backbone network is intended to only predict primitives in the top layer, i.e., the outermost primitives in the image. Following this, the coordinates of the bounding boxes of detected primitives are fed into an RoI pooling layer. The RoI pooling layers consume the intermediate output of the network and pool it into a uniform sized feature map for detection following the layering. Figure 2 illustrates this model.

Fig. 2 The detection process in our layered model. Cuboids denote input images or feature maps. Dark blue arrows, dark green arrows, and dark purple arrows represent conv layers, RoI pooling layers, and detection blocks, respectively; notation is consistent with that in the text. The final output of our network is a layered primitive tree containing both shape information and layer information.

Specifically, the architecture of the backbone network can be treated as multiple consecutive modules, which contain several convolution layers with ReLU activation; each module is combined with pooling layers. We denote the modules by f1, ..., fM (from shallow layers to deep layers). The deepest layer fM has output J1 that is processed by the detection block d1. Subsequent detection blocks di process the output of convolutional layer fM−i+1. We do not use the whole feature map Ji as the input to di, but instead, we crop the feature map using the prediction results from di−1 and resize it to a uniform size.
In this way, the layering is represented explicitly by cropping within the interior of an image. This model can be expressed as

B(1) = d1(J1)    (3)
B(i) = di(R[Ji; B(i−1)]),  i ⩾ 2    (4)

where R[J; B(i)] represents feature map J cropped using bounding box information from B(i), which is fed to an RoI pooling layer to obtain a uniform size output for future processing.
Lower level feature maps are employed for deeper layer detection since deeper layer primitives are usually smaller in size and thus clearer feature maps are required to perform accurate detection. For consistency within different regions of the image, we perform training using local coordinates within the parent bounding box as the ground truth for B(i). For example, consider an image with a rectangle inside a circle. Then, the ground truth coordinates for the rectangle should lie within the local coordinate system with respect to the circle. Therefore, predicted coordinates are transformed before calculating the loss functions. These local coordinates are used for ground truth since RoI pooling is known to capture partial information in the image, as testified by faster-RCNN [8]. Meanwhile, since there are multiple layers of convolutional operations, the feature map can encode some information outside the bounding box, thus providing the model with the capability to correct mistakes made in outer layers, by considering both local and global information while making detections in inner layers.

It is worth noting that the information passed from higher to lower layers is not simply restricted to the explicit bounding box position. The feature map in shallower convolutional layers is used to predict both higher and lower level primitives (e.g., in Fig. 2, J2 affects both B(1) and B(2)). Although we only pass the bounding box information explicitly, knowledge from higher layers can be passed implicitly via these related feature maps.

5 Implementation

In this section, we present our implementation details.

5.1 Primitive and parameter selection

Four types of primitives are used in our experiments: rectangles, triangles, ellipses, and closed spline curves. We observed that the predicted bounding box position is usually more accurate than the regressed parameters. Hence, a local parameter with respect to the bounding box is defined for each primitive so as to be able to perform better reconstruction. Readers are referred to Section S1 in the ESM for detailed descriptions of the parameters used.

5.2 Network architecture

Our code is adapted from an open source PyTorch implementation (https://github.com/longcw/yolo2-pytorch). The backbone network uses the Darknet-19 architecture configured as in Redmon and Farhadi [2]. We set the depth of our layered detection model to 3, using three detection blocks. Detailed configuration of detection block di (i = 1, 2, 3) is provided in Section S2 of the ESM.

5.3 Training

The entire hierarchical model can be trained fully end-to-end. Additionally, we adopt a method similar to scheduled sampling [26] to enhance training stability and testing performance. The predicted information B(i−1) from level i−1, which is fed into level i, is substituted by the ground truth value for level i−1 with probability p. The value of p is set to 0.9 in the first 10 epochs and is subsequently decreased by 0.05 every 2 epochs.
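One reading of this schedule, written out as a sketch (the flooring at zero and the exact step boundaries are our guesses; the function names are ours):

```python
import random

def teacher_forcing_prob(epoch):
    """p = 0.9 for the first 10 epochs, then -0.05 every 2 epochs."""
    if epoch < 10:
        return 0.9
    return max(0.0, 0.9 - 0.05 * ((epoch - 10) // 2 + 1))

def pick_level_input(pred_boxes, gt_boxes, epoch):
    # With probability p, feed ground-truth level-(i-1) boxes into level i.
    return gt_boxes if random.random() < teacher_forcing_prob(epoch) else pred_boxes
```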
An RNN decoder model is pre-trained separately to regress a fixed length embedding for control point positions. While training this RNN model, the grid number S is set to 1 in the YOLOv2 detection framework and the features of closed spline curve images are extracted with our backbone Darknet-19 network. The pre-trained RNN decoder learns to decode the fixed length embedding and output positions of control points sequentially. When the layered model is being trained, the value of the embedding is used as direct supervision. In the first 5 epochs, the embedding is supervised, and in subsequent epochs, the network is trained with the positions of control points instead. Note that the RNNs share the same weights across different levels of the hierarchy.

5.4 Data synthesis

Following previous works [10, 27], we use synthetic datasets due to the lack of annotated datasets. The hierarchical model was trained with 150,000 synthetic pictures of size 416 × 416. When we generated the training data, we kept the containment relationships across layers; there may be multiple primitives in each layer. The number of primitives in a single image is restricted to 8, the maximum number of layers to 3, and the number of control points of closed spline curves varies from 5 to 7. In order to test the robustness of our method, noise was added to the shapes of the primitives, as well as hatching patterns for primitives and some skewing of the image itself. Selected dataset images are shown in Fig. 3.

Fig. 3 Examples drawn from our synthetic training dataset. For the Pure dataset, we synthesized simple binary images for training. The Pure+Noise dataset modified the Pure dataset by adding noise and random affine transformations to each image. The Tex. (short for "Textured") dataset allows testing of the robustness of shape detection methods by adding hatching patterns to the shapes. The Textured+Noise dataset imitates real world hand drawn shape pictures. The Natural dataset imitates colored versions of real world images.
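For illustration, a sketch of a sampler that obeys the ranges above; rasterization, hatching, noise, and the containment bookkeeping between layers are all omitted, and the field names are ours:

```python
import random

def sample_image_spec(max_primitives=8, max_depth=3):
    """Draw a random specification for one synthetic training image."""
    kinds = ['rectangle', 'triangle', 'ellipse', 'spline']
    spec = []
    for _ in range(random.randint(1, max_primitives)):
        kind = random.choice(kinds)
        spec.append({
            'kind': kind,
            'layer': random.randint(1, max_depth),   # containment depth, 1 = outermost
            'control_points': random.randint(5, 7) if kind == 'spline' else None,
        })
    return spec
```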
6 Experiments and results

6.1 Ablation study for circular loss

During the pretraining process for the RNN decoder to predict control point positions, we compare the training and validation losses using two different loss functions, i.e., the previously defined Lcirc and a simple MSE loss. As shown in Table 1, training with circular loss leads to better convergence loss and thus better prediction results. Figure 4 shows two examples comparing the prediction results given the same curve image as input. We found that using circular loss eliminates the ambiguity of starting point and clock direction in the training data, and leads to more accurate fitting results.

Table 1 Error and accuracy measures during training and testing with two different loss functions. Loss denotes the MSE distance between the ground truth and predicted positions of control points (distances are normalized to lie in the unit interval). # Point Acc. denotes the frequency of predicting the number of control points correctly.

  Loss fn | Training: Loss / # Point Acc. | Validation: Loss / # Point Acc.
  L_MSE   | 0.12203 / 74.60               | 0.12210 / 74.93
  L_circ  | 0.04365 / 76.32               | 0.04369 / 75.83

Fig. 4 Two closed spline curve fitting cases using circular loss and MSE loss.

6.2 Comparisons to other methods

Although our model detects primitives in a layered manner, simple object detection measurements including precision and recall rate (or mAP for methods with confidence score output) can be applied to test model accuracy. Meanwhile, we define our reconstruction loss as the pixel-wise RMSE between the input picture and the re-rendered picture using the predicted results from the network. There are multiple approaches to shape detection; we set up 5 independent baselines for comparison. The first two baselines are traditional methods while the last three are learning-based approaches:
• Contour method. In this method, edge detection is first applied to the input image; each independent contour is separated. A post-processing approximation step is then employed to replace almost collinear segments with a single line segment, with a parameter q controlling the strength of approximation. The type of shape is determined by counting the number of line segments (i.e., its number of edges). This method is implemented using the findContours and approxPolyDP functions of OpenCV [28]; a minimal sketch is given after this list.
• Hough transform [29]. This is widely used to find imperfect shape instances in images by a voting procedure in parameter space. For rectangles and triangles, whose edges are straight line segments, we first use the Hough line transform to detect all possible lines and then recover the parameters of the primitives by solving a set of linear equations. For ellipses, we use the method described in Ref. [30].
• CSGNet [20]. In 2D, this takes a single image as input and generates a program defining the shapes presented. This model allows for more complex Boolean operations between shapes, but the sizes and positions of the primitives are highly discretized. We use the post-processed (optimized) top-1 prediction as the output of this algorithm.
• Flat model. This method uses a learning approach trained using the YOLOv2 architecture. The ground truth of the detector is directly set to all primitives in the canvas, regardless of their hierarchical information.
• Recursive model. We train only one detector to detect the primitives in the first hierarchy (i.e., the outermost primitives at the current level). Once the detector successfully detects some primitives in the current level, we crop the detected region, resize the cropped region to the network input size, and feed the image into the same network again.
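As promised above, a minimal sketch of the contour baseline. It assumes OpenCV 4's two-value findContours return, and reads q as a fraction of the contour perimeter, which is one plausible interpretation of the approximation-strength parameter:

```python
import cv2

def classify_shapes(image_path, q=2e-3):
    """Approximate each contour with approxPolyDP and classify it by the
    number of surviving line segments."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(binary, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    results = []
    for c in contours:
        eps = q * cv2.arcLength(c, True)          # tolerance relative to perimeter
        poly = cv2.approxPolyDP(c, eps, True)
        n = len(poly)
        if n == 3:
            kind = 'triangle'
        elif n == 4:
            kind = 'rectangle'
        else:
            kind = 'ellipse or spline'            # many segments: treat as a curve
        results.append((kind, poly))
    return results
```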
Results from these different models are compared in Table 2 (precision–recall–reconstruction comparison) and Table 3 (primitive–reconstruction comparison). Some of the prediction results from different methods are shown in Fig. 5, using the same input in each case.

The contour method with a small q value traces the pixels on the contour precisely but ignores the high-level shape information of the shape boundary, leading to high reconstruction performance but low precision and recall accuracy in shape classification tasks. Using a greater q value simply approximates continuous curves with polygons, leading to poor reconstruction performance. It is also observed that the contour method cannot separate overlapping primitives since it only attempts to detect boundaries in images. The Hough transform-based method for line segment detection and circle detection requires a careful choice of parameters; it generally leads to higher recall values than the contour method. This method partially solves the overlap problem by extending detected line segments and finding intersections, but cannot effectively distinguish extremely short line segments and segments of a circle.

The above problems can be overcome by learning-based models. Learning-based models generally have better performance across all different datasets, and the gap in performance widens as we add more noise to our dataset, which is partially due to the fact that the learned features extracted from the image using our data-driven method are more effective and representative in comparison to the hand-crafted features of traditional methods. Despite the feature improvement, the absence of effective shape and relationship representations can be fatal to the final detection results. Using CSGNet [20], the possible locations and sizes of primitives are restricted due to the size limitation of the action space. In order to compose the target shape, redundant shapes and expressions are generated.

Table 2 Precision, recall, and reconstruction loss measures using various methods on the datasets described in Fig. 3. Prec and Recall denote the precision and recall values as percentages respectively, while Recon measures the RMSE loss between the original picture and the reconstructed picture using the layered prediction results.

  Method                  | Pure: Prec / Recall / Recon | Pure+Noise: Prec / Recall | Textured: Prec / Recall | Textured+Noise: Prec / Recall | Natural: Prec / Recall
  Contour (q = 4 × 10−4)  | 78.8 / 42.9 / 1.44          | 10.1 / 37.7               | 10.8 / 54.6             | 10.0 / 47.5                   | 5.9 / 62.2
  Contour (q = 2 × 10−3)  | 94.0 / 72.8 / 1.70          | 32.5 / 60.1               | 16.8 / 88.0             | 15.6 / 73.2                   | 6.4 / 70.3
  Hough transform         | 32.6 / 78.6 / 1.61          | 5.1 / 73.7                | — / —                   | — / —                         | — / —
  CSGNet (optimized) [20] | 37.1 / 65.4 / 28.7          | — / —                     | — / —                   | — / —                         | — / —
  Flat model              | 99.7 / 91.0 / —             | 99.5 / 90.0               | 99.6 / 91.2             | 99.4 / 91.0                   | 57.9 / 62.2
  Recursive model         | 96.1 / 72.4 / 1.64          | 60.1 / 61.2               | 74.0 / 60.1             | 95.8 / 49.9                   | 98.9 / 84.5
  Our model               | 99.7 / 96.1 / 1.61          | 99.5 / 95.0               | 99.6 / 95.8             | 99.5 / 95.4                   | 97.9 / 87.6
  Our model (optimized*)  | 99.7 / 96.1 / 1.39          | 99.6 / 95.0               | — / —                   | — / —                         | — / —

* It is impossible to measure reconstruction loss for images with texture or noise, making it unclear how to define the optimization target.
Table 3 Average precision (AP) measures of learning-based shape detection methods. Values are presented as percentages.

  Method    | Mean | Parallelogram | Triangle | Oval | Spline
  Flat      | 87.2 | 87.2          | 86.3     | 84.4 | 90.9
  Recursive | 54.3 | 43.8          | 53.8     | 76.0 | 43.6
  Ours      | 90.5 | 88.2          | 90.7     | 90.9 | 92.0

Fig. 5 Detection result examples. Shapes detected at different levels are marked in different colors: level 1, pink; level 2, orange; level 3, blue. For the flat model, there is no predicted layer information, so all shapes are marked in green.

Other learning-based baselines fix this with simple containment representations, but problems still occur due to a lack of layering or incorrect layering. The flat model detects almost all primitives regardless of their layer. However, in cases where two primitives of the same kind (e.g., concentric circles forming an annulus) overlap, the post-processing step (non-maxima suppression) eliminates one of them and predicts the median result, which is undesirable. It is also difficult to reconstruct the original image using the detected primitives due to the loss of layering information. In the recursive model, the layering information is preserved, but if the detection in an outer layer is not accurate enough, the error snowballs and the inner layer primitives cannot be well-reconstructed. Unlike the baselines, our method can extract high-level shape information as well as containment relationships. Our model outperforms the others both quantitatively and qualitatively, except for the reconstruction loss. However, after appending a simple local optimizer to our model, denoted Our model (optimized) in Table 2, the reconstruction loss is further decreased.

The trained model was applied directly to Google Material icons [31] (lines 1–4 of Fig. 6, using the Pure model) and a small real world dataset containing 150 images selected from the PASCAL VOC2012 dataset [32] and the Internet (lines 5–8 of Fig. 6, using the Natural model). To the best of our knowledge, no public dataset exists that provides ground truth annotations at the geometric primitive level, so we manually annotated the 150 images from this small real world dataset. Testing using our trained model reached an mAP (the metric used in all experiments) of 54.5%. Readers are referred to Sections S3 and S4 in the ESM for further results.

Fig. 6 Selected test results for our layered detection model. In each pair of columns, the left picture shows the original input image as well as the detection result, while the right picture reconstructs the input image using the detection result (different instances of primitives within the same hierarchy vary slightly in color for clarity). More test results are available in Sections S3 and S4 in the ESM.

While DeepPrimitive manages to decompose the real world images into relevant primitives, it is to be remembered that this is not the primary focus of our work.
Our current model is trained only on synthetic images, but adapting synthetic images to real images with domain adaptation techniques is one trend in the vision community. A few recent vision papers have been trained and tested on purely synthetic datasets (e.g., Ref. [27]).

7 Applications

Once an image has been decomposed into several layers and high-level parameters defining the primitives in the image acquired, one can utilize this information for a variety of applications. In this paper, we demonstrate the use of these parameters in two example applications.

The first application we present is image editing. It is usually very difficult for an artist to modify the shapes in a rasterized image directly. With a low reconstruction loss, our model can decompose an image into several manipulable components with high fidelity and flexibility. For example, in Fig. 7, it is easy for an icon designer to modify parameters of the shapes, changing the angle between the hands of the clock, or tweaking the shape of the paint brush head. For real world images in Fig. 8, we can directly manage the position of the parts in an image using high-level editing tools (e.g., as in Ref. [33]).

Fig. 7 Image editing on a rasterized image at a primitive level. Primitive detection is performed on the image, followed by editing of the primitives.

Fig. 8 High-level image editing of real world images based on detected primitives. The first two columns of each group show the original image and its layered decomposition, while the last two columns of each group show manipulated results.

Another potential application is recognition-by-components [5]. Usually, state-of-the-art classifiers based on deep networks need a great deal of data for training, and its lack hampers accuracy. Once primitives in an image have been recognized, one can easily define classification rules using the layered information obtained. Additional training data is not needed and only a single shape detection model has to be trained. The idea is illustrated in Fig. 9. Given an image, pre-processing steps such as denoising and thresholding are performed to extract the borders of shapes. The proposed model is then applied to detect the primitives and generate a shape parsing tree (in XML format in the figure, for demonstration purposes), with which a handcrafted classifier could easily predict the class of an object in the image by top–down traversal of the tree.
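To make the idea concrete, a toy sketch of such a parsing tree and a handcrafted rule; the node layout and the rules are entirely our own illustration, not the classifier used in Fig. 9:

```python
class PrimNode:
    """One detected primitive plus the primitives contained inside it."""
    def __init__(self, kind, params=None, children=None):
        self.kind = kind
        self.params = params or {}
        self.children = children or []

def classify(node):
    """Top-down rule-based labeling over the parsing tree (toy rules)."""
    child_kinds = sorted(child.kind for child in node.children)
    if node.kind == 'rectangle' and child_kinds == ['ellipse', 'ellipse']:
        return 'bus'            # a body containing two wheels, say
    if node.kind == 'ellipse' and child_kinds == ['rectangle', 'rectangle']:
        return 'clock'          # a dial containing two hands, say
    return 'unknown'
```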
8 Limitations

As an explorative study aiming to understand and reconstruct images as primitives composed layer-wise, there are several limitations left to be resolved in future work. For images with highly-overlapping primitives within the same layer, our model cannot distinguish between them: the output will either be a single primitive or misclassified primitives. Our model discovers only containment relationships: if one higher-level primitive intersects multiple lower-level primitives, duplicate detections of the higher-level primitive are possible. The last two images of line 4 in Fig. 6 demonstrate such failures. These limitations restrict the layer decomposability of our model. Meanwhile, only synthetic images are used for training. Annotated real world data would make the model more generalizable.
Fig. 9 Recognition-by-components demonstration using our proposed hierarchical primitive detection model.

9 Conclusions

This paper demonstrates a data-driven approach to layered detection of primitives in images, and subsequent 2D reconstruction. As noted, abstraction of objects into primitives is a very natural way for humans to understand objects. As artificial intelligence moves towards performing tasks in human-like fashion, there is value in trying to perform these tasks in the way a human would.

Such tasks often also fall at the intersection of robotics and computer vision, e.g., in the cases of autonomous driving and robotics. In such tasks, building environment-awareness into cars or robots based on their field of vision is key, and primitive-level reconstruction would be useful. Primitive-level understanding would also help in understanding physical interactions with objects in manipulation tasks. While there are many such avenues where this understanding could be applied, there is a lack of open datasets for training on real world data. A good direction for future study would involve learning tasks of an unsupervised or self-supervised kind.

Acknowledgements

Chengcheng Tang would like to acknowledge NSF grant IIS-1528025, a Google Focused Research award, a gift from the Adobe Corporation, and a gift from the NVIDIA Corporation.

Electronic Supplementary Material  Supplementary material with detailed experimental configuration and results is available in the online version of this article at https://doi.org/10.1007/s41095-018-0128-6.

References

[1] Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 779–788, 2016.
[2] Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6517–6525, 2017.
[3] Roberts, L. G. Machine perception of three-dimensional solids. Ph.D. Thesis. Massachusetts Institute of Technology, 1963.
[4] Binford, T. O. Visual perception by computer. In: Proceedings of the IEEE Conference on Systems and Control, 1971.
[5] Biederman, I. Recognition-by-components: A theory of human image understanding. Psychological Review Vol. 94, No. 2, 115–147, 1987.
[6] Bellver, M.; Giro-i-Nieto, X.; Marques, F.; Torres, J. Hierarchical object detection with deep reinforcement learning. In: Proceedings of the Deep Reinforcement Learning Workshop, NIPS, 2016.
[7] Ballard, D. H. Generalizing the Hough transform to detect arbitrary shapes. Pattern Recognition Vol. 13, No. 2, 111–122, 1981.
[8] Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 39, No. 6, 1137–1149, 2017.
[9] Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A. C. SSD: Single shot multibox detector. In: Computer Vision – ECCV 2016. Lecture Notes in Computer Science, Vol. 9905. Leibe, B.; Matas, J.; Sebe, N.; Welling, M. Eds. Springer Cham, 21–37, 2016.
[10] Higgins, I.; Sonnerat, N.; Matthey, L.; Pal, A.; Burgess, C.; Botvinick, M.; Hassabis, D.; Lerchner, A. SCAN: Learning abstract hierarchical compositional visual concepts. arXiv preprint arXiv:1707.03389, 2017.
[11] Lake, B. M.; Salakhutdinov, R.; Tenenbaum, J. B. Human-level concept learning through probabilistic program induction. Science Vol. 350, No. 6266, 1332–1338, 2015.
[12] Rogers, D. F.; Fog, N. Constrained B-spline curve and surface fitting. Computer-Aided Design Vol. 21, No. 10, 641–648, 1989.
[13] Besl, P. J.; McKay, N. D. A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 14, No. 2, 239–256, 1992.
[14] Chen, Y.; Medioni, G. Object modeling by registration of multiple range images. In: Proceedings of the IEEE International Conference on Robotics and Automation, 2724–2729, 1991.
[15] Wang, W.; Pottmann, H.; Liu, Y. Fitting B-spline curves to point clouds by curvature-based squared distance minimization. ACM Transactions on Graphics Vol. 25, No. 2, 214–238, 2006.
[16] Zheng, W.; Bo, P.; Liu, Y.; Wang, W. Fast B-spline curve fitting by L-BFGS. Computer Aided Geometric Design Vol. 29, No. 7, 448–462, 2012.
[17] Sun, J.; Liang, L.; Wen, F.; Shum, H.-Y. Image vectorization using optimized gradient meshes. ACM Transactions on Graphics Vol. 26, No. 3, Article No. 11, 2007.
[18] Lecot, G.; Levy, B. Ardeco: Automatic region detection and conversion. In: Proceedings of the 17th Eurographics Symposium on Rendering Techniques, 349–360, 2006.
[19] Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2117–2125, 2017.
[20] Sharma, G.; Goyal, R.; Liu, D.; Kalogerakis, E.; Maji, S. CSGNet: Neural shape parser for constructive solid geometry. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5515–5523, 2018.
[21] Gers, F. A.; Schraudolph, N. N.; Schmidhuber, J. Learning precise timing with LSTM recurrent networks. Journal of Machine Learning Research Vol. 3, No. 1, 115–143, 2002.
[22] Cho, K.; Merriënboer, B. V.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
[23] Castrejón, L.; Kundu, K.; Urtasun, R.; Fidler, S. Annotating object instances with a polygon-RNN. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5230–5238, 2017.
[24] Jetley, S.; Sapienza, M.; Golodetz, S.; Torr, P. H. S. Straight to shapes: Real-time detection of encoded shapes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4207–4216, 2017.
[25] Girshick, R. Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, 1440–1448, 2015.
[26] Bengio, S.; Vinyals, O.; Jaitly, N.; Shazeer, N. Scheduled sampling for sequence prediction with recurrent neural networks. In: Advances in Neural Information Processing Systems 28. Cortes, C.; Lawrence, N. D.; Lee, D. D.; Sugiyama, M.; Garnett, R. Eds. Curran Associates, Inc., 1171–1179, 2015.
[27] Wu, J.; Tenenbaum, J. B.; Kohli, P. Neural scene de-rendering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[28] Itseez. Open source computer vision library. 2015. Available at https://github.com/itseez/opencv.
[29] Duda, R. O.; Hart, P. E. Use of the Hough transformation to detect lines and curves in pictures. Communications of the ACM Vol. 15, No. 1, 11–15, 1972.
[30] Xie, Y.; Ji, Q. A new efficient ellipse detection method. In: Proceedings of the IEEE International Conference on Pattern Recognition, Vol. 2, 957–960, 2002.
[31] Google. Google material icon. 2017. Available at https://material.io/icons/.
[32] Everingham, M. The PASCAL Visual Object Classes Challenge 2012 (VOC2012). Available at http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html.
[33] Barnes, C.; Shechtman, E.; Finkelstein, A.; Goldman, D. B. PatchMatch: A randomized correspondence algorithm for structural image editing. ACM Transactions on Graphics Vol. 28, No. 3, Article No. 24, 2009.

Jiahui Huang received his B.S. degree in computer science and technology from Tsinghua University in 2018. He is currently a Ph.D. candidate in computer science at Tsinghua University. His research interests include computer vision and computer graphics.

Jun Gao received his B.S. degree in computer science from Peking University in 2018. He is a graduate student in the Machine Learning Group at the University of Toronto and is also affiliated with the Vector Institute. His research interests are in deep learning and computer vision.

Vignesh G. Subramanian is a Ph.D. candidate in the Department of Electrical Engineering, Stanford University. He previously obtained his dual degrees (B.Tech. in EE and M.Tech. in communication engineering) from IIT Madras, India. His research interests include shape correspondences, 3D geometry, graphics, and vision.
Hao Su received his Ph.D. degree from Stanford University, under the supervision of Leonidas Guibas. He joined UC San Diego in 2017 and is currently an assistant professor of computer science and engineering. His research interests include computer vision, computer graphics, machine learning, robotics, and optimization. More details of his research can be found at http://ai.ucsd.edu/haosu.

Yin Liu received his B.S. degree from the Department of Automation of Tsinghua University in 2018. He is currently a Ph.D. candidate in computer science at the University of Wisconsin-Madison. His research interest is in machine learning.

Chengcheng Tang received his Ph.D. and M.S. degrees from King Abdullah University of Science and Technology (KAUST) in 2015 and 2011, respectively, and his bachelor degree from Jilin University in 2009. He is currently a postdoctoral scholar in the Computer Science Department at Stanford University. His research interests include computer graphics, geometric computing, computational design, and machine learning.

Leonidas J. Guibas received his Ph.D. degree from Stanford University in 1976, under the supervision of Donald Knuth. His main subsequent employers were Xerox PARC, MIT, and DEC/SRC. Since 1984, he has been at Stanford University, where he is a professor of computer science. His research interests include computational geometry, geometric modeling, computer graphics, computer vision, sensor networks, robotics, and discrete algorithms. He is a senior member of the IEEE and the IEEE Computer Society. More details about his research can be found at http://geometry.stanford.edu/member/guibas/.

Open Access  The articles published in this journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.