
PolyLaneNet: Lane Estimation via Deep Polynomial Regression

Lucas Tabelini∗, Rodrigo Berriel∗, Thiago M. Paixão∗†, Claudine Badue∗,
Alberto F. De Souza∗ and Thiago Oliveira-Santos∗
∗ Universidade Federal do Espírito Santo (UFES), Brazil
† Instituto Federal do Espírito Santo (IFES), Brazil
Email: [email protected]

Abstract—One of the main factors that contributed to the large advances in autonomous driving is the advent of deep learning. For safer self-driving vehicles, one of the problems that has yet to be solved completely is lane detection. Since methods for this task have to work in real-time (+30 FPS), they not only have to be effective (i.e., have high accuracy) but they also have to be efficient (i.e., fast). In this work, we present a novel method for lane detection that uses as input an image from a forward-looking camera mounted in the vehicle and outputs polynomials representing each lane marking in the image, via deep polynomial regression. The proposed method is shown to be competitive with existing state-of-the-art methods in the TuSimple dataset while maintaining its efficiency (115 FPS). Additionally, extensive qualitative results on two additional public datasets are presented, alongside limitations in the evaluation metrics used by recent works for lane detection. Finally, we provide source code and trained models that allow others to replicate all the results shown in this paper, which is surprisingly rare in state-of-the-art lane detection methods. The full source code and pretrained models are available at https://github.com/lucastabelini/PolyLaneNet.

I. INTRODUCTION

Autonomous driving [1] is a challenging field of research that has received a lot of attention in recent years. The perceptual problems related to this field have been immensely impacted by the advances in deep learning [2]–[4]. In particular, autonomous vehicles should be capable of estimating traffic lanes because, besides working as a spatial limit, each lane provides specific visual cues ruling the travel. In this context, the two most important traffic lines (i.e., lane markings) are those defining the lane of the vehicle, i.e., the ego-lane. These lines set the limits for the driver's actions, and their types define whether or not maneuvers (e.g., lane changes) are allowed. Also, it might be useful to detect the adjacent lanes so that the system's decisions might be based on a better understanding of the traffic scene.

Lane estimation (or detection) may seem trivial at first, but it can be very challenging. Although fairly standardized, lane markings vary in shape and colour. Estimating a lane when dashed or partially occluded lane markers are present requires a semantic understanding of the scene. Moreover, the environment itself is inherently diverse: there may be a lot of traffic, people passing by, or it could be just a free highway. In addition, these environments are subject to several weather (e.g., rain, snow, sunny, etc.) and illumination (e.g., day, night, dawn, tunnels, etc.) conditions, which might just change while driving.

The traditional approach for the lane estimation (or detection) task consists in the extraction of hand-crafted features [5], [6] followed by a curve-fitting process. Although this approach tends to work well under normal and limited circumstances, it is usually not as robust as needed in adverse conditions (such as the aforementioned ones). Therefore, following the trend in many computer vision problems, deep learning has recently started to be used to learn robust features and improve the lane marking estimation process [7]–[9]. Once the lane markings are estimated, further processing can be performed to determine the actual lanes. Still, there are limitations to be tackled. First, many of these deep learning-based models tackle the lane marking estimation as a two-step process: feature extraction and curve fitting. Most works extract features via segmentation-based models, which usually are inefficient and have trouble running in real-time, as required for autonomous driving. Additionally, the segmentation step alone is not enough to provide a lane marking estimate, since the segmentation maps have to be post-processed in order to output traffic lines. Further, these two-step processes might ignore global information [8], which is especially important when there are missing visual cues (as in strong shadows and occlusions). Second, some of these works are carried out by private companies that often (i) do not provide means to replicate their results and (ii) develop their methods on private datasets, which hinders research progress. Lastly, there is room for improvement in the evaluation protocol. The methods are usually tested on datasets from the USA only (roads in developing countries are usually not as well maintained) and the evaluation metrics are too permissive (they allow error in such a way that it hinders proper comparisons), as discussed in Section IV.

In this context, methods focusing on removing the need for a two-step process, further reducing the processing cost, could benefit advanced driver assistance systems (ADAS) that often rely on low-energy and embedded hardware. In addition, a method that has been tested on roads other than the USA's is also of benefit to the broader community. Moreover, less permissive metrics would allow to better differentiate methods and provide a clearer overview of the methods and their usefulness.
[Figure 1: diagram of the PolyLaneNet architecture. The input image goes through a backbone and a shared fully-connected layer that outputs, for each lane marking L_j, the polynomial coefficients a_{0,j}, ..., a_{K,j}, a vertical offset s_j, and a confidence score c_j, along with the shared horizon position h.]

Fig. 1. Overview of the proposed method. From left to right: the model receives as input an image from a forward-looking camera and outputs information about each lane marking in the image.

This work proposes PolyLaneNet, a convolutional neural network (CNN) for end-to-end lane marking estimation. PolyLaneNet takes as input images from a forward-looking camera mounted in the vehicle and outputs polynomials that represent each lane marking in the image, along with the domains for these polynomials and confidence scores for each lane. This approach is shown to be competitive with existing state-of-the-art methods while being faster and not requiring post-processing to obtain the lane estimates. In addition, we provide a deeper analysis using metrics suggested by the literature. Finally, we publicly released the source code (for both training and inference) and the trained models, allowing the replication of all the results presented in this paper.

II. RELATED WORKS

Lane Detection. Before the rise of deep learning, methods on lane detection were mostly model- or learning-based, i.e., they used to exploit hand-crafted and specialized features. Shape and color were the most commonly used features [10], [11], and lanes were normally represented by both straight and curved lines [12], [13]. These methods, however, were not robust to sudden illumination changes, weather conditions, differences in appearance between cameras, and many other things that can be found in driving scenes. The interested reader is referred to [5] for a more complete survey on earlier lane detection methods.

With the success of deep learning, researchers have also investigated its use to tackle lane detection. Huval et al. [14] were one of the first to use deep learning in lane detection. Their model is based on OverFeat and produces as output a sort of segmentation map that is later post-processed using DBSCAN clustering. They collected a private dataset in San Francisco (USA) that was used to train and evaluate their system. Because of the success of their application, companies also became interested in investigating this problem. Later, Ford released DeepLanes [15], which, unlike most of the literature, detects lanes based on laterally-mounted cameras. Despite the good results, the way they modeled the problem made it less widely applicable, and they also used a private US-based dataset.

More recently, a lane detection challenge was held at CVPR'17, in which the TuSimple [16] dataset was released. The winner of the challenge was SCNN [7], a method proposed for traffic scene understanding that exploits the propagation of spatial information via a specially designed CNN structure. Their model outputs a probability map for the lanes that is post-processed in order to provide the lane estimates. To evaluate their system, they used an evaluation metric based on the IoU between the prediction and the ground-truth. After that, in [8], the authors proposed Line-CNN, a model in which the key component is the line proposal unit (LPU) adapted from the region proposal network (RPN) of Faster R-CNN. They also submitted their results to the TuSimple benchmark (after the challenge was finished) with marginally better results compared to SCNN. Their main experiments, though, were with a much larger dataset that was not publicly released. In addition to this private dataset, the source code is proprietary and the authors will not release it. Another approach is FastDraw [17], in which the common post-processing of segmentation-based methods is substituted by "drawing" the lanes according to the likelihood of polylines that is maximized at training time. In addition to evaluating on the TuSimple and CULane [7] datasets, the authors provide qualitative results on yet another private US-based dataset. Moreover, they did not release their implementation, which hinders further comparisons. Some of the segmentation-based methods focus on improving the inference speed, as in [9] (ENet-SAD), which focuses on learning lightweight CNNs by exploiting self-attention distillation. The authors evaluated their method on three well-known datasets. Although the source code was publicly released, some of the results are not reproducible¹. Closer to our work, [18] proposes a differentiable least-squares fitting module to fit a curve on points predicted by a deep neural network. In our work, we bypass the need for this module by directly predicting the polynomial coefficients, which simplifies the method while also making it faster.

¹ According to the author of [9], the difference in performance comes from engineering tricks neither described in the paper nor included in the available code: https://web.archive.org/web/20200503114942/https://github.com/cardwing/Codes-for-Lane-Detection/issues/208
In summary, one of the main problems with existing state-of-the-art methods is reproducibility, since most either do not publish the datasets used or the source code. In this work, we present results that are competitive with state-of-the-art methods on public datasets and fully reproducible, since we provide the source code and use only publicly available datasets (including one from outside the US).

III. POLYLANENET

Model Definition. PolyLaneNet expects as input images taken from a forward-looking vehicle camera and outputs, for each image, M_max lane marking candidates (represented as polynomials), as well as the vertical position h of the horizon line, which helps to define the upper limit of the lane markings. The architecture of PolyLaneNet consists of a backbone network (for feature extraction) appended with a fully connected layer with M_max + 1 outputs, where outputs 1, ..., M_max are the lane marking predictions and output M_max + 1 is h. PolyLaneNet adopts a polynomial representation for the lane markings instead of a set of points. Therefore, for each output j, j = 1, ..., M_max, the model estimates the coefficients P_j = {a_{k,j}}_{k=0}^{K} representing the polynomial

    p_j(y) = Σ_{k=0}^{K} a_{k,j} y^k,    (1)

where K is a parameter that defines the order of the polynomial. As illustrated in Figure 1, the polynomials have a restricted domain: the height of the image. Besides the coefficients, the model estimates, for each lane marking j, the vertical offset s_j and the prediction confidence score c_j ∈ [0, 1]. In summary, the PolyLaneNet model can be expressed as

    f(I; θ) = ({(P_j, s_j, c_j)}_{j=1}^{M_max}, h),    (2)

where I is the input image and θ are the model parameters. At inference time, as illustrated in Figure 1, only the lane marking candidates whose confidence score is greater than or equal to a threshold are considered as detected.
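For concreteness, the following is a minimal PyTorch sketch of the output structure just described. The module name, the tiny stand-in convolutional backbone, and the use of a sigmoid to bound the confidence output are illustrative assumptions only; the official implementation is the one in the repository linked above.

import torch
import torch.nn as nn

class PolyLaneNetSketch(nn.Module):
    # Sketch of Section III: a feature-extraction backbone followed by a
    # single fully connected layer. The small convolutional stack below is a
    # stand-in; the paper uses EfficientNet-b0 as the backbone.
    def __init__(self, m_max=5, order=3, feat_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Per lane marking j: (order + 1) coefficients a_{k,j}, one vertical
        # offset s_j and one confidence score c_j; plus one shared horizon h.
        self.m_max = m_max
        self.per_lane = (order + 1) + 2
        self.fc = nn.Linear(feat_dim, m_max * self.per_lane + 1)

    def forward(self, img):
        x = self.features(img).flatten(1)            # (B, feat_dim)
        out = self.fc(x)                             # (B, m_max*(order+3) + 1)
        h = out[:, -1]                               # shared horizon position
        lanes = out[:, :-1].view(-1, self.m_max, self.per_lane)
        coeffs = lanes[..., :-2]                     # a_{0,j}, ..., a_{K,j}
        offsets = lanes[..., -2]                     # vertical offset s_j
        confidences = torch.sigmoid(lanes[..., -1])  # c_j in [0, 1]
        return coeffs, offsets, confidences, h

At inference time, candidates with confidence below the chosen threshold (0.5 in the paper) are discarded, and each remaining polynomial is only evaluated between its offset s_j and the shared horizon h.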
Model Training. For an input image, let M be the number of annotated lane markings. In general, traffic scenes contain few lanes, with M ≤ 4 for most images in the available datasets. For training (and metric evaluation), each annotated lane marking j, j = 1, ..., M, is associated with the output neuron j. Therefore, predictions related to the outputs M + 1, ..., M_max should be disregarded in the loss function. An annotated lane marking j is represented by a set of points L*_j = {(x*_{i,j}, y*_{i,j})}_{i=1}^{N}, where y*_{i+1,j} > y*_{i,j} for every i = 1, ..., N − 1. As a rule of thumb, the higher N is, the richer the structures that can be captured. We assume that the lane markings {L*_j}_{j=1}^{M} are ordered according to the x-coordinate of the point closest to the bottom of the image, i.e., x*_{0,j} < x*_{0,j+1} for every j = 1, ..., M − 1. For each lane marking j, the vertical offset s*_j was set as min{y*_{i,j}}_{i=1}^{N}; the confidence score is defined as

    c*_j = 1 if j ≤ M, and c*_j = 0 otherwise.    (3)

The model is trained using the multi-task loss function defined (for a single image) as

    L({P_j}, h, {s_j}, {c_j}) = W_p·L_p({P_j}, {L*_j}) + (W_s/M)·Σ_j L_reg(s_j, s*_j) + (W_c/M)·Σ_j L_cls(c_j, c*_j) + W_h·L_reg(h, h*),    (4)

where W_p, W_s, W_c, and W_h are constant weights used for balancing. The losses L_reg and L_cls are the Mean Squared Error (MSE) and Binary Cross-Entropy (BCE) functions, respectively. The L_p loss function measures how well adjusted the polynomial p_j (Equation 1) is to the annotated points. Consider the annotated x-coordinates x*_j = [x*_{1,j}, ..., x*_{N,j}]^T and the effective predictions x_j = [x_{1,j}, ..., x_{N,j}], where

    x_{i,j} = p_j(y*_{i,j}) if |p_j(y*_{i,j}) − x*_{i,j}| > τ_loss, and x_{i,j} = x*_{i,j} otherwise,    (5)

where τ_loss is an empirically defined threshold that tries to reduce the focus of the loss on points that are already well aligned. Such an effect appears because the lane markings comprise several points with different sampling densities (i.e., points closer to the camera are denser than points further away). Finally, L_p is defined as

    L_p({P_j}, {L*_j}) = L_reg(x_j, x*_j).    (6)
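The sketch below illustrates the multi-task loss of Equation 4 for a single image, assuming the hypothetical tensor layout of the previous sketch; the torch.where call implements the thresholding of Equation 5. The weight values follow those given later in Section IV-B (W_p = 300, W_s = W_c = W_h = 1), but the helper itself is illustrative, not the released training code.

import torch
import torch.nn.functional as F

def polylanenet_loss(coeffs, offsets, confidences, h,          # predictions for one image
                     gt_points, gt_offsets, gt_conf, gt_h,      # annotations
                     w_p=300.0, w_s=1.0, w_c=1.0, w_h=1.0, tau_loss=20.0):
    # coeffs:    (M_max, K+1) polynomial coefficients a_{0,j} ... a_{K,j}
    # gt_points: (M, N, 2) annotated (x*, y*) points of the M labeled lane markings
    # gt_conf:   (M_max,) ground-truth confidences, 1 for j <= M and 0 otherwise (Eq. 3)
    M = gt_points.shape[0]
    ys = gt_points[..., 1]                                       # (M, N)
    # Evaluate p_j(y*) = sum_k a_{k,j} (y*)^k for the M annotated lanes (Eq. 1).
    powers = torch.stack([ys ** k for k in range(coeffs.shape[1])], dim=-1)
    x_pred = (powers * coeffs[:M].unsqueeze(1)).sum(dim=-1)      # (M, N)
    x_gt = gt_points[..., 0]
    # Eq. 5: points already within tau_loss of the annotation contribute no loss.
    x_eff = torch.where((x_pred - x_gt).abs() > tau_loss, x_pred, x_gt)
    loss_p = F.mse_loss(x_eff, x_gt)                             # Eq. 6
    loss_s = F.mse_loss(offsets[:M], gt_offsets)                 # vertical offsets s_j
    loss_c = F.binary_cross_entropy(confidences, gt_conf)        # confidence scores c_j
    loss_h = F.mse_loss(h, gt_h)                                 # horizon position
    return w_p * loss_p + w_s * loss_s + w_c * loss_c + w_h * loss_h   # Eq. 4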
IV. EXPERIMENTAL METHODOLOGY

PolyLaneNet was evaluated on publicly available datasets, which are introduced in this section. Following that, the section describes the implementation details, the metrics, and the experiments performed.

A. Datasets

Three datasets were used to evaluate PolyLaneNet: TuSimple [16], LLAMAS [19] and ELAS [6]. For quantitative results, the widely used TuSimple [16] was employed. The dataset has a total of 6,408 annotated images with a resolution of 1280×720 pixels, and it is originally split into 3,268 images for training, 358 for validation, and 2,782 for testing. For qualitative results, two other datasets were used: LLAMAS [19] and ELAS [6]. The first is a large dataset, split into 58,269 images for training, 20,844 for validation, and 20,929 for testing, with a resolution of 1280×717 pixels. Both TuSimple and LLAMAS are datasets from the USA. Since neither the benchmark nor the test set annotations for LLAMAS are available yet, only qualitative results are presented. ELAS is a dataset with 16,993 images from various cities in Brazil, with a resolution of 640×480 pixels.
Since the dataset was originally proposed for a non-learning-based method, it does not provide training/testing splits. Thus, we created those splits by separating 11,036 images for training and 5,957 for testing. The main difference between ELAS and the other two datasets is that in ELAS only the ego-lane is annotated. Nonetheless, the dataset also provides other types of useful information for the lane detection task, such as lane types (e.g., solid or dashed, white or yellow), but they are not used in this paper.

B. Implementation details

The hyperparameters for every experiment in this work were the same, except for the ablation study, where in each training one parameter was modified. For the backbone network, EfficientNet-b0 [20] was used. For the TuSimple training, data augmentation was applied with a probability of 10/11. The transformations used were: rotation with an angle in degrees θ ∼ U(−10, 10), horizontal flip with a probability of 0.5, and a random crop with size 1152×648 pixels. After the data augmentation, the following transformations were applied: a resize to 640×360 pixels and then a normalization with ImageNet's [21] mean and standard deviation. The Adam optimizer was used, along with the Cosine Annealing learning rate scheduler with an initial learning rate of 3e-4 and a period of 770 epochs. The training session ran for 2,695 epochs, taking approximately 35 hours on a Titan V, with a batch size of 16 images, starting from a model pretrained on ImageNet [21]. A third-order polynomial was chosen as the default. For the loss function, the parameters W_s = W_c = W_h = 1 and W_p = 300 were used. The threshold τ_loss (Equation 5) was set to 20 pixels. In the testing phase, lane markings predicted with a confidence score c_j < 0.5 were ignored. For more details, the source code and trained models are publicly available².

² https://github.com/lucastabelini/PolyLaneNet
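The augmentation and preprocessing just described map naturally onto torchvision transforms. The snippet below is a simplified approximation: in particular, it only transforms the image, whereas in practice the same geometric transformations must also be applied to the lane marking annotations, as the released code does.

import random
import torchvision.transforms as T

# Augmentation is applied with probability 10/11, as described in Section IV-B.
augment = T.Compose([
    T.RandomRotation(10),                      # rotation, angle ~ U(-10, 10) degrees
    T.RandomHorizontalFlip(p=0.5),             # horizontal flip
    T.RandomCrop((648, 1152)),                 # random crop of 1152x648 pixels (H, W)
])

# Applied to every image, with or without augmentation.
preprocess = T.Compose([
    T.Resize((360, 640)),                      # resize to 640x360 pixels (H, W)
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],    # ImageNet mean
                std=[0.229, 0.224, 0.225]),    # ImageNet standard deviation
])

def transform(img):
    # Sketch only: lane annotations would need the same geometric transform.
    if random.random() < 10 / 11:
        img = augment(img)
    return preprocess(img)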
C. Evaluation Metrics

The metrics used to measure the proposed method's performance come from TuSimple's benchmark [16]. The three metrics are: accuracy (Acc), false positive (FP) and false negative (FN) rates. For a predicted lane marking to be considered a true positive (i.e., a correct one), its accuracy, defined as

    Acc(P_j, L*_j) = (1/|L*_j|) Σ_{(x*_{i,j}, y*_{i,j}) ∈ L*_j} 1[ |p_j(y*_{i,j}) − x*_{i,j}| < τ_acc ],    (7)

has to be equal to or greater than ε. The values used for τ_acc and ε were 20 pixels and 0.85, respectively, the same ones used in TuSimple's benchmark. All three reported metrics (Acc, FP and FN) are reported as the average across all images of the per-image averages.

Although TuSimple's metric has been widely used in the literature, it is too permissive w.r.t. local errors. To avoid relying only on such a metric, we also used a metric proposed in [22], which discusses several evaluation metrics of interest to the lane estimation process. The Lane Position Deviation (LPD) was proposed to better capture the accuracy of the model on both the near and far depths of view of the ego-vehicle. It is the error between the prediction and the ground-truth for the ego-lane. To define what the ego-lanes are (given that the dataset labels and our model are agnostic to this definition), we use a simple definition: the lane markings that are closer to the middle of the bottom part of the image are the ones that compose the ego-lane, i.e., one lane marking to the left and another one to the right.
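For clarity, a small NumPy sketch of the per-lane accuracy of Equation 7 follows; a predicted lane marking is a true positive when this value is at least ε = 0.85. This is only an illustration of the formula, not the official TuSimple evaluation script.

import numpy as np

def lane_accuracy(coeffs, gt_points, tau_acc=20.0):
    # Eq. 7: fraction of annotated points whose predicted x is within tau_acc pixels.
    # coeffs:    polynomial coefficients a_0 ... a_K of one predicted lane marking
    # gt_points: (N, 2) array of annotated (x*, y*) points for the matched lane
    xs_gt, ys = gt_points[:, 0], gt_points[:, 1]
    # Evaluate p(y) = sum_k a_k * y^k at the annotated y-coordinates.
    xs_pred = sum(a * ys ** k for k, a in enumerate(coeffs))
    return np.mean(np.abs(xs_pred - xs_gt) < tau_acc)

# A predicted lane marking counts as a true positive when
# lane_accuracy(...) >= 0.85, matching TuSimple's epsilon.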
In addition to metrics w.r.t. the quality of the predictions, we also report two speed-related metrics: frames-per-second (FPS) and MACs³. The frames-per-second provide a concrete assessment of how fast an implementation can run on a modern computer with a recent GPU, whereas MACs provide a more reliable way to compare different methods that might be running on different frameworks and setups. As discussed in [22], analyzing the trade-off between computation efficiency and accuracy is also important. In this paper, we provide such an analysis by reporting the MACs of PolyLaneNet variants with different computational requirements in an ablation study.

³ For reference, roughly speaking, one MAC (Multiply-Accumulate) is equivalent to 2 FLOPS. MACs were computed using the following library: https://github.com/mit-han-lab/torchprofile.
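Both speed figures can be measured with a short script. The timing loop below is a generic sketch (on a GPU, torch.cuda.synchronize() would be needed for accurate timing), and the MACs call assumes the profile_macs entry point of the torchprofile library cited in the footnote above.

import time
import torch
from torchprofile import profile_macs   # library cited in footnote 3 (assumed API)

def measure_speed(model, input_size=(1, 3, 360, 640), warmup=10, runs=100):
    model.eval()
    x = torch.randn(*input_size)
    with torch.no_grad():
        macs = profile_macs(model, x)    # multiply-accumulate operations for one image
        for _ in range(warmup):          # warm-up iterations, not timed
            model(x)
        start = time.time()
        for _ in range(runs):
            model(x)
        fps = runs / (time.time() - start)
    return fps, macs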
D. Quantitative Evaluation

State-of-the-art Comparison. The main quantitative experiment for the proposed method is the comparison against state-of-the-art methods using the same evaluation conditions. For that, the proposed method was used to train a model using a union of TuSimple's training and validation sets, which was then evaluated on its testing set. Four state-of-the-art methods were compared: SCNN [7], Line-CNN [8], ENet-SAD [9], and FastDraw [17]. Besides prediction quality metrics, model speed w.r.t. FPS is also presented. For our model, we also report the MACs.

Polynomial Degree. In most lane marking detection datasets, it is clear that lane markings with a more accentuated curvature are rarer, while straight ones represent the majority of the cases. With this in mind, one might enquire: what would be the impact of modeling lane markings with polynomials of lower orders? To help answer this question, our method was evaluated using first- and second-order polynomials, instead of the default third-order polynomials. Furthermore, we also show the permissiveness of the standard TuSimple metric used by the literature by computing upper bounds for polynomials of different orders.

Ablation Study. To investigate the impact of some of the decisions made for the proposed method, an ablation study was carried out, using only TuSimple's training set for training and the validation set for testing. For the model backbone f(·, θ), ResNet [23] was evaluated in two of its variants: ResNet-34 and ResNet-50. Another variant of EfficientNet was also evaluated, EfficientNet-b1. Moreover, when training CNNs, in addition to the impact of the backbone, there is a trade-off when using different image input sizes. For example, if a smaller input size is used, the network forward pass will be faster, but information may be lost. To measure this trade-off [22] in the proposed method, two additional models were trained, one using an input size of 480×270 pixels and the other using an input size of 320×180 pixels. Additionally, three other practical decisions were evaluated: (i) the impact of not sharing h (i.e., having the end of each lane predicted individually), (ii) the use of a pretrained model, by training from scratch instead of starting from a model pretrained on ImageNet; and (iii) the impact of using data augmentation, by removing the online data augmentation, which reduces the variability seen by the model at training time.

E. Qualitative Evaluation

For qualitative results, an extensive evaluation was carried out. Using the model trained on TuSimple as a pretraining, three models were trained: two on ELAS, one with and one without lane marking type classification, and another on LLAMAS. On ELAS, the model was trained for 385 additional epochs (half of a period of the chosen learning rate scheduler, where the learning rate will be at a minimum). On LLAMAS, the model was trained for 75 additional epochs, an approximation to the number of iterations used on ELAS, as the training set of LLAMAS is around five times larger than the one of ELAS. The experiment with lane marking type classification is a straightforward extension of PolyLaneNet, in which a category is predicted for each lane, showcasing how trivial it is to extend our model.

Fig. 2. Qualitative results of PolyLaneNet on TuSimple.
V. RESULTS

First, we present the results of the comparison with the state-of-the-art. Then, the results of the ablation study are detailed and discussed. Finally, qualitative results are shown.

State-of-the-art Comparison. The state-of-the-art results on the TuSimple dataset are presented in Table I. As evidenced, PolyLaneNet's results are competitive. Since none of the compared methods provide source code that replicates their respective published results, it is very difficult to investigate situations in which the other methods succeed and ours fails. In Figure 2, some qualitative results of PolyLaneNet on TuSimple are shown. It is noticeable that PolyLaneNet's predictions on parts of the lane marking closer to the camera (where more details can be seen) are very accurate. Nonetheless, on parts of the lane marking closer to the horizon, the predictions are less accurate. We conjecture that this might be a result of a local minimum, caused by the dataset's imbalance. Since most lane markings in the dataset can be represented fairly well with 1st-order polynomials (i.e., lines), the neural network has a bias towards predicting lines, hence the poor performance on lane markings with accentuated curvature.

TABLE I
STATE-OF-THE-ART RESULTS ON TUSIMPLE. PP = REQUIRES POST-PROCESSING.

Method          Acc      FP       FN       FPS    MACs     PP
Line-CNN [8]    96.87%   0.0442   0.0197    30    -        -
ENet-SAD [9]    96.64%   0.0602   0.0205    75    -        Yes
SCNN [7]        96.53%   0.0617   0.0180     7    -        Yes
FastDraw [17]   95.20%   0.0760   0.0450    90    -        Yes
PolyLaneNet     93.36%   0.0942   0.0933   115    1.748G   -

Polynomial Degree. In terms of the polynomial degree used to represent the lane marking, the small difference in accuracy when using lower-order polynomials shows how unbalanced the dataset is. Using 1st-order polynomials (i.e., lines) decreased the accuracy by only 0.35 p.p. Although the dataset's imbalance certainly has an impact on this, another important factor is the metric used by the benchmark to evaluate a model's performance. The LPD metric [22], however, is able to better capture the difference between the models trained using 1st-order polynomials and the others. This can be further seen in Table III, which shows the maximum performance (i.e., the upper bound) of methods that represent lane markings as polynomials, measured by fitting polynomials on the test data itself. As can be seen, TuSimple's metric does not punish predictions that are accurate only on the parts of the lane marking closer to the car, where in the image it will look almost straight (i.e., can be represented well by 1st-order polynomials), since the thresholds may hide those mistakes. Meanwhile, the LPD metric clearly distinguishes the upper bounds, showing a clear difference even between the 4th and 5th degrees, for which TuSimple's metrics are almost identical.

TABLE II
ABLATION STUDY RESULTS ON TUSIMPLE VALIDATION SET W.R.T. POLYNOMIAL DEGREE

Polynomial Degree   Acc      FP       FN       LPD
1st                 88.63%   0.2231   0.1865   2.532
2nd                 88.89%   0.2223   0.1890   2.316
3rd                 88.62%   0.2237   0.1844   2.314

TABLE III
TUSIMPLE PERFORMANCE UPPER BOUND OF POLYNOMIALS

Polynomial Degree   Acc      FP       FN       LPD
1st                 96.22%   0.0393   0.0367   1.512
2nd                 97.25%   0.0191   0.0175   1.116
3rd                 97.84%   0.0016   0.0014   0.732
4th                 98.00%   0.0000   0.0000   0.497
5th                 98.03%   0.0000   0.0000   0.382
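The upper bounds of Table III can be approximated by least-squares fitting a polynomial of the chosen degree to each annotated lane marking of the test set and then scoring the fit with the same metrics. A minimal NumPy sketch of that fitting step (function name hypothetical):

import numpy as np

def fit_upper_bound(gt_points, degree):
    # Fit x as a polynomial of y on one annotated lane marking (sketch).
    # gt_points: (N, 2) array of annotated (x*, y*) points.
    # Returns the fitted x values at the annotated y positions, i.e. the best
    # a polynomial of this degree could do on that lane marking.
    xs, ys = gt_points[:, 0], gt_points[:, 1]
    coeffs = np.polyfit(ys, xs, degree)     # least-squares fit of x = p(y)
    return np.polyval(coeffs, ys)

# Scoring these fitted points with the TuSimple metric (Eq. 7) and with LPD
# yields per-degree upper bounds analogous to those reported in Table III.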
Ablation Study. The ablation study results are shown in Table IV. EfficientNet-b1 achieved the highest accuracy, followed by EfficientNet-b0 and ResNet-34. Those results suggest that larger networks, such as ResNet-50, may overfit the data. Although EfficientNet-b1 achieved the highest accuracy, we chose not to use it in other experiments, as the accuracy
gains are not significant nor consistent in our experiments. In addition, it is more computationally expensive (i.e., lower FPS, higher MACs, and longer training times). In regards to the input size, reducing it also means reducing the accuracy, as expected. In some cases, this accuracy loss may not be significant, but the speed gains may be. For example, using an input size of 480×270 decreased the accuracy by only 0.55 p.p., but the model MACs decreased by 1.82 times.

TABLE IV
ABLATION STUDY RESULTS ON TUSIMPLE VALIDATION SET W.R.T. BACKBONE AND INPUT SIZE

Modification                     Acc      FP       FN       MACs (G)
Backbone     ResNet-34           88.07%   0.2267   0.1953   17.154
             ResNet-50           83.37%   0.3472   0.3122   19.135
             EfficientNet-b1     89.20%   0.2170   0.1785    2.583
             EfficientNet-b0     88.62%   0.2237   0.1844    1.748
Input Size   320×180             85.45%   0.2924   0.2446    0.396
             480×270             88.39%   0.2398   0.1960    0.961
             640×360             88.62%   0.2237   0.1844    1.748

As to the other ablation studies we carried out, one can see in Table V that sharing the top-y (h) is slightly better than not sharing it. Moreover, training from a model pretrained on ImageNet seems to have a significant impact on the final result, as shown by the difference of 4.26 p.p. The same happens with data augmentation, as the model trained with more data has a significantly higher accuracy.

TABLE V
ABLATION STUDY RESULTS ON TUSIMPLE VALIDATION SET

Modification                      Acc      FP       FN
Top-Y Sharing        No           88.43%   0.2126   0.1783
                     Yes          88.62%   0.2237   0.1844
Pretraining          None         84.37%   0.3317   0.2826
                     ImageNet     88.62%   0.2237   0.1844
Data Augmentation    None         78.63%   0.4188   0.4048
                     10×          88.62%   0.2237   0.1844

Qualitative Evaluation. A sample of the qualitative results on ELAS and LLAMAS is shown in Figure 3. For more extensive results, videos are available⁴. The results show that transfer learning works well on PolyLaneNet, since a smaller number of epochs was enough to obtain reasonable results on different datasets. However, in ELAS, there are many lane changes. In those situations, the model's accuracy decreased significantly. Since the images of those situations have a very different structure (e.g., the car is not heading towards the road direction), the low amount of samples in this situation may not have been enough for the model to learn it.

⁴ Qualitative results (videos) on ELAS/LLAMAS: https://www.youtube.com/playlist?list=PLm8amuguiXiJ2zKvcapUJI ybyOFi9yz9

VI. CONCLUSION

In this work, a novel method for lane detection based on deep polynomial regression was proposed. The proposed method is simple and efficient while maintaining competitive accuracy when compared to state-of-the-art methods. Although state-of-the-art methods with slightly higher accuracy exist, most do not provide source code to replicate their results, and therefore deeper investigations of the differences between methods are difficult. Our method, besides being computationally efficient, will be publicly available so that future works on lane marking detection have a baseline to start from and to compare against. Furthermore, we have shown problems with the metrics used to evaluate lane marking detection methods. For future work, metrics that can be used across different approaches to lane detection (e.g., segmentation) and that better highlight flaws in lane detection methods can be explored.

ACKNOWLEDGMENT

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001, Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq, Brazil), PIIC UFES, and Fundação de Amparo à Pesquisa do Espírito Santo - Brasil (FAPES), grant 84412844. We thank NVIDIA for providing the GPUs used in this research.

REFERENCES

[1] C. Badue, R. Guidolini, R. V. Carneiro, P. Azevedo, V. B. Cardoso, A. Forechi, L. Jesus, R. Berriel, T. Paixão, F. Mutz et al., "Self-driving Cars: A Survey," arXiv preprint arXiv:1901.04407, 2019.
[2] L. C. Possatti, R. Guidolini, V. B. Cardoso, R. F. Berriel, T. M. Paixão, C. Badue, A. F. De Souza, and T. Oliveira-Santos, "Traffic light recognition using deep learning and prior maps for autonomous cars," in 2019 International Joint Conference on Neural Networks (IJCNN). IEEE, 2019, pp. 1–8.
[3] P. Yang, G. Zhang, L. Wang, L. Xu, Q. Deng, and M.-H. Yang, "A part-aware multi-scale fully convolutional network for pedestrian detection," IEEE Transactions on Intelligent Transportation Systems, 2020.
[4] D. Feng, C. Haase-Schütz, L. Rosenbaum, H. Hertlein, C. Glaeser, F. Timm, W. Wiesbeck, and K. Dietmayer, "Deep multi-modal object detection and semantic segmentation for autonomous driving: Datasets, methods, and challenges," IEEE Transactions on Intelligent Transportation Systems, 2020.
[5] J. C. McCall and M. M. Trivedi, "Video Based Lane Estimation and Tracking for Driver Assistance: Survey, System, and Evaluation," IEEE Transactions on Intelligent Transportation Systems, vol. 7, no. 1, pp. 20–37, 2006.
[6] R. F. Berriel, E. de Aguiar, A. F. De Souza, and T. Oliveira-Santos, "Ego-Lane Analysis System (ELAS): Dataset and Algorithms," Image and Vision Computing, vol. 68, pp. 64–75, 2017.
[7] X. Pan, J. Shi, P. Luo, X. Wang, and X. Tang, "Spatial As Deep: Spatial CNN for Traffic Scene Understanding," in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
Fig. 3. Qualitative results of PolyLaneNet on ELAS (top row) and LLAMAS (bottom row).

[8] X. Li, J. Li, X. Hu, and J. Yang, "Line-CNN: End-to-End Traffic Line Detection With Line Proposal Unit," IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 1, pp. 248–258, 2019.
[9] Y. Hou, Z. Ma, C. Liu, and C. C. Loy, "Learning Lightweight Lane Detection CNNs by Self Attention Distillation," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2019, pp. 1013–1021.
[10] K. Kluge and S. Lakshmanan, "A deformable-template approach to lane detection," in Proceedings of the Intelligent Vehicles Symposium. IEEE, 1995, pp. 54–59.
[11] K.-Y. Chiu and S.-F. Lin, "Lane Detection using Color-based Segmentation," in Proceedings Intelligent Vehicles Symposium. IEEE, 2005, pp. 706–711.
[12] C. R. Jung and C. R. Kelber, "Lane Following and Lane Departure Using a Linear-Parabolic Model," Image and Vision Computing, vol. 23, no. 13, pp. 1192–1202, 2005.
[13] R. F. Berriel, E. de Aguiar, V. V. de Souza Filho, and T. Oliveira-Santos, "A Particle Filter-based Lane Marker Tracking Approach Using a Cubic Spline Model," in 28th SIBGRAPI Conference on Graphics, Patterns and Images. IEEE, 2015, pp. 149–156.
[14] B. Huval, T. Wang, S. Tandon, J. Kiske, W. Song, J. Pazhayampallil, M. Andriluka, P. Rajpurkar, T. Migimatsu, R. Cheng-Yue, F. Mujica, A. Coates, and A. Y. Ng, "An empirical evaluation of deep learning on highway driving," arXiv preprint arXiv:1504.01716, 2015.
[15] A. Gurghian, T. Koduri, S. V. Bailur, K. J. Carey, and V. N. Murali, "DeepLanes: End-To-End Lane Position Estimation using Deep Neural Networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2016, pp. 38–45.
[16] TuSimple. TuSimple Benchmark. [Online]. Available: https://github.com/TuSimple/tusimple-benchmark
[17] J. Philion, "FastDraw: Addressing the Long Tail of Lane Detection by Adapting a Sequential Prediction Network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 11582–11591.
[18] W. Van Gansbeke, B. De Brabandere, D. Neven, M. Proesmans, and L. Van Gool, "End-to-end lane detection through differentiable least-squares fitting," arXiv preprint arXiv:1902.00293, 2019.
[19] K. Behrendt and R. Soussan, "Unsupervised labeled lane marker dataset generation using maps," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2019.
[20] M. Tan and Q. Le, "EfficientNet: Rethinking model scaling for convolutional neural networks," in Proceedings of the 36th International Conference on Machine Learning (ICML), 2019, pp. 6105–6114.
[21] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 248–255.
[22] R. K. Satzoda and M. M. Trivedi, "On Performance Evaluation Metrics for Lane Estimation," in International Conference on Pattern Recognition (ICPR). IEEE, 2014, pp. 2625–2630.
[23] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
