PolyLaneNet: Lane Estimation via Deep Polynomial Regression
Abstract—One of the main factors that contributed to the large advances in autonomous driving is the advent of deep learning. For safer self-driving vehicles, one of the problems that has yet to be solved completely is lane detection. Since methods for this task have to work in real time (+30 FPS), they not only have to be effective (i.e., have high accuracy) but they also have to [...]

[...] dawn, tunnels, etc.) conditions, which might just change while driving.

The traditional approach for the lane estimation (or detection) task consists in the extraction of hand-crafted features [5], [6] followed by a curve-fitting process. Although this approach tends to work well under normal and limited circumstances
...
Fig. 1. Overview of the proposed method. From left to right: the model receives as input an image from a forward-looking camera and outputs information about each lane marking in the image.
This work proposes PolyLaneNet, a convolutional neural network (CNN) for end-to-end lane marking estimation. PolyLaneNet takes as input images from a forward-looking camera mounted in the vehicle and outputs polynomials that represent each lane marking in the image, along with the domains for these polynomials and confidence scores for each lane. This approach is shown to be competitive with existing state-of-the-art methods while being faster and not requiring post-processing to obtain the lane estimates. In addition, we provide a deeper analysis using metrics suggested by the literature. Finally, we publicly released the source code (for both training and inference) and the trained models, allowing the replication of all the results presented in this paper.

II. RELATED WORKS

Lane Detection. Before the rise of deep learning, methods for lane detection were mostly model- or learning-based, i.e., they used to exploit hand-crafted and specialized features. Shape and color were the most commonly used features [10], [11], and lanes were normally represented by both straight and curved lines [12], [13]. These methods, however, were not robust to sudden illumination changes, weather conditions, differences in appearance between cameras, and many other things that can be found in driving scenes. The interested reader is referred to [5] for a more complete survey on earlier lane detection methods.

With the success of deep learning, researchers have also investigated its use to tackle lane detection. Huval et al. [14] were one of the first to use deep learning in lane detection. Their model is based on OverFeat and produces as output a sort of segmentation map that is later post-processed using DBSCAN clustering. They collected a private dataset in San Francisco (USA) that was used to train and evaluate their system. Because of the success of their application, companies also became interested in this problem. Later, Ford released DeepLanes [15], which, unlike most of the literature, detects lanes based on laterally-mounted cameras. Despite the good results, the way they modeled the problem made it less widely applicable, and they also used a private US-based dataset.

More recently, a lane detection challenge was held at CVPR'17, in which the TuSimple [16] dataset was released. The winner of the challenge was SCNN [7], a method proposed for traffic scene understanding that exploits the propagation of spatial information via a specially designed CNN structure. Their model outputs a probability map for the lanes that is post-processed in order to provide the lane estimates. To evaluate their system, they used an evaluation metric based on the IoU between the prediction and the ground truth. After that, in [8], the authors proposed Line-CNN, a model in which the key component is the line proposal unit (LPU) adapted from the region proposal network (RPN) of Faster R-CNN. They also submitted their results to the TuSimple benchmark (after the challenge was finished), with marginally better results compared to SCNN. Their main experiments, though, were carried out on a much larger dataset that was not publicly released. In addition to this private dataset, the source code is proprietary and the authors will not release it. Another approach is FastDraw [17], in which the common post-processing of segmentation-based methods is replaced by "drawing" the lanes according to the likelihood of polylines, which is maximized at training time. In addition to evaluating on the TuSimple and CULane [7] datasets, the authors provide qualitative results on yet another private US-based dataset. Moreover, they did not release their implementation, which hinders further comparisons. Some of the segmentation-based methods focus on improving inference speed, as in [9] (ENet-SAD), which focuses on learning lightweight CNNs by exploiting self-attention distillation. The authors evaluated their method on three well-known datasets. Although the source code was publicly released, some of the results are not reproducible¹. Closer to our work, [18] proposes a differentiable least-squares fitting module to fit a curve to points predicted by a deep neural network. In our work, we bypass the need for this module by directly predicting the polynomial coefficients, which simplifies the method while also making it faster.

¹ According to the author of [9], the difference in performance comes from engineering tricks neither described in the paper nor included in the available code: https://web.archive.org/web/20200503114942/https://github.com/cardwing/Codes-for-Lane-Detection/issues/208
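To make the contrast with these curve-fitting pipelines concrete, the sketch below shows the classical least-squares step that they rely on and that [18] turns into a differentiable module. It is only an illustration under assumed inputs: the point coordinates and the use of NumPy's polyfit are our own choices, not code from any of the cited works; PolyLaneNet avoids this step entirely by regressing the coefficients directly.

```python
import numpy as np

# Hypothetical annotated points (x, y) in pixels for one lane marking,
# denser near the bottom of the image (closer to the camera).
ys = np.array([710.0, 650.0, 590.0, 530.0, 470.0, 410.0, 350.0])
xs = np.array([632.0, 598.0, 567.0, 539.0, 514.0, 493.0, 476.0])

K = 3  # polynomial degree (K in the notation of Section III)

# Classical (non-differentiable) curve-fitting step: least-squares fit of
# x = p(y). Point-based pipelines run a step like this as post-processing;
# [18] embeds a differentiable version of it in the network; PolyLaneNet
# instead outputs the K+1 coefficients directly.
coeffs = np.polyfit(ys, xs, deg=K)  # highest-order coefficient first
p = np.poly1d(coeffs)

# Evaluate the fitted lane marking at arbitrary image rows.
print(p(700.0), p(400.0))
```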
In summary, one of the main problems with existing state-of-the-art methods is reproducibility, since most either do not publish the datasets used or do not publish the source code. In this work, we present results that are competitive with state-of-the-art methods on public datasets and fully reproducible, since we provide the source code and use only publicly available datasets (including one from outside the US).

III. POLYLANENET

Model Definition. PolyLaneNet expects as input images taken from a forward-looking vehicle camera and outputs, for each image, M_max lane marking candidates (represented as polynomials), as well as the vertical position h of the horizon line, which helps to define the upper limit of the lane markings. The architecture of PolyLaneNet consists of a backbone network (for feature extraction) appended with a fully connected layer with M_max + 1 outputs, the outputs 1, ..., M_max being for lane marking prediction and the output M_max + 1 for h.

PolyLaneNet adopts a polynomial representation for the lane markings instead of a set of points. Therefore, for each output j, j = 1, ..., M_max, the model estimates the coefficients P_j = \{a_{k,j}\}_{k=0}^{K} representing the polynomial

    p_j(y) = \sum_{k=0}^{K} a_{k,j} y^k,    (1)

where K is a parameter that defines the order of the polynomial. As illustrated in Figure 1, the polynomials have a restricted domain: the height of the image. Besides the coefficients, the model estimates, for each lane marking j, the vertical offset s_j and the prediction confidence score c_j ∈ [0, 1]. In summary, for each image the PolyLaneNet model outputs the triplets (P_j, s_j, c_j), j = 1, ..., M_max, along with h.

For training, each lane marking j is annotated as a set of N points L^*_j = \{(x^*_{i,j}, y^*_{i,j})\}_{i=1}^{N}. For each lane marking j, the vertical offset s^*_j was set as \min \{y^*_{i,j}\}_{i=1}^{N}; the confidence score is defined as

    c^*_j = \begin{cases} 1, & \text{if } j \le M \\ 0, & \text{otherwise,} \end{cases}    (3)

where M is the number of lane markings annotated in the image. The model is trained using the multi-task loss function defined as (for a single image)

    L(\{P_j\}, h, \{s_j\}, \{c_j\}) = W_p L_p(\{P_j\}, \{L^*_j\})
        + W_s \frac{1}{M} \sum_j L_{reg}(s_j, s^*_j)
        + W_c \frac{1}{M} \sum_j L_{cls}(c_j, c^*_j)
        + W_h L_{reg}(h, h^*),    (4)

where W_p, W_s, W_c, and W_h are constant weights used for balancing. The losses L_{reg} and L_{cls} are the Mean Squared Error (MSE) and Binary Cross-Entropy (BCE) functions, respectively. The L_p loss function measures how well adjusted the polynomial p_j (Equation 1) is to the annotated points. Consider the annotated x-coordinates x^*_j = [x^*_{1,j}, ..., x^*_{N,j}]^T and x_j = [x_{1,j}, ..., x_{N,j}]^T, where

    x_{i,j} = \begin{cases} p_j(y^*_{i,j}), & \text{if } |p_j(y^*_{i,j}) - x^*_{i,j}| > \tau_{loss} \\ x^*_{i,j}, & \text{otherwise,} \end{cases}    (5)

and \tau_{loss} is an empirically defined threshold that tries to reduce the focus of the loss on points that are already well aligned. Such an effect appears because the lane markings comprise several points with different sampling densities (i.e., points closer to the camera are denser than points further away). Finally, L_p is defined as

    L_p(\{P_j\}, \{L^*_j\}) = L_{reg}(x_j, x^*_j).    (6)
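For concreteness, a PyTorch-style sketch of this multi-task loss is given below. It follows Equations (3)-(6), but the tensor shapes, argument names, and the masking of padded points are our own assumptions for illustration; it is not the released implementation.

```python
import torch
import torch.nn.functional as F

def polylanenet_loss(pred_coeffs, pred_offsets, pred_confs, pred_h,
                     gt_xs, gt_ys, gt_valid, gt_offsets, gt_h,
                     K=3, tau_loss=20.0, W_p=1.0, W_s=1.0, W_c=1.0, W_h=1.0):
    """Sketch of the loss in Eq. (4) for a single image.

    pred_coeffs: (M_max, K+1) coefficients a_{0..K,j} of Eq. (1), one row per lane.
    pred_offsets, pred_confs: (M_max,) vertical offsets s_j and confidences c_j
        (confidences assumed to be already in [0, 1], e.g. after a sigmoid).
    pred_h, gt_h: scalar tensors with the predicted and annotated horizon.
    gt_xs, gt_ys: (M_max, N) annotated points, zero-padded; gt_valid: (M_max, N)
        boolean mask, True only for points of the M annotated lane markings.
    """
    annotated = gt_valid.any(dim=1)        # which of the M_max slots have a lane
    c_star = annotated.float()             # c*_j of Eq. (3)

    # Evaluate p_j(y) of Eq. (1) at the annotated rows: sum_k a_{k,j} * y^k.
    powers = torch.stack([gt_ys ** k for k in range(K + 1)], dim=-1)  # (M_max, N, K+1)
    pred_xs = (powers * pred_coeffs.unsqueeze(1)).sum(dim=-1)         # (M_max, N)

    # Eq. (5): points already within tau_loss of the annotation contribute zero.
    close = (pred_xs - gt_xs).abs() <= tau_loss
    target_xs = torch.where(close, pred_xs, gt_xs)

    # Eq. (6): L_p as the MSE over the valid annotated points.
    L_p = F.mse_loss(pred_xs[gt_valid], target_xs[gt_valid])

    # Offset term over the annotated lanes; confidence term over all M_max outputs.
    L_s = F.mse_loss(pred_offsets[annotated], gt_offsets[annotated])
    L_c = F.binary_cross_entropy(pred_confs, c_star)
    L_h = F.mse_loss(pred_h, gt_h)

    return W_p * L_p + W_s * L_s + W_c * L_c + W_h * L_h
```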
Fig. 2. Qualitative results of PolyLaneNet on TuSimple.

TABLE I
STATE-OF-THE-ART RESULTS ON TUSIMPLE. PP = REQUIRES POST-PROCESSING.

Method          Acc       FP       FN       FPS   MACs     PP
Line-CNN [8]    96.87%    0.0442   0.0197   30    --       --
ENet-SAD [9]    96.64%    0.0602   0.0205   75    --       X
SCNN [7]        96.53%    0.0617   0.0180   7     --       X
FastDraw [17]   95.20%    0.0760   0.0450   90    --       X
PolyLaneNet     93.36%    0.0942   0.0933   115   1.748G   --

E. Qualitative Evaluation

For qualitative results, an extensive evaluation was carried out. Using the model trained on TuSimple as pretraining, three models were trained: two on ELAS, one with and one without lane marking type classification, and another on LLAMAS. On ELAS, the model was trained for 385 additional epochs (half of a period of the chosen learning rate scheduler, where the learning rate is at a minimum). On LLAMAS, the model was trained for 75 additional epochs, an approximation to the number of iterations used on ELAS, as the training set of LLAMAS is around five times larger than that of ELAS. The experiment with lane marking type classification is a straightforward extension of PolyLaneNet, in which a category is predicted for each lane, showcasing how trivial it is to extend our model.
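As a rough illustration of such an extension, the sketch below adds per-lane classification logits to a generic PolyLaneNet-style output head. All names, layer sizes, and the number of lane-type classes are hypothetical and only meant to show how small the change is; this is not the authors' released code.

```python
import torch
import torch.nn as nn

class PolyRegressionHead(nn.Module):
    """Illustrative output head: per-lane polynomial coefficients, vertical
    offset, confidence, a shared horizon h and, optionally, lane-type logits."""

    def __init__(self, feat_dim, M_max=5, K=3, num_types=0):
        super().__init__()
        self.M_max, self.K, self.num_types = M_max, K, num_types
        # (K+1) coefficients + offset s_j + confidence c_j (+ type logits) per lane.
        self.per_lane = (K + 1) + 2 + num_types
        self.fc = nn.Linear(feat_dim, M_max * self.per_lane + 1)  # +1 for h

    def forward(self, feats):
        out = self.fc(feats)                                  # (B, M_max*per_lane + 1)
        h = out[:, -1]
        lanes = out[:, :-1].view(-1, self.M_max, self.per_lane)
        coeffs = lanes[..., : self.K + 1]
        offsets = lanes[..., self.K + 1]
        confs = torch.sigmoid(lanes[..., self.K + 2])
        type_logits = lanes[..., self.K + 3:] if self.num_types else None
        return coeffs, offsets, confs, h, type_logits
```

With num_types = 0 the head produces only the outputs of Section III (coefficients, offsets, confidences, and h); with num_types > 0 the extra logits give the per-lane category used in the ELAS experiment, which could be supervised with an additional cross-entropy term.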
V. RESULTS

First, we present the results of the comparison with the state of the art. Then, the results of the ablation study are detailed and discussed. Finally, qualitative results are shown.

State-of-the-art Comparison. The state-of-the-art results on the TuSimple dataset are presented in Table I. As evidenced, PolyLaneNet results are competitive. Since none of the compared methods provide source code that replicates their respective published results, it is very difficult to investigate situations where the other methods succeed and ours fails. In Figure 2, some qualitative results of PolyLaneNet on TuSimple are shown. It is noticeable that PolyLaneNet's predictions on parts of the lane marking closer to the camera (where more details can be seen) are very accurate. Nonetheless, on parts of the lane marking closer to the horizon, the predictions are less accurate. We conjecture that this might be the result of a local minimum caused by the dataset's imbalance. Since most lane markings in the dataset can be represented fairly well with 1st-order polynomials (i.e., lines), the neural network is biased towards predicting lines, hence the poorer performance on lane markings with accentuated curvature.

Polynomial Degree. In terms of the polynomial degree used to represent the lane marking, the small difference in accuracy when using lower-order polynomials shows how unbalanced the dataset is. Using 1st-order polynomials (i.e., lines) decreased the accuracy by only 0.35 p.p. Although the dataset's imbalance certainly has an impact on this, another important factor is the metric used by the benchmark to evaluate a model's performance. The LPD metric [22], however, is able to better capture the difference between the models trained using 1st-order polynomials and the others. This can be further seen in Table III, which shows the maximum performance (i.e., the upper bound) of methods that represent lane markings as polynomials, measured by fitting polynomials on the test data itself. As can be seen, TuSimple's metric does not punish predictions that are accurate only in the parts of the lane marking closer to the car, where in the image it looks almost straight (i.e., it can be represented well by 1st-order polynomials), since the thresholds may hide those mistakes. Meanwhile, the LPD metric clearly distinguishes the upper bounds, showing a clear difference even between the 4th and 5th degrees, for which TuSimple's metrics are almost identical.

TABLE II
ABLATION STUDY RESULTS ON TUSIMPLE VALIDATION SET W.R.T. POLYNOMIAL DEGREE

Modification        Setting   Acc       FP       FN       LPD
Polynomial Degree   1st       88.63%    0.2231   0.1865   2.532
                    2nd       88.89%    0.2223   0.1890   2.316
                    3rd       88.62%    0.2237   0.1844   2.314

Ablation Study. The ablation study results are shown in Table IV. EfficientNet-b1 achieved the highest accuracy, followed by EfficientNet-b0 and ResNet-34. Those results suggest that larger networks, such as ResNet-50, may overfit the data. Although EfficientNet-b1 achieved the highest accuracy, we chose not to use it in other experiments, as the accuracy
gains are not significant nor consistent in our experiments. In addition, it is more computationally expensive (i.e., lower FPS, higher MACs, and longer training times). In regard to the input size, reducing it also means reducing the accuracy, as expected. In some cases, this accuracy loss may not be significant, but the speed gains may be. For example, using an input size of 480×270 decreased the accuracy by only 0.55 p.p., but the model MACs decreased by 1.82 times.

TABLE III
TUSIMPLE PERFORMANCE UPPER BOUND OF POLYNOMIALS

Polynomial Degree   Acc       FP       FN       LPD
1st                 96.22%    0.0393   0.0367   1.512
2nd                 97.25%    0.0191   0.0175   1.116
3rd                 97.84%    0.0016   0.0014   0.732
4th                 98.00%    0.0000   0.0000   0.497
5th                 98.03%    0.0000   0.0000   0.382

TABLE IV
ABLATION STUDY RESULTS ON TUSIMPLE VALIDATION SET W.R.T. BACKBONE AND INPUT SIZE

Modification   Setting           Acc       FP       FN       MACs (G)
Backbone       ResNet-34         88.07%    0.2267   0.1953   17.154
               ResNet-50         83.37%    0.3472   0.3122   19.135
               EfficientNet-b1   89.20%    0.2170   0.1785   2.583
               EfficientNet-b0   88.62%    0.2237   0.1844   1.748
Input Size     320×180           85.45%    0.2924   0.2446   0.396
               480×270           88.39%    0.2398   0.1960   0.961
               640×360           88.62%    0.2237   0.1844   1.748

As to the other ablation studies we carried out, one can see that sharing the top-y (h) is slightly better than not sharing it. Moreover, training from a model pretrained on ImageNet seems to have a significant impact on the final result, as shown by the difference of 4.26 p.p. The same happens with data augmentation, as the model trained with more data has a significantly higher accuracy.

TABLE V
ABLATION STUDY RESULTS ON TUSIMPLE VALIDATION SET

Modification        Setting    Acc       FP       FN
Top-Y Sharing       No         88.43%    0.2126   0.1783
                    Yes        88.62%    0.2237   0.1844
Pretraining         None       84.37%    0.3317   0.2826
                    ImageNet   88.62%    0.2237   0.1844
Data Augmentation   None       78.63%    0.4188   0.4048
                    10×        88.62%    0.2237   0.1844

Qualitative Evaluation. A sample of the qualitative results on ELAS and LLAMAS is shown in Figure 3. For more extensive results, videos are available⁴. The results show that transfer learning works well on PolyLaneNet, since a smaller number of epochs was enough to obtain reasonable results in different datasets. However, in ELAS there are many lane changes, and in those situations the model's accuracy decreased significantly. Since the images of those situations have a very different structure (e.g., the car is not heading towards the road direction), the low amount of samples in this situation may not have been enough for the model to learn it.

⁴ Qualitative results (videos) on ELAS/LLAMAS: https://www.youtube.com/playlist?list=PLm8amuguiXiJ2zKvcapUJI ybyOFi9yz9

Fig. 3. Qualitative results of PolyLaneNet on ELAS (top row) and LLAMAS (bottom row).

VI. CONCLUSION

In this work, a novel method for lane detection based on deep polynomial regression was proposed. The proposed method is simple and efficient while maintaining accuracy competitive with state-of-the-art methods. Although state-of-the-art methods with slightly higher accuracy exist, most do not provide source code to replicate their results; therefore, deeper investigations of the differences between methods are difficult. Our method, besides being computationally efficient, will be publicly available so that future works on lane marking detection have a baseline to start from and to compare against. Furthermore, we have shown problems with the metrics used to evaluate lane marking detection methods. As future work, metrics that can be used across different approaches to lane detection (e.g., segmentation) and that better highlight flaws in lane detection methods can be explored.

ACKNOWLEDGMENT

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001, Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq, Brazil), PIIC UFES, and Fundação de Amparo à Pesquisa do Espírito Santo - Brasil (FAPES), grant 84412844. We thank NVIDIA for providing the GPUs used in this research.

REFERENCES

[1] C. Badue, R. Guidolini, R. V. Carneiro, P. Azevedo, V. B. Cardoso, A. Forechi, L. Jesus, R. Berriel, T. Paixão, F. Mutz et al., "Self-driving Cars: A Survey," arXiv preprint arXiv:1901.04407, 2019.
[2] L. C. Possatti, R. Guidolini, V. B. Cardoso, R. F. Berriel, T. M. Paixão, C. Badue, A. F. De Souza, and T. Oliveira-Santos, "Traffic light recognition using deep learning and prior maps for autonomous cars," in 2019 International Joint Conference on Neural Networks (IJCNN). IEEE, 2019, pp. 1–8.
[3] P. Yang, G. Zhang, L. Wang, L. Xu, Q. Deng, and M.-H. Yang, "A part-aware multi-scale fully convolutional network for pedestrian detection," IEEE Transactions on Intelligent Transportation Systems, 2020.
[4] D. Feng, C. Haase-Schütz, L. Rosenbaum, H. Hertlein, C. Glaeser, F. Timm, W. Wiesbeck, and K. Dietmayer, "Deep multi-modal object detection and semantic segmentation for autonomous driving: Datasets, methods, and challenges," IEEE Transactions on Intelligent Transportation Systems, 2020.
[5] J. C. McCall and M. M. Trivedi, "Video Based Lane Estimation and Tracking for Driver Assistance: Survey, System, and Evaluation," IEEE Transactions on Intelligent Transportation Systems, vol. 7, no. 1, pp. 20–37, 2006.
[6] R. F. Berriel, E. de Aguiar, A. F. De Souza, and T. Oliveira-Santos, "Ego-Lane Analysis System (ELAS): Dataset and Algorithms," Image and Vision Computing, vol. 68, pp. 64–75, 2017.
[7] X. Pan, J. Shi, P. Luo, X. Wang, and X. Tang, "Spatial As Deep: Spatial CNN for Traffic Scene Understanding," in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[8] X. Li, J. Li, X. Hu, and J. Yang, "Line-CNN: End-to-End Traffic Line Detection With Line Proposal Unit," IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 1, pp. 248–258, 2019.
[9] Y. Hou, Z. Ma, C. Liu, and C. C. Loy, "Learning Lightweight Lane Detection CNNs by Self Attention Distillation," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2019, pp. 1013–1021.
[10] K. Kluge and S. Lakshmanan, "A deformable-template approach to lane detection," in Proceedings of the Intelligent Vehicles Symposium. IEEE, 1995, pp. 54–59.
[11] K.-Y. Chiu and S.-F. Lin, "Lane Detection using Color-based Segmentation," in Proceedings of the Intelligent Vehicles Symposium. IEEE, 2005, pp. 706–711.
[12] C. R. Jung and C. R. Kelber, "Lane Following and Lane Departure Using a Linear-Parabolic Model," Image and Vision Computing, vol. 23, no. 13, pp. 1192–1202, 2005.
[13] R. F. Berriel, E. de Aguiar, V. V. de Souza Filho, and T. Oliveira-Santos, "A Particle Filter-based Lane Marker Tracking Approach Using a Cubic Spline Model," in 28th SIBGRAPI Conference on Graphics, Patterns and Images. IEEE, 2015, pp. 149–156.
[14] B. Huval, T. Wang, S. Tandon, J. Kiske, W. Song, J. Pazhayampallil, M. Andriluka, P. Rajpurkar, T. Migimatsu, R. Cheng-Yue, F. Mujica, A. Coates, and A. Y. Ng, "An empirical evaluation of deep learning on highway driving," arXiv preprint arXiv:1504.01716, 2015.
[15] A. Gurghian, T. Koduri, S. V. Bailur, K. J. Carey, and V. N. Murali, "DeepLanes: End-To-End Lane Position Estimation using Deep Neural Networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2016, pp. 38–45.
[16] TuSimple. TuSimple Benchmark. [Online]. Available: https://github.com/TuSimple/tusimple-benchmark
[17] J. Philion, "FastDraw: Addressing the Long Tail of Lane Detection by Adapting a Sequential Prediction Network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 11582–11591.
[18] W. Van Gansbeke, B. De Brabandere, D. Neven, M. Proesmans, and L. Van Gool, "End-to-end lane detection through differentiable least-squares fitting," arXiv preprint arXiv:1902.00293, 2019.
[19] K. Behrendt and R. Soussan, "Unsupervised labeled lane marker dataset generation using maps," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2019.
[20] M. Tan and Q. Le, "EfficientNet: Rethinking model scaling for convolutional neural networks," in Proceedings of the 36th International Conference on Machine Learning (ICML), 2019, pp. 6105–6114.
[21] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 248–255.
[22] R. K. Satzoda and M. M. Trivedi, "On Performance Evaluation Metrics for Lane Estimation," in International Conference on Pattern Recognition (ICPR). IEEE, 2014, pp. 2625–2630.
[23] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.