
IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 71, 2022, Art. no. 5001614

An Improved Deep Network-Based Scene Classification Method for Self-Driving Cars

Jianjun Ni, Senior Member, IEEE, Kang Shen, Yinan Chen, Weidong Cao, Member, IEEE, and Simon X. Yang, Senior Member, IEEE
Abstract— A self-driving car is a hot research topic in the field of the intelligent transportation system, which can greatly alleviate traffic jams and improve travel efficiency. Scene classification is one of the key technologies of self-driving cars, which can provide the basis for decision-making in self-driving cars. In recent years, deep learning-based solutions have achieved good results in the problem of scene classification. However, some problems should be further studied in the scene classification methods, such as how to deal with the similarities among different categories and the differences within the same category. To deal with these problems, an improved deep network-based scene classification method is proposed in this article. In the proposed method, an improved faster region with convolutional neural network features (RCNN) network is used to extract the features of representative objects in the scene to obtain local features, where a new residual attention block is added to the Faster RCNN network to highlight local semantics related to driving scenarios. In addition, an improved Inception module is used to extract global features, where a mixed Leaky ReLU and ELU function is presented to reduce the possible redundancy of the convolution kernel and enhance the robustness. Then, the local features and the global features are fused to realize the scene classification. Finally, a private dataset is built from the public datasets for the specialized application of scene classification in the self-driving field, and the proposed method is tested on the proposed dataset. The experimental results show that the accuracy of the proposed method can reach 94.76%, which is higher than the state-of-the-art methods.

Index Terms— Deep network, faster region with convolutional neural network features (RCNN), feature fusion, scene classification, self-driving car.

Manuscript received August 10, 2021; revised January 6, 2022; accepted January 12, 2022. Date of publication January 27, 2022; date of current version February 25, 2022. This work was supported in part by the National Natural Science Foundation of China under Grant 61873086 and Grant 61903123, and in part by the Natural Science Foundation of Jiangsu Province under Grant BK20190165. The Associate Editor coordinating the review process was Lei Zhang. (Corresponding author: Jianjun Ni.)

Jianjun Ni, Kang Shen, Yinan Chen, and Weidong Cao are with the College of Internet of Things Engineering, Hohai University, Changzhou, Jiangsu 213022, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).

Simon X. Yang is with the Advanced Robotics and Intelligent Systems (ARIS) Laboratory, School of Engineering, University of Guelph, Guelph, ON N1G 2W1, Canada (e-mail: [email protected]).

Digital Object Identifier 10.1109/TIM.2022.3146923

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

I. INTRODUCTION

WITH the acceleration of urbanization and the rapid development of the social economy, the number of cars continues to increase, and the transportation situation becomes more and more complex. To solve urban traffic congestion, frequent traffic accidents, and other problems, the intelligent transportation system has emerged [1]–[3], which combines various advanced information technologies with the whole ground traffic management system to achieve efficient, convenient, and safe traffic control [4], [5].

The self-driving car is a vehicle that perceives the environment and runs with little or no manual input [6], [7], and it is an indispensable part of the intelligent transportation system. Because the driverless system relies on the results of environment perception to make driving behavior decisions, environment perception has become a research hot spot in the self-driving car field [8]. The specific tasks of environment perception in the field of self-driving include scene classification, obstacle detection, lane recognition, and so on [9], [10]. Scene classification is one of the most important and challenging tasks in the self-driving car field because the traffic environment is complicated and volatile, and the categories are various [11].

In the scene classification of the self-driving car, the information of the road and its surroundings is obtained by the onboard camera, radar, or other sensors, and then the state of the current position is recognized by the corresponding processing methods [12], [13]. To achieve a higher level of intelligent driving, the self-driving car needs to understand the high-level semantic information of its location to make decisions on driving strategy and path planning. For example, the car should slow down near a school, use the anti-skid mode in rainy and snowy weather, keep driving at high speed on the highway, and so on [14], [15].

At the beginning of the research on scene classification, most of the existing methods were based on low-level visual features. For example, Vailaya et al. [16] used low-level visual features to generate a series of semantic tags to train binary Bayesian classifiers, which is effective for content-based image scene recognition. However, a single low-level visual feature can hardly represent complex scene visual content, and the accuracy of scene classification is low. Latte et al. [17] presented a methodology that fuses color features and texture features to recognize certain crop field images. The fusion of low-level visual features can improve the scene classification accuracy, but it is difficult to recognize images outside the training set accurately. Quelhas et al. [18] applied the bag-of-words model to scene recognition based on local invariant features and probabilistic latent space models, which can improve the generalization ability of the scene recognition method. However, the influence of synonyms and polysemy has not been considered in the bag-of-words model, so this method cannot satisfy the requirement of self-driving cars.
Fig. 1. Some examples of the scenes in self-driving cars difficult to recognize: (a) street and (b) crosswalk that are different scenes with similar objects; (c) and (d) parking lots that are under different situations.

Recently, deep learning methods, especially convolutional neural network (CNN)-based methods, have achieved good results in many fields, including image processing and speech recognition [19]–[22]. More and more attention has been paid to scene classification based on deep learning technology. For example, Chen et al. [23] proposed a road scene recognition method based on a multilabel neural network. This network architecture integrates different classification patterns into the training cost function. Zheng and Naji [24] proposed a deep learning neural network for road scene recognition, which can improve the accuracy of the semantic perception of road scenes in three dimensions, namely, time period, weather, and road type. Tang et al. [25] used the GoogLeNet model for scene recognition, which is divided into three layers from bottom to top. The output features of each layer are fused to generate the final decision of scene recognition. Wang et al. [26] presented a multiresolution CNN network model, including a coarse-resolution CNN and a fine-resolution CNN, which are used to capture the visual structure at a large scale and a relatively small scale, respectively.

The methods introduced above provide a good foundation for the scene classification of self-driving cars. However, two main difficulties need to be further studied in the scene recognition of self-driving cars. The first one is that scenes of the same category can differ greatly. The second one is that there are visual similarities among different categories of scenes. The main reason for these two difficulties is that there are a variety of objects in the scene, which will influence the recognition of the scene. For example, the same scene may have different forms of expression if the objects in the scene have obvious rotation or shadows. Some examples of these scenes difficult to recognize are shown in Fig. 1.

To deal with these problems of the scene classification for self-driving cars, an improved deep learning-based method is proposed. As we know, the scenes of self-driving cars have a strong correlation with the objects in these scenes, where the objects mainly refer to those representative objects of the usual traffic scenes. For example, pedestrians and zebra crossings are often contained in crosswalk scenes, and multiple refueling tanks often exist in gas stations. However, the scene category cannot be decided simply by the representative objects existing in the scene. Based on this idea, in the proposed method, the local features of the representative objects in the scene and the global features of the whole scene image are extracted and fused to realize scene classification accurately.

The main contributions of this article are given as follows.

1) An improved deep network model is proposed for the scene classification of self-driving cars. The network combines local features and global features of the whole scene to improve the accuracy of classification.

2) The model extracts one or more objects with discriminative features in the image by using the pretrained faster region with CNN features (RCNN) network. In the proposed model, the Faster RCNN network is improved by adding a residual connection module based on spatial attention to help the network pay attention to more details and retain more discriminative features.

3) The model uses Inception_V1 to extract the global features of the whole image, where the activation function of the Inception_V1 is replaced by a mixed function with ELU and Leaky ReLU functions to improve the convergence and accuracy of the network. In addition, a special dataset for the scene classification of self-driving cars is set up based on the public datasets of image classification, and various experiments are conducted to test the performance of the proposed method.

This article is organized as follows. Section II describes the proposed method and gives out the structure of the proposed deep network. Experiments for the scene classification of self-driving cars in various situations are conducted in Section III. Section IV discusses the performance of the proposed method and the effectiveness of the presented dataset with some additional comparison experiments. Finally, the conclusion is given in Section V.

II. PROPOSED DEEP NETWORK

In this article, a new deep network is proposed to deal with the problem of the scene classification for self-driving cars, which is shown in Fig. 2. The proposed deep network includes four main parts: the improved Faster RCNN, the improved Inception_V1 module, the feature fusion module, and the classification network. In this study, the improvement of the Faster RCNN is that a residual connection module based on spatial attention is added into the structure of the deep network. The improvement of the Inception_V1 module is that a mixed function with ELU and Leaky ReLU functions is used as the activation function.

Fig. 2. Total structure of the proposed deep network for the scene classification of self-driving cars.

Fig. 3. Representative objects defined for the scene classification of self-driving cars. (a) Zebra crossings and pedestrians in the crosswalk. (b) Gas tanks in the gas station. (c) Parking cars and lines in the parking lot. (d) Houses in the street. (e) Isolation belts in the highway.

As shown in Fig. 2, the first two units are used to extract local features and global features, respectively. The improved Faster RCNN is pretrained, and its output result is the features of the representative objects contained in the image. These representative objects are defined in advance according to common sense and are used as labels for network training. In this study, a total of seven representative objects of the usual traffic scenes are defined, which are shown in Fig. 3. The representative objects are zebra crossing, pedestrian, gas tank, parking car, parking line, house, and isolation belt. The scenes defined in this study are crosswalk, gas station, parking lot, street, and highway. In the proposed method, the predefined representative objects are automatically detected by the improved Faster RCNN. There is no need to artificially decide what objects should be detected during the scene classification process. The output of the improved Inception_V1 network is the global features of the whole image. The feature fusion module fuses local features and global features. These network structures are described in detail as follows.
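For quick reference in the rest of this section, the seven representative objects and the five scene categories defined above can be written down as plain constants; the identifier names below are paraphrased labels for illustration and are not taken from the authors' code.

```python
# Representative objects detected by the improved Faster RCNN (Fig. 3) and the
# five scene categories to be recognized; the label strings are illustrative only.
REPRESENTATIVE_OBJECTS = [
    "zebra_crossing", "pedestrian", "gas_tank",
    "parking_car", "parking_line", "house", "isolation_belt",
]
SCENE_CLASSES = ["crosswalk", "gas_station", "parking_lot", "street", "highway"]
```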
A. Improved Faster RCNN Network for Local Feature Extraction

1) Structure of the Improved Faster RCNN Network: The local feature extraction is based on the Faster RCNN [27]. The main reason for using the Faster RCNN is that the performances of the Faster RCNN series are significantly better than other networks (see [28] for details). In this study, the structure of the improved Faster RCNN is shown in Fig. 4, where the VGG16 Net is used as the underlying framework to get the feature map of the whole image, which consists of 13 convolution layers and four pooling layers, and the activation function is ReLU. The residual attention module combines the top–down attention map with the input bottom–up convolution features to obtain the feature map and output it to the next layer. The details about the residual attention module will be introduced in Section II-A2.

Fig. 4. Structure of the improved Faster RCNN used in this article.

In the proposed deep network, the region proposal network (RPN) [29] is used to generate region proposals, where one branch is used to judge whether the anchor belongs to the foreground or the background, and the other branch is used to form the bounding box coordinates. RPN generates nine anchors for each pixel in the feature map and sorts the anchors from large to small according to the foreground score. Then, the first 12 000 anchors are selected. Finally, only 2000 anchors are reserved based on the non-maximum suppression (NMS) algorithm (see [30] for details), which are input into the next layer.

The region-of-interest (ROI) pooling network [31] maps the input ROIs to the last layer of the VGG16 network to get a proposal feature map with a fixed size (that is, 300 ∗ 7 ∗ 7 ∗ 512 in this study). Finally, this fixed-size feature map is fully connected by the prediction network, and the Softmax is used to classify the specific categories. At the same time, the smooth L1 loss function is used to complete the bounding box regression operation to obtain the accurate positions of the objects (see [32] for details).

In this article, the loss function of the improved Faster RCNN has four parts: the RPN classification loss, the RPN location regression loss, and the classification loss and location regression loss of the prediction network. The loss function of the RPN network L_RPN is defined as follows:

L_{RPN}(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)    (1)

where N_{cls} is the number of anchors in the minibatch; N_{reg} is the number of anchor locations; t_i = \{t_x, t_y, t_w, t_h\} is the predicted bounding box coordinates for the ith anchor; t_i^* = \{t_x^*, t_y^*, t_w^*, t_h^*\} is the real coordinates of the bounding box of the objects; and \lambda is a coefficient to balance the classification loss and the location regression loss, which is an insensitive parameter and set as 1 in this study [33], [34]. p_i represents the probability that the ith anchor is predicted as the target, and p_i^* is the ground truth, namely,

p_i^* = \begin{cases} 0, & \text{negative label} \\ 1, & \text{positive label.} \end{cases}    (2)

The RPN classification loss L_{cls} is a cross-entropy, which is denoted by

L_{cls}(p_i, p_i^*) = -\log\left[ p_i^* p_i + (1 - p_i^*)(1 - p_i) \right].    (3)

The RPN location regression loss L_{reg} is denoted by

L_{reg} = R(t_i - t_i^*)    (4)

where R(·) is the smooth L1 loss function, which is defined as

R(x) = \begin{cases} 0.5x^2, & \text{if } |x| < 1 \\ |x| - 0.5, & \text{otherwise.} \end{cases}    (5)

The loss function of the prediction network is the same as that of the RPN network, where the classification loss function uses the cross-entropy and the regression loss function uses the smooth L1 loss function.
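To make the roles of N_{cls}, N_{reg}, and \lambda in (1)–(5) concrete, a minimal TensorFlow sketch of the RPN loss is given below. The per-anchor tensor shapes and the way anchors are batched are assumptions made for illustration; they are not taken from the authors' implementation.

```python
import tensorflow as tf

def smooth_l1(x):
    """Smooth L1 loss R(x) of (5): 0.5*x^2 if |x| < 1, |x| - 0.5 otherwise."""
    abs_x = tf.abs(x)
    return tf.where(abs_x < 1.0, 0.5 * tf.square(x), abs_x - 0.5)

def rpn_loss(p, p_star, t, t_star, lam=1.0):
    """RPN loss of (1) for one minibatch of anchors.

    p:      predicted foreground probability per anchor, shape (N,)
    p_star: ground-truth label per anchor, 0 or 1, shape (N,)
    t:      predicted box offsets, shape (N, 4)
    t_star: ground-truth box offsets, shape (N, 4)
    """
    eps = 1e-7
    p_star = tf.cast(p_star, tf.float32)
    # Classification term of (3): cross-entropy on the foreground probability.
    l_cls = -tf.math.log(p_star * p + (1.0 - p_star) * (1.0 - p) + eps)
    # Regression term of (4), summed over the four box coordinates.
    l_reg = tf.reduce_sum(smooth_l1(t - t_star), axis=-1)
    n_cls = tf.cast(tf.size(p), tf.float32)       # anchors in the minibatch
    n_reg = tf.cast(tf.shape(t)[0], tf.float32)   # anchor locations
    # Only positive anchors (p_star = 1) contribute to the regression term.
    return (tf.reduce_sum(l_cls) / n_cls
            + lam * tf.reduce_sum(p_star * l_reg) / n_reg)
```

With \lambda = 1, as in this article, the two normalized terms are simply added; the prediction network reuses the same form with its own classification and regression heads.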
2) Residual Attention Module: The existing methods of image scene classification mainly focus on multilayer CNNs, but the large amount of redundant information contained in images is not conducive to scene classification. These CNN-based methods do not clearly distinguish between key information and redundant information; thus, the efficiency and accuracy of image scene classification are affected, and the ability to extract features is limited. The spatial attention mechanism is widely used in visual tasks [35], which can adaptively learn to focus on more prominent regional feature maps in the scene and input these feature maps into the subsequent bottom–up feature extraction process. However, the spatial attention mechanism will lose the previous feature maps, so the bottom–up process is interrupted by the attention model.

Fig. 5. Structure of the proposed residual attention module in this article.

To deal with the problem introduced above, a single-layer spatial attention model with a residual connection is proposed to integrate the attention map and the convolution feature map. In the proposed residual attention module (see Fig. 5), the input feature map is first normalized by batch normalization and operated on by a single 1 ∗ 1 convolution layer. The 1 ∗ 1 convolution layer can be used in general to change the filter space dimensionality (see [36] and [37] for details). The purpose of the 1 ∗ 1 convolution layer used here is to reduce the number of channels and improve the calculation performance. After the convolution layer, based on the spatial attention module, the attention mask is generated, and different weights are given to different regions of the feature map to get a new feature map. Then, the feature value of any point (i, j) in the feature map processed by the spatial attention mechanism is

F_{output}(i, j) = F_{input}(i, j) \otimes a_{ij}    (6)

where F_{output} and F_{input} are the output and input feature values of the point (i, j) through the spatial attention module, respectively; a_{ij} is the attention weight of the point (i, j); and \otimes represents the dot product operation.

To avoid the disappearance of the feature value before the attention module, a residual connection is introduced; then, (6) is modified as

F_{output}(i, j) = F_{input}(i, j) \otimes a_{ij} + F_{input}(i, j).    (7)

Remark 1: Based on the proposed residual attention module, the feature map with spatial attention and convolution is combined with the input feature map. Thus, the bottom–up feature extraction process will not be interrupted, and the top information of the image will also be taken into account.
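A minimal Keras-style sketch of the residual attention block of (6) and (7) is shown below; the reduced channel count and the sigmoid mask head are assumptions for illustration and are not claimed to match the authors' exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

class ResidualSpatialAttention(layers.Layer):
    """Sketch of the residual attention module: batch normalization, a 1*1
    convolution that reduces the channels, a single-channel spatial mask a_ij,
    and the residual combination of (7)."""

    def __init__(self, reduced_channels=64, **kwargs):
        super().__init__(**kwargs)
        self.bn = layers.BatchNormalization()
        self.reduce = layers.Conv2D(reduced_channels, 1, activation="relu")
        self.mask = layers.Conv2D(1, 1, activation="sigmoid")  # a_ij in [0, 1]

    def call(self, x, training=False):
        h = self.bn(x, training=training)
        h = self.reduce(h)
        a = self.mask(h)          # attention weights, broadcast over channels
        return x * a + x          # (7): F_out(i, j) = F_in(i, j) * a_ij + F_in(i, j)
```

For example, applying the layer to a VGG16 feature map of shape (1, H, W, 512) keeps the shape unchanged, so it can sit between the backbone and the RPN without changing the rest of the Faster RCNN.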
3) Local Feature Extraction: The detailed process of the local feature extraction based on the proposed Faster RCNN is introduced as follows. First, based on the pretrained local network, 300 region proposals are generated, and a further NMS step is applied to them (the threshold for the NMS is 0.3 in this study). Then, the target bounding box with confidence greater than 0.5 is detected by the crop layer, which is converted to the position coordinates [y1, x1, y2, x2] in the original feature map (the feature map after the attention module). The parts in the bounding box are extracted from the original feature map and resized uniformly by the following pooling layer to obtain N local feature maps with the same size (it is 7 ∗ 7 ∗ 512 in this article). Finally, the N local feature maps are fused by an elementwise addition operation (see [38] and [39] for details), which is denoted as follows:

Z_{add} = \sum_{i=1}^{N} X_i    (8)

where Z_{add} represents the fused feature tensor of single channels and X_i represents the single-channel feature tensor of the ith target region (the total number of channels is 512 in this study).

Finally, the local fusion feature vector Z is operated on by a flat layer and two fully connected layers, namely,

X_p = \varphi_p(X), \quad X_f = \varphi_f(X_p)    (9)

where \varphi_p represents the flat operation; X_p denotes the output of the flat layer, which is a flat tensor; \varphi_f represents a two-layer fully connected operation; and X_f is the final local feature, with the size of 1 ∗ 5 in this study.

The pseudocode of the local feature extraction process based on the proposed network is shown in Fig. 6.

Fig. 6. Pseudocode of the local feature extraction process based on the proposed network.
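As a companion to the pseudocode in Fig. 6, the sketch below strings together the elementwise addition of (8) and the flatten/fully connected steps of (9); the hidden layer width is an assumption, since only the 7 ∗ 7 ∗ 512 input size and the 1 ∗ 5 output size are quoted in the text.

```python
import tensorflow as tf
from tensorflow.keras import layers

def local_feature_head(roi_features, num_scenes=5, hidden_units=256):
    """Fuse the N detected object regions into one 1 x num_scenes local feature.

    roi_features: tensor of shape (N, 7, 7, 512), the object regions cropped
                  from the attention feature map and resized by ROI pooling.
    """
    z_add = tf.reduce_sum(roi_features, axis=0)                 # (8): elementwise sum over N maps
    x_p = layers.Flatten()(z_add[tf.newaxis, ...])              # flat layer, shape (1, 7*7*512)
    x_h = layers.Dense(hidden_units, activation="relu")(x_p)    # first FC layer (width assumed)
    x_f = layers.Dense(num_scenes)(x_h)                         # second FC layer -> 1 x 5 local feature
    return x_f
```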
B. Improved Inception_V1 Network for Global Feature Extraction

In this article, a global feature extraction network is proposed based on the Inception network [40]. The main reason for using the Inception network is that it can extract more information from input images at different scales and uses a global average pooling layer instead of a fully connected layer, which can greatly reduce the number of parameters while increasing the depth of the network, and it has a good performance in the classification problem. In this study, the Inception_V1 network is improved for the global feature extraction, and its structure is shown in Fig. 7.

Fig. 7. Structure of the proposed global network based on Inception_V1.

As shown in Fig. 7, the Inception_V1 network has nine Inception blocks in total, and each Inception block has four branches. The first branch performs a 1 ∗ 1 convolution on the input, which performs cross-channel feature transformation to improve the expression ability of the network; the second branch first uses a 1 ∗ 1 convolution and then performs a 3 ∗ 3 convolution; the third branch uses a 1 ∗ 1 convolution and then performs a 5 ∗ 5 convolution; and the fourth branch uses a 1 ∗ 1 convolution directly after a 3 ∗ 3 max-pooling. Each Inception block uses an aggregation operation to combine these four branches. Besides the Inception blocks, there are three convolution layers and two max-pooling layers after the input layer. In addition, there are an average pooling layer and a fully connected layer before the output layer in the Inception_V1 network.
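A sketch of one such four-branch block is given below; the filter counts are left as parameters because the per-block values are not repeated in the text, and the activation argument is where the mixed scheme described next can be plugged in.

```python
import tensorflow as tf
from tensorflow.keras import layers

def inception_block(x, f1, f3_in, f3, f5_in, f5, f_pool, activation="relu"):
    """One Inception_V1 block with the four branches described above."""
    b1 = layers.Conv2D(f1, 1, padding="same", activation=activation)(x)       # 1x1 branch
    b2 = layers.Conv2D(f3_in, 1, padding="same", activation=activation)(x)    # 1x1 then 3x3
    b2 = layers.Conv2D(f3, 3, padding="same", activation=activation)(b2)
    b3 = layers.Conv2D(f5_in, 1, padding="same", activation=activation)(x)    # 1x1 then 5x5
    b3 = layers.Conv2D(f5, 5, padding="same", activation=activation)(b3)
    b4 = layers.MaxPooling2D(3, strides=1, padding="same")(x)                 # 3x3 pool then 1x1
    b4 = layers.Conv2D(f_pool, 1, padding="same", activation=activation)(b4)
    return layers.Concatenate(axis=-1)([b1, b2, b3, b4])  # channel-wise aggregation
```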
As we know, the activation function is a key part of the deep network, which is used to realize the nonlinear mapping for feature extraction. In the general Inception block, the ReLU function is used as the activation function, which has some defects, such as information loss and neuron death. To deal with the problem of the loss of feature information and improve the rate of convergence, a mixed activation function method is presented in this article.

In the mixed activation function method, the Leaky ReLU and ELU functions are used alternately as the activation functions of the convolution layers of the Inception network. In this study, the Leaky ReLU is used as the activation function of the initial convolution layers of the network first, and then the ELU function is used alternately. The main reasons for using the Leaky ReLU and ELU functions are that Leaky ReLU uses a small slope instead of zeroing the negative axis, which reduces the loss of information and alleviates the zero-gradient problem, and ELU can relieve gradient disappearance and is more stable to input changes. The average output of ELU is close to zero, so the convergence speed is faster.

The expression of Leaky ReLU is given as follows:

y_i = \begin{cases} x_i, & \text{if } x_i \ge 0 \\ \alpha x_i, & \text{if } x_i < 0 \end{cases}    (10)

where \alpha is a fixed parameter, which is set as 0.2 in this article. The expression of ELU is given as follows:

y_i = \begin{cases} x_i, & \text{if } x_i \ge 0 \\ e^{x_i} - 1, & \text{if } x_i < 0. \end{cases}    (11)

After the convolution and Inception blocks, the average pooling and fully connected operations proceed, which output the final global feature with the size of 1 ∗ 5 in this study.

Remark 2: Based on the proposed mixed activation function method of Leaky ReLU and ELU, the problem of information loss and slow convergence can be solved efficiently.
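The two activation functions of (10) and (11), and one possible way to alternate them along the depth of the network, can be sketched as follows; the strict even/odd alternation shown here is an assumption, since the article only states that Leaky ReLU is applied to the initial convolution layers and ELU is used alternately afterward.

```python
import tensorflow as tf

def leaky_relu(x, alpha=0.2):
    """Leaky ReLU of (10) with the fixed slope alpha = 0.2 used in this article."""
    return tf.where(x >= 0.0, x, alpha * x)

def elu(x):
    """ELU of (11): identity for x >= 0 and exp(x) - 1 for x < 0."""
    return tf.where(x >= 0.0, x, tf.exp(x) - 1.0)

def mixed_activation(layer_index):
    """Assumed alternation pattern: Leaky ReLU on even-indexed convolution
    layers, ELU on odd-indexed ones."""
    return leaky_relu if layer_index % 2 == 0 else elu

# Usage with the inception_block sketch above, e.g. for the k-th block:
# x = inception_block(x, 64, 96, 128, 16, 32, 32, activation=mixed_activation(k))
```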
C. Feature Fusion and Classification Network

The last part of the proposed deep network is the feature fusion and classification network. The fusion method in this article is given as follows.

Suppose that the two input feature vectors are X = [\alpha_1, \alpha_2, \ldots, \alpha_N] and Y = [\beta_1, \beta_2, \ldots, \beta_N]. When the batch size is 1, the fused feature vector Z_{cat} is obtained by appending the two input feature vectors X and Y, namely,

Z_{cat} = [\alpha_1, \alpha_2, \ldots, \alpha_N, \beta_1, \beta_2, \ldots, \beta_N]    (12)

where the size of Z_{cat} is 1 ∗ 2N, namely, 1 ∗ 10 in this study. Then, the fused feature vector is sent into the fully connected layer to train the deep network for scene classification.

After the fully connected layer, the size of the feature vector is changed to 1 ∗ k, where k is the number of scene classes in the experiment (which is 5 in this study). The classification training is carried out through Softmax, and the loss function used in this article is a cross-entropy loss function [41]. This loss function is commonly used in image classification to ensure the maximum probability of the positive prediction, which is given as follows:

Loss_{cls} = \frac{1}{M} \sum_i L_i = -\frac{1}{M} \sum_i \sum_{c=1}^{k} y_{ic} \log(p_{ic})    (13)

where k is the number of categories, namely, the number of scene classes; M is the number of samples; p_{ic} is the probability that the ith observation sample belongs to the cth category; and y_{ic} denotes the indicator variable of the ith observation sample and the cth category, which is defined as follows:

y_{ic} = \begin{cases} 1, & \text{if the } i\text{-th sample is the } c\text{-th category} \\ 0, & \text{otherwise.} \end{cases}    (14)
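A compact sketch of this fusion and classification head, assuming the two 1 ∗ 5 branch outputs computed earlier, is given below; at training time the Softmax output would be scored with the cross-entropy of (13).

```python
import tensorflow as tf
from tensorflow.keras import layers

def fusion_and_classification(local_feat, global_feat, num_scenes=5):
    """Concatenate the 1 x 5 local and global features as in (12), then apply a
    fully connected layer and Softmax over the num_scenes scene categories."""
    z_cat = layers.Concatenate(axis=-1)([local_feat, global_feat])  # shape (1, 10)
    logits = layers.Dense(num_scenes)(z_cat)                        # shape (1, k), k = 5
    return tf.nn.softmax(logits)

def cross_entropy_loss(y_true, p_pred, eps=1e-7):
    """Cross-entropy of (13) for one-hot labels y_true and predicted probabilities p_pred."""
    return -tf.reduce_mean(tf.reduce_sum(y_true * tf.math.log(p_pred + eps), axis=-1))
```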
The whole workflow of the proposed method is summarized as follows.

1) The scene image to be classified is input into the proposed deep network.

2) Generate 300 region proposals for the image by the pretrained improved Faster RCNN. Then, the local features of the image are extracted by performing some subsequent operations on these 300 region proposals, such as NMS, the flat layer, and the two-layer fully connected operation [see (8) and (9)].

3) Meanwhile, the global features of the image are extracted by the improved Inception_V1 network.

4) Then, the local features and global features are fused by (12) to obtain the fused feature vector.

5) Send the fused feature vector to a fully connected layer, and the size of the feature vector is changed to 1 ∗ 5.

6) Conduct scene classification through the Softmax classifier, and finally output the scene category that the image belongs to.
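Putting the pieces together, the six steps above can be read as the following inference sketch, where the three callables stand in for the models of Section II and are placeholders rather than real API names.

```python
import tensorflow as tf

def classify_scene(image, local_branch, global_branch, fusion_head):
    """End-to-end inference following steps 1)-6).

    local_branch:  callable for the improved Faster RCNN branch, returning the
                   1 x 5 local feature of step 2).
    global_branch: callable for the improved Inception_V1 branch, returning the
                   1 x 5 global feature of step 3).
    fusion_head:   callable implementing steps 4)-6), e.g. the
                   fusion_and_classification sketch above.
    """
    x = tf.image.resize(image, (224, 224))[tf.newaxis, ...]   # step 1): input image
    local_feat = local_branch(x)                              # step 2)
    global_feat = global_branch(x)                            # step 3)
    probs = fusion_head(local_feat, global_feat)              # steps 4)-6)
    return tf.argmax(probs, axis=-1), probs                   # predicted scene category
```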
III. EXPERIMENTS

A. Datasets

There are many public datasets widely used for image classification training and testing, such as KITTI and UMC [42], [43]. However, these datasets are not set up especially for the scene classification of self-driving cars. Thus, the accuracy and efficiency will be low if these public datasets are used directly for scene classification in self-driving cars. To deal with this problem, a special dataset is established to train and verify the performance of the deep network in the scene classification of self-driving cars. There are five categories of scene in the proposed dataset, namely, crosswalk, gas station, parking lot, highway, and street. Each category has 15 000 pictures that are selected from the public datasets KITTI [42] and Place365 [26].

The proposed dataset contains various traffic scenes at different locations, in different time periods, and under various light and weather conditions, such as day and night, and cloudy and sunny days. Because the scenes are to be classified by a self-driving car, the images should be obtained by the cameras mounted on the car. Thus, when selecting the images, their availability should be fully considered, including the shooting angle, the shooting distance, and the representative objects in the images. The workload to establish this special dataset is enormous, which is to ensure the performance of the deep network based on this dataset. Some images with different scenes of the self-driving cars are shown in Fig. 8.

Fig. 8. Some images with different scenes in the special dataset built for scene classification of self-driving cars in this article.

Remark 3: In this study, the special dataset is set up by selecting pictures with clear and distinct representative objects from the public datasets because the self-driving car will only be in one scene at the same time in most cases. Other complex situations will be further studied in the future.

B. Experimental Results of the Proposed Method

The experiments are conducted on a computer with the Windows 10 system, and the deep network is realized on the TensorFlow deep learning framework with Python 3.6 [44]. A simple summary of the structure of the proposed deep network is shown in Fig. 9, which is introduced in detail in Section II. The parameters of the proposed deep network and the experimental environment are listed in Table I.

Fig. 9. Summary of the structure for the proposed deep network.

TABLE I. Parameters of the Proposed Deep Network and the Experimental Environment.
The proportion of the training and test sets in the whole dataset is 6:4, and the proportion of the training and validation sets within the training set is 1:1. The size of the input images to the deep network is 224 ∗ 224. The curves of the loss function and the accuracy of the proposed deep network after 20 000 iterations on the training set and the validation set are shown in Figs. 10 and 11, respectively.
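The 6:4 and 1:1 proportions above amount to a 30/30/40 split of the images into training, validation, and test sets; a minimal sketch of such a split is shown below (the shuffling and file handling are illustrative assumptions).

```python
import random

def split_dataset(image_paths, seed=0):
    """Split image paths 6:4 into training+validation and test, then split the
    first part 1:1 into training and validation, as described above."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n_train_val = int(0.6 * len(paths))
    train_val, test = paths[:n_train_val], paths[n_train_val:]
    half = len(train_val) // 2
    return train_val[:half], train_val[half:], test   # train, validation, test
```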
Fig. 10. Curve of the loss function and accuracy of the proposed deep network after 20 000 iterations on the training set.

Fig. 11. Curve of the loss function and accuracy of the proposed deep network after 20 000 iterations on the validation set.

The results in Fig. 11 show that the accuracy on the validation set reaches its highest value of 95.04% when the number of training iterations is 19 000, and the accuracy on the training set reaches 100% (see Fig. 10). The confusion matrix of the proposed deep network on the test set is shown in Fig. 12, which shows the scene classification accuracy for each category of the proposed deep network. The results in Fig. 12 show that the highway has the highest accuracy because the features of the highway are the easiest to extract. The street has the lowest accuracy because streets are the most complex scenes in self-driving cars. However, based on the proposed method, the local features and the global features are fused, so the accuracy of the street can reach 93.66%.

Fig. 12. Confusion matrix of the proposed deep network on the test set.
C. Comparison Experiments

To show the efficiency of the proposed method, some comparison experiments are conducted, where some other state-of-the-art deep learning-based methods are tested on the same dataset used in Section III-B. These state-of-the-art deep learning-based methods include MobileNet (with 14 Conv, 13 DW, one Pool, one FC, and one Softmax layers) [45], ResNet101 (with 101 layers) [46], AlexNet (with eight layers) [47], EfficientNet (with 16 MBConv, two Conv, one Pool, and one FC layers) [48], and Inception_V1 (with 22 layers; see Fig. 7 for details) [49]. The main reason that these deep learning methods are selected for the comparative experiments is that they are classic deep learning models in scene classification with good performance. To test these methods under different situations, the dataset is divided into three parts, namely, sunny day, rainy day, and night. The comparison results are listed in Table II. Some scene classification results based on these methods are shown in Fig. 13.

TABLE II. Experimental Results of Scene Classification Based on Different Deep Networks.

Fig. 13. Some scene classification results based on the six deep learning methods.

The results in Table II show that the total accuracy of the proposed method can reach 94.76%, which is 4.67% higher (relative value) than that of the general Inception_V1, which obtains the second-best result in this scene classification experiment. In the experiment, all the state-of-the-art deep learning-based methods have a relatively high accuracy of scene classification on a sunny day. From the results in Table II, we can also see that the classification accuracies of all these methods decrease obviously on a rainy day and at night because feature extraction is difficult for images taken on a rainy day and at night. However, the proposed deep network has higher accuracy on a sunny day, on a rainy day, and at night because it uses both the local features and the global features to realize the scene classification (see the scene classification results in Fig. 13 for details). The standard deviation of the accuracies in different situations based on the proposed method is the smallest among these methods, which also shows that the proposed deep network has good performance in various situations.

IV. DISCUSSION
The total performance of the proposed method has been proven on the special dataset for self-driving cars by the experiments in Section III. In this section, some additional comparison experiments are conducted to discuss the performance of the key parts of the proposed network, including the local feature network based on Faster RCNN and the global feature network based on Inception_V1. Based on these experiments, not only the reasons why these deep networks are used in the proposed model but also the ablation analyses (including the attention module added to the general Faster RCNN and the mixed activation function with ELU and Leaky ReLU in the Inception_V1) are given out. In addition, the effectiveness of the presented special dataset for the scene classification of self-driving cars and the generalization performance of the proposed model are discussed through comparison experiments conducted on a public dataset and some real-world traffic videos.

A. About the Local Feature Extraction Network

First, the performance of the improved Faster RCNN for local feature extraction is discussed by a comparison experiment with the general Faster RCNN and YOLOv5 (the latest version of the YOLO object detection algorithm [50]). In the proposed deep network, the task of the Faster RCNN is to detect the representative objects of the scene images, including zebra crossings and pedestrians in crosswalks, and fuel tanks of gas stations (see Fig. 3). Thus, before the experiment, each image in the dataset is manually marked. Then, the improved Faster RCNN and the other two networks are used to detect all these representative objects in the input images. The comparison experiment results are listed in Table III, and some experimental results are shown in Fig. 14.

TABLE III. Experimental Results of Local Feature Extraction.

Fig. 14. Some representative object detection results.

The results in Table III and Fig. 14 show that the two Faster RCNN-based methods have better performance in the detection task of representative objects than YOLOv5, which is the main reason why the proposed deep network uses Faster RCNN for the local feature extraction. Compared with the general Faster RCNN, the improved Faster RCNN increases the mAP by 1.31% and reduces the standard deviation by 27.27% (relative values). On a rainy day, the accuracy of the improved Faster RCNN increases by 2.58% relative to the general Faster RCNN, where the effect is more obvious. The experimental results show that the introduction of a residual connection module based on spatial attention into the Faster RCNN can help the network to extract the local features more efficiently.
The results in Fig. 14 also show that the improved Faster RCNN in this article can finish the detection task of representative objects efficiently in various situations. For example, the improved Faster RCNN can detect pedestrians in the distance with a small scale (see the images in the first line of Fig. 14) and more gas tanks at night (see the images in the second line of Fig. 14). In the images in the third line of Fig. 14, the improved Faster RCNN can detect both the parking cars and the parking lines, and the detection score is higher than that of the general Faster RCNN. The results in Table III show that the improved Faster RCNN has higher accuracy in all the situations than the other two methods, and the standard deviation of the improved Faster RCNN is less than those of the general Faster RCNN and YOLOv5, which shows that the stability of the improved Faster RCNN is good.

B. About the Global Feature Extraction Network

Another important part of the proposed deep network is the global feature network based on Inception_V1. To show the performance of the improved Inception_V1 in the scene classification of self-driving cars, some comparison experiments are conducted. In these experiments, various combinations of the global feature extraction networks and the improved Faster RCNN-based local feature extraction network are tested on the special dataset. In these combinations, the total structure is the same as the proposed deep network (see Fig. 2), except that the global networks are different. The global feature extraction networks used in these experiments include Inception_V1 [49], Inception_V3 [51], MobileNet [45], ResNet101 [46], and the improved Inception_V1 presented in Section II-B. The comparison experiment results are listed in Table IV.

TABLE IV. Comparison Experiment Results Based on Different Global Feature Networks.

The results in Table IV show that the combination with the general Inception_V1 achieves the second-best performance of all the methods except for the improved method in this article. This is the main reason why the Inception_V1 is used for global feature extraction in the proposed deep network. The results of these comparison experiments also show that the improved Inception_V1 has the best performance compared with the other methods. Compared with the method using the general Inception_V1 for global feature extraction, our method based on the improved Inception_V1 increases the accuracy of the scene classification by 2.49% (relative value), which means that the improvement in the Inception_V1 module is effective. According to the results in Tables II and IV, we can see that the increase in the accuracy of the method (Improved Faster RCNN + Inception_V1) is 2.13% compared with the general Inception_V1 (relative value), which means that the improvements of the Faster RCNN and the Inception_V1 in the proposed method are both important for the scene classification.

To further show the efficiency of the mixed activation function in the Inception_V1 network proposed in this study, some additional comparison experiments are conducted on the validation set, and the result of our method in Section III-B is used as a reference (see Fig. 11). In these experiments, all the settings and structures are the same as the proposed deep network, except that the activation function used in the Inception_V1 network is different. Here, other common activation functions (including Leaky ReLU, ELU, and ReLU) are compared with the mixed activation function used in the improved Inception_V1 network. The accuracies of these experiments based on the proposed deep network with the Inception_V1 using different activation functions are shown in Fig. 15. The results in Fig. 15 show that the proposed mixed activation function can obtain the highest accuracy at a relatively high speed.

Fig. 15. Curve of the accuracy on the validation set based on the proposed deep network with the Inception_V1 using different activation functions.
C. About the Special Dataset

In this study, a special dataset is set up by selecting pictures with clear and distinct representative objects from some public datasets to further improve the efficiency of the deep learning-based method in self-driving cars. To verify the superiority of the presented dataset, a comparative experiment is carried out with the public dataset BDD100k, which is a driving-related dataset for heterogeneous multitask learning [52]. In addition, to verify the performance of the proposed method in practical application and test the generalization of the proposed deep network model trained on the special dataset, some traffic videos obtained from a vehicle data recorder are also used in this experiment. Some images from these traffic videos are shown in Fig. 16. The experimental results are listed in Table V, and the results of Section III-C are used as a reference (see Table II).

Fig. 16. Some images in the traffic videos obtained from an automobile data recorder.

TABLE V. Comparison Experiment Results of the Scene Classification Based on Different Deep Networks in the Public Dataset BDD100K and Traffic Videos.

The results in Table V show that the accuracies of all the tested deep networks decrease obviously on the public dataset BDD100k. The main reason is that the public dataset contains a large number of scenes unrelated to the classification task of this study. This means that there is a dataset offset between the public dataset and the special dataset used to train the deep network model, which will seriously affect the performance of scene classification based on deep learning methods. There are lots of images in the public dataset that do not consider the shooting angle and distance, especially the existence of representative objects (such as parking lots without parking lines), which will reduce the scene classification accuracy. On the other hand, the result on the public dataset based on the proposed method has the highest accuracy among all the deep networks, which shows that the proposed method has good robustness.

In the experiment on the real-world traffic videos, there are some scenes that are common but difficult to recognize, including dimly lit underground parking lots and complicated internal roads (see Fig. 16). The proposed method can also get the best result, and its accuracy reaches 83.46%, which further shows that the proposed method has a better generalization ability and can satisfy the requirement of the task of scene classification for self-driving cars (see Table V).

Remark 4: The generalization of a trained model to a new test distribution is a very important problem, and there are many related works on this problem [53]–[55]. In this article, the proposed deep learning-based method is based on the features of the discriminating objects in the scene, which are almost the same in different scenes for self-driving cars, such as the pedestrians and zebra crossings for crosswalk scenes. In addition, in the proposed method, the global features of the whole image are fused with the local features of the discriminating objects. Thus, the problem of generalization can be solved to some extent. However, lots of work should be done to further improve the generalization ability of the scene classification model in the future.

V. CONCLUSION

The scene classification for self-driving cars based on the deep network is studied, and an improved integrated deep network is presented in this article. In the proposed deep network, the Inception network and the Faster RCNN network are used to extract global features and local features, respectively, which are two of the main CNN models for visual computing with excellent performance. In the proposed deep network, these two networks are improved to increase accuracy and computing efficiency. To further improve the efficiency of the deep learning-based method in self-driving cars, a special dataset is set up based on some public datasets. In addition, various comparison experiments are conducted, and the results show that the proposed deep network has a better performance than the state-of-the-art deep networks in the scene classification task for self-driving cars. However, there are some limitations of the proposed method, including the division problem of the scene categories and the classification problem for some heterogeneous scenes, such as roadside parking lots and gas stations along the street. These problems should be further studied.

In future work, the special dataset for self-driving should be further studied, including situations with heterogeneous road agents, to make it more suitable for the deep learning-based methods of scene classification in the self-driving car field. On the other hand, how to perform a more nuanced division of the scene categories for self-driving cars is a subject worthy of study. In addition, other deep network models (such as VGG and AlexNet) will be further studied to check their performance for different tasks in the self-driving car field, including lane recognition and obstacle detection.
REFERENCES

[1] M. Huang, X. Yan, Z. Bai, H. Zhang, and Z. Xu, "Key technologies of intelligent transportation based on image recognition and optimization control," Int. J. Pattern Recognit. Artif. Intell., vol. 34, no. 10, Sep. 2020, Art. no. 2054024.
[2] A. Mammeri, T. Zuo, and A. Boukerche, "Extending the detection range of vision-based vehicular instrumentation," IEEE Trans. Instrum. Meas., vol. 65, no. 4, pp. 856–873, Apr. 2016.
[3] M. Karaduman and H. Eren, "Smart driving in smart city," in Proc. 5th Int. Istanbul Smart Grid Cities Congr. Fair (ICSG), İstanbul, Turkey, Apr. 2017, pp. 115–119.
[4] F. Duarte, "Self-driving cars: A city perspective," Sci. Robot., vol. 4, no. 28, Mar. 2019, Art. no. eaav9843, doi: 10.1126/scirobotics.aav9843.
[5] N. A. Greenblatt, "Self-driving cars and the law," IEEE Spectr., vol. 53, no. 2, pp. 46–51, Feb. 2016.
[6] J. Ni, Y. Chen, Y. Chen, J. Zhu, D. Ali, and W. Cao, "A survey on theories and applications for self-driving cars based on deep learning methods," Appl. Sci., vol. 10, no. 8, p. 2749, Apr. 2020.
[7] R. Hussain and S. Zeadally, "Autonomous cars: Research results, issues, and future challenges," IEEE Commun. Surveys Tuts., vol. 21, no. 2, pp. 1275–1313, 2nd Quart., 2019.
[8] L. Jones, "Driverless cars: When and where?" Eng. Technol., vol. 12, no. 2, pp. 36–40, 2017.
[9] Q. Xu, M. Wang, Z. Du, and Y. Zhang, "A positioning algorithm of autonomous car based on map-matching and environmental perception," in Proc. 33rd Chin. Control Conf. (CCC), Nanjing, China, Jul. 2014, pp. 707–712.
[10] Q. Zou, H. Jiang, Q. Dai, Y. Yue, L. Chen, and Q. Wang, "Robust lane detection from continuous driving scenes using deep neural networks," IEEE Trans. Veh. Technol., vol. 69, no. 1, pp. 41–54, Jan. 2020.
[11] Y. Parmar, S. Natarajan, and G. Sobha, "DeepRange: Deep-learning-based object detection and ranging in autonomous driving," IET Intell. Transp. Syst., vol. 13, no. 8, pp. 1256–1264, Aug. 2019.
[12] J.-R. Xue, J.-W. Fang, and P. Zhang, "A survey of scene understanding by event reasoning in autonomous driving," Int. J. Autom. Comput., vol. 15, no. 3, pp. 249–266, 2018.
[13] Y. Yang, F. Chen, F. Wu, D. Zeng, Y.-M. Ji, and X.-Y. Jing, "Multi-view semantic learning network for point cloud based 3D object detection," Neurocomputing, vol. 397, pp. 477–485, Jul. 2020.
[14] H. Xu and G. Srivastava, "Automatic recognition algorithm of traffic signs based on convolution neural network," Multimedia Tools Appl., vol. 79, nos. 17–18, pp. 11551–11565, May 2020.
[15] C. Shen et al., "Multi-receptive field graph convolutional neural networks for pedestrian detection," IET Intell. Transp. Syst., vol. 13, no. 9, pp. 1319–1328, Sep. 2019.
[16] A. Vailaya, M. A. T. Figueiredo, A. K. Jain, and H.-J. Zhang, "Image classification for content-based indexing," IEEE Trans. Image Process., vol. 10, no. 1, pp. 117–130, Jan. 2001.
[17] M. V. Latte, S. Shidnal, B. S. Anami, and V. B. Kuligod, "A combined color and texture features based methodology for recognition of crop field image," Int. J. Signal Process., Image Process. Pattern Recognit., vol. 8, no. 2, pp. 287–302, Feb. 2015.
[18] P. Quelhas, F. Monay, J.-M. Odobez, D. Gatica-Perez, T. Tuytelaars, and L. Van Gool, "Modeling scenes with local descriptors and latent aspects," in Proc. 10th IEEE Int. Conf. Comput. Vis. (ICCV), vol. 1, Beijing, China, Oct. 2005, pp. 883–890.
[19] G. Guo and N. Zhang, "A survey on deep learning based face recognition," Comput. Vis. Image Understand., vol. 189, Dec. 2019, Art. no. 102805.
[20] S. Ren, K. Sun, C. Tan, and F. Dong, "A two-stage deep learning method for robust shape reconstruction with electrical impedance tomography," IEEE Trans. Instrum. Meas., vol. 69, no. 7, pp. 4887–4897, Jul. 2020.
[21] Z. Wang, K. Liu, J. Li, Y. Zhu, and Y. Zhang, "Various frameworks and libraries of machine learning and deep learning: A survey," Arch. Comput. Methods Eng., pp. 1–24, Feb. 2019, doi: 10.1007/s11831-018-09312-w.
[22] E. Mutabazi, J. Ni, G. Tang, and W. Cao, "A review on medical textual question answering systems based on deep learning approaches," Appl. Sci., vol. 11, no. 12, p. 5456, Jun. 2021.
[23] L. Chen, W. Zhan, W. Tian, Y. He, and Q. Zou, "Deep integration: A multi-label architecture for road scene recognition," IEEE Trans. Image Process., vol. 28, no. 10, pp. 4883–4898, Oct. 2019.
[24] K. Zheng and H. A. H. Naji, "Road scene segmentation based on deep learning," IEEE Access, vol. 8, pp. 140964–140971, 2020.
[25] P. Tang, H. Wang, and S. Kwong, "G-MS2F: GoogLeNet based multi-stage feature fusion of deep CNN for scene recognition," Neurocomputing, vol. 225, pp. 188–197, Feb. 2017.
[26] L. Wang, S. Guo, W. Huang, Y. Xiong, and Y. Qiao, "Knowledge guided disambiguation for large-scale scene classification with multi-resolution CNNs," IEEE Trans. Image Process., vol. 26, no. 4, pp. 2055–2068, Apr. 2017.
[27] C. Dai et al., "Video scene segmentation using tensor-train faster-RCNN for multimedia IoT systems," IEEE Internet Things J., vol. 8, no. 12, pp. 9697–9705, Jun. 2021.
[28] Z. Liu, Y. Lyu, L. Wang, and Z. Han, "Detection approach based on an improved faster RCNN for brace sleeve screws in high-speed railways," IEEE Trans. Instrum. Meas., vol. 69, no. 7, pp. 4395–4403, Jul. 2020.
[29] C. Peng, K. Zhao, and B. C. Lovell, "Faster ILOD: Incremental learning for object detectors based on faster RCNN," Pattern Recognit. Lett., vol. 140, pp. 109–115, Dec. 2020.
[30] G. Han, J.-P. Su, and C.-W. Zhang, "A method based on multi-convolution layers joint and generative adversarial networks for vehicle detection," KSII Trans. Internet Inf. Syst., vol. 13, no. 4, pp. 1795–1811, 2019.
[31] Y. Tian et al., "Lane marking detection via deep convolutional neural network," Neurocomputing, vol. 280, pp. 46–55, Mar. 2018.
[32] Z.-H. Feng, J. Kittler, M. Awais, and X.-J. Wu, "Rectified wing loss for efficient and robust facial landmark localisation with convolutional neural networks," Int. J. Comput. Vis., vol. 128, nos. 8–9, pp. 2126–2145, 2020.
[33] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017.
[34] N. Yao, G. Shan, and X. Zhu, "Substation object detection based on enhance RCNN model," in Proc. 6th Asia Conf. Power Electr. Eng. (ACPEE), Chongqing, China, Apr. 2021, pp. 463–469.
[35] M. Guo, D. Xue, P. Li, and H. Xu, "Vehicle pedestrian detection method based on spatial pyramid pooling and attention mechanism," Information, vol. 11, no. 12, pp. 1–15, 2020.
[36] C. Szegedy et al., "Going deeper with convolutions," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Boston, MA, USA, Jun. 2015, pp. 1–9.
[37] F. Chollet, "Xception: Deep learning with depthwise separable convolutions," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Honolulu, HI, USA, Jul. 2017, pp. 1800–1807.
[38] C. Zou and M. Wei, "Cluster-based deep convolutional networks for spectral reconstruction from RGB images," Neurocomputing, vol. 464, pp. 342–351, Nov. 2021.
[39] Y. Ge, Z. Yang, Z. Huang, and F. Ye, "A multi-level feature fusion method based on pooling and similarity for HRRS image retrieval," Remote Sens. Lett., vol. 12, no. 11, pp. 1090–1099, Nov. 2021.
[40] I. Delibasoglu and M. Cetin, "Improved U-Nets with inception blocks for building detection," Proc. SPIE, vol. 14, no. 4, Nov. 2020, Art. no. 044512.
[41] X. Li, L. Yu, D. Chang, Z. Ma, and J. Cao, "Dual cross-entropy loss for small-sample fine-grained vehicle classification," IEEE Trans. Veh. Technol., vol. 68, no. 5, pp. 4204–4212, May 2019.
[42] R. McCall et al., "A taxonomy of autonomous vehicle handover situations," Transp. Res. A, Policy Pract., vol. 124, pp. 507–522, Jun. 2019.
[43] L. Zhang, L. Li, X. Pan, Z. Cao, Q. Chen, and H. Yang, "Multi-level ensemble network for scene recognition," Multimedia Tools Appl., vol. 78, no. 19, pp. 28209–28230, Oct. 2019.
[44] M. Liu and D. Grana, "Accelerating geostatistical seismic inversion using TensorFlow: A heterogeneous distributed deep learning framework," Comput. Geosci., vol. 124, pp. 37–45, Mar. 2019.
[45] W. Wang, Y. Hu, T. Zou, H. Liu, J. Wang, and X. Wang, "A new image classification approach via improved MobileNet models with local receptive field expansion in shallow layers," Comput. Intell. Neurosci., vol. 2020, pp. 1–10, Aug. 2020.
[46] S. Liu, G. Tian, and Y. Xu, "A novel scene classification model combining ResNet based transfer learning and data augmentation with a filter," Neurocomputing, vol. 338, pp. 191–206, Apr. 2019.
[47] K. M. Hosny, M. A. Kassem, and M. M. Fouad, "Classification of skin lesions into seven classes using transfer learning with AlexNet," J. Digit. Imag., vol. 33, no. 5, pp. 1325–1334, 2020.
[48] H. Alhichri, A. S. Alswayed, Y. Bazi, N. Ammour, and N. A. Alajlan, "Classification of remote sensing images using EfficientNet-B3 CNN model with attention," IEEE Access, vol. 9, pp. 14078–14094, 2021.
[49] R. K. Mohapatra, K. Shaswat, and S. Kedia, "Offline handwritten signature verification using CNN inspired by inception V1 architecture," in Proc. 5th Int. Conf. Image Inf. Process. (ICIIP), Solan, India, Nov. 2019, pp. 263–267.
[50] M. Kasper-Eulaers, N. Hahn, S. Berger, T. Sebulonsen, Ø. Myrland, and P. E. Kummervold, "Short communication: Detecting heavy goods vehicles in rest areas in winter conditions using YOLOv5," Algorithms, vol. 14, no. 4, p. 114, Mar. 2021.
[51] C. Wang et al., "Pulmonary image classification based on inception-v3 transfer learning model," IEEE Access, vol. 7, pp. 146533–146541, 2019.
[52] F. Yu et al., "BDD100K: A diverse driving dataset for heterogeneous multitask learning," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Seattle, WA, USA, Jun. 2020, pp. 2633–2642.
[53] Y. Chen, W. Li, C. Sakaridis, D. Dai, and L. Van Gool, "Domain adaptive faster R-CNN for object detection in the wild," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Salt Lake City, UT, USA, Jun. 2018, pp. 3339–3348.
[54] C. Chen, Z. Zheng, X. Ding, Y. Huang, and Q. Dou, "Harmonizing transferability and discriminability for adapting object detectors," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Seattle, WA, USA, Jun. 2020, pp. 8866–8875.
[55] Y. Song, Z. Liu, J. Wang, R. Tang, G. Duan, and J. Tan, "Multiscale adversarial and weighted gradient domain adaptive network for data scarcity surface defect detection," IEEE Trans. Instrum. Meas., vol. 70, pp. 1–10, 2021.

Jianjun Ni (Senior Member, IEEE) received the Ph.D. degree from the School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, China, in 2005.
He was a Visiting Professor with the Advanced Robotics and Intelligent Systems (ARIS) Laboratory, University of Guelph, Guelph, ON, Canada, from November 2009 to October 2010. He is currently a Professor with the College of Internet of Things Engineering, Hohai University, Changzhou, China. He has published over 100 papers in related international conferences and journals. His research interests include control systems, neural networks, robotics, machine intelligence, and multiagent systems.
Dr. Ni also serves as an associate editor and a reviewer for a number of international journals.

Kang Shen received the B.S. degree from Hohai University, Changzhou, China, in 2020, where she is currently pursuing the M.S. degree with the Department of Detection Technology and Automatic Equipment, College of Internet of Things Engineering.
Her research interests include self-driving, robot control, and machine learning.

Yinan Chen received the B.S. degree from Hohai University, Changzhou, China, in 2018, where she is currently pursuing the M.S. degree in communication and information systems with the College of Internet of Things Engineering.
Her research interests include deep learning and image processing.

Weidong Cao (Member, IEEE) received the Ph.D. degree in mechanical engineering from Chongqing University, Chongqing, China, in 2018.
He is currently a Lecturer with the College of Internet of Things Engineering, Hohai University, Changzhou, China. His research interests include swarm intelligence optimization algorithms and data-driven modeling.

Simon X. Yang (Senior Member, IEEE) received the B.Sc. degree in engineering physics from Beijing University, Beijing, China, in 1987, the M.Sc. degree in biophysics from the Chinese Academy of Sciences, Beijing, in 1990, the M.Sc. degree in electrical engineering from the University of Houston, Houston, TX, USA, in 1996, and the Ph.D. degree in electrical and computer engineering from the University of Alberta, Edmonton, AB, Canada, in 1999.
He is currently a Professor and the Head of the Advanced Robotics and Intelligent Systems Laboratory, University of Guelph, Guelph, ON, Canada. His research interests include robotics, intelligent systems, sensors and multisensor fusion, wireless sensor networks, control systems, transportation, and computational neuroscience.
Dr. Yang has been very active in professional activities. He was the General Chair of the 2011 IEEE International Conference on Logistics and Automation and the Program Chair of the 2015 IEEE International Conference on Information and Automation. He also serves as the Editor-in-Chief of the International Journal of Robotics and Automation and an Associate Editor for the IEEE TRANSACTIONS ON CYBERNETICS and several other journals.
