Field Road Segmentation Method Based on Two Channel Feature Fusion
https://doi.org/10.1007/s11042-024-20071-8
Abstract
Field road segmentation is one of the key technologies for realizing autonomous navigation of intelligent agricultural machinery. For the complex and changeable conditions of agricultural field roads, existing road segmentation models cannot meet the need for fast and accurate detection of drivable areas. In this paper, we propose a field road segmentation model (TFFNet) based on two channel feature fusion. First, an atrous spatial pyramid pooling module is used to extract and fuse multi-scale pooling features. Second, deep separable convolution modules and asymmetric atrous convolution modules (AACM) extract deep features along two paths, and these features are fused to obtain a feature map with rich global scene context information. Finally, a channel attention module is introduced to update the weights of the feature channels at each stage, so that the extracted feature information helps improve the accuracy of field road segmentation. Experiments show that, compared with the existing PSPNet, ENet and DeepLabV3+ models, the proposed model achieves better segmentation performance on our multi-climate field road dataset and strikes a balance between segmentation accuracy and speed that meets the needs of intelligent agricultural machinery for field road detection.
1 Introduction
Agriculture is the basic industry for the development of the national economy. With the continuous advancement of urbanization, the rural young and middle-aged labor force is moving to cities and towns, so that agricultural production faces labor shortages and declining productivity, which restricts the quality and efficiency of agricultural production to a certain extent [1]. Improving the level of automation and intelligence in the agricultural production process and developing intelligent agricultural
needs of intelligent agricultural machinery and equipment to quickly detect the drivable
area of the road.
This paper proposes a field road segmentation framework (TFFNet) based on a channel attention mechanism combined with two channel feature fusion. First, the atrous spatial pyramid pooling module is used to extract and fuse multi-scale pooling features of the input image. Second, multiple deep separable convolution modules and asymmetric atrous convolution modules (AACM) are used in sequence to extract deep feature information along two paths, and the feature information extracted by the two paths is fused. Then a channel attention module is introduced to update the weight of the feature map extracted at each stage according to the importance of its feature channels, so as to retain the channels carrying important feature information. Finally, the extracted feature map with rich feature information is decoded to obtain the segmentation result of the drivable area of the field road image. To verify the performance of the TFFNet model, experiments are conducted on a multi-climate field road dataset that we constructed. The experiments show that, compared with the existing PSPNet, ENet and DeepLabV3+ models, the proposed TFFNet model has better road segmentation performance and can meet the needs of intelligent agricultural machinery for field road detection.
In summary, our main contributions are as follows:
• We propose a field road segmentation model (TFFNet) based on two channel feature fusion. The model uses two paths to extract feature information at different scales from the input image and fuses them to obtain a feature map with rich feature information, thereby effectively improving the performance of field road segmentation.
• We propose an asymmetric atrous convolution module (AACM). This module obtains a larger receptive field without increasing the number of model parameters, capturing richer multi-scale context information and thereby improving the feature representation capability of the model.
• A dataset of field road images under various climatic conditions is constructed. We compare our method with the existing PSPNet, ENet and DeepLabV3+ models on this multi-climate field road dataset. The results show that TFFNet achieves better segmentation performance and can meet the needs of intelligent agricultural machinery for field road detection.
2 Dataset
In order to better evaluate the performance of our proposed field road segmentation model,
we produced a dataset containing field road images of various weather scenes, referred to
as the MFR dataset.
2.1 Image acquisition
In order to obtain field road images conveniently, we used a GoPro HERO 9 action camera, a Feiyu VIMBLE 2 stabilizer and a Devastator crawler mobile platform to build an image acquisition vehicle. During collection, the vehicle drives on the field road at a constant speed of 2 m/s, and the camera captures a road image every 3 s at a resolution of 1920 × 1280. All images in the MFR dataset were acquired by the image acquisition vehicle in two provinces, Guangxi and Henan, from April to November 2021. In addition, to make the dataset reflect field road scenes in actual production more realistically, and to evaluate the performance of our method more objectively, we collected road images under three climatic conditions: rainy, cloudy, and sunny. After filtering out unclear images, we obtained a total of 4286 images: 1025 rainy, 1497 cloudy and 1764 sunny.
2.2 Image preprocessing
In order to obtain a more accurate road segmentation dataset and effectively increase the diversity of data features, we performed a series of preprocessing steps on the collected field road images. First, to reduce memory usage when the network model performs feature extraction, we scaled all collected images to a resolution of 512 × 512. Second, since the collected original images have no semantic labels, we used the open-source pixel-level labeling tool Labelme to annotate the two object categories "road" and "background" in the scaled images and saved the labels as .json files, which were then batch-converted into .png semantic label images, yielding binary images of field roads and backgrounds. Some examples of field road scene images and their semantic labels from the MFR dataset are shown in Fig. 1. Then, to improve the generalization ability of the network model and avoid overfitting during training, we augmented the data through geometric transformations [28], increasing the number of images in the MFR dataset from 4286 to 38,574, including 9225 rainy, 13,473 cloudy and 15,876 sunny images. The geometric transformations flip, rotate and scale the original image and its semantic label image to expand the dataset: flipping is applied horizontally and vertically; rotation is applied from 0 to 180 degrees at 30-degree intervals; and scaling uses coefficients of 0.8, 0.9 and 1.1 [29]. Some field road images processed by these geometric transformations are shown in Fig. 2. Finally, we divided the 38,574 images into training, validation and test sets at a ratio of 7:2:1, giving 27,002 training images, 7,715 validation images and 3,857 test images.
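As a concrete illustration of the geometric augmentation described above, the following is a minimal sketch, assuming Pillow is available and using hypothetical file names; it applies the flips, 30-degree rotation steps and 0.8/0.9/1.1 scaling identically to an image and its label mask.

```python
# Hedged sketch of the geometric augmentation described in Section 2.2.
# Nearest-neighbour resampling is used for the label mask so class indices
# are not blended; file paths are illustrative assumptions.
from PIL import Image

ANGLES = range(0, 181, 30)          # 0..180 degrees in 30-degree steps
SCALES = (0.8, 0.9, 1.1)            # scaling coefficients from the paper

def augment_pair(image: Image.Image, label: Image.Image):
    """Yield (image, label) pairs produced by the geometric transforms."""
    # Horizontal and vertical flips
    yield image.transpose(Image.FLIP_LEFT_RIGHT), label.transpose(Image.FLIP_LEFT_RIGHT)
    yield image.transpose(Image.FLIP_TOP_BOTTOM), label.transpose(Image.FLIP_TOP_BOTTOM)
    # Rotations at 30-degree intervals
    for angle in ANGLES:
        yield (image.rotate(angle, resample=Image.BILINEAR),
               label.rotate(angle, resample=Image.NEAREST))
    # Scaling with the three coefficients
    w, h = image.size
    for s in SCALES:
        size = (int(w * s), int(h * s))
        yield (image.resize(size, Image.BILINEAR),
               label.resize(size, Image.NEAREST))

if __name__ == "__main__":
    img = Image.open("example_road.jpg").convert("RGB")   # hypothetical paths
    lab = Image.open("example_road_mask.png")
    for i, (im, lb) in enumerate(augment_pair(img, lab)):
        im.save(f"aug_{i}.jpg")
        lb.save(f"aug_{i}.png")
```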
3 Methodology
The field road segmentation model (TFFNet) based on two channel feature fusion proposed in this paper adopts an asymmetric encoder-decoder structure. In this section, we introduce the encoding network, the decoding network and the overall network structure of the TFFNet model in detail.
3.1 Encoder structure
The encoding network of our TFFNet model is mainly composed of four parts: the atrous spatial pyramid pooling module (ASPPM), the efficient channel attention mechanism module (ECAM), the deep separable convolution module (DSCM) and the asymmetric atrous convolution module (AACM), as shown in Fig. 3.
First, the number of channels of the input image is adjusted by a 1 × 1 convolution layer and downsampled by a max pooling layer to reduce the dimension of the feature map while retaining effective feature information. The atrous spatial pyramid pooling module (ASPPM) is then used to extract and fuse feature information at multiple scales to obtain more effective global scene context information. As shown in Fig. 4a, the ASPPM consists of a 1 × 1 convolutional layer, three atrous convolutional layers with dilation rates of 6, 12 and 18, and an adaptive global pooling branch. The atrous convolution layers with different dilation rates extract feature information at different scales.
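The following is a minimal PyTorch sketch of an ASPPM of this form (a 1 × 1 convolution, three 3 × 3 atrous convolutions with dilation rates 6/12/18 and an adaptive global pooling branch, concatenated and fused by a 1 × 1 convolution); the channel widths are illustrative assumptions rather than the paper's exact values.

```python
# Hedged sketch of an atrous spatial pyramid pooling module (ASPPM).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPPM(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, rates=(6, 12, 18)):
        super().__init__()
        def branch(k, d):
            pad = 0 if k == 1 else d
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, k, padding=pad, dilation=d, bias=False),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        # 1x1 branch plus three atrous 3x3 branches
        self.branches = nn.ModuleList([branch(1, 1)] + [branch(3, r) for r in rates])
        self.pool = nn.Sequential(                    # adaptive global pooling branch
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False), nn.ReLU(inplace=True))
        self.project = nn.Sequential(                 # fuse the five branches
            nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        feats = [b(x) for b in self.branches]
        g = F.interpolate(self.pool(x), size=x.shape[2:], mode="bilinear",
                          align_corners=False)
        return self.project(torch.cat(feats + [g], dim=1))

# Example: ASPPM(64, 64)(torch.randn(1, 64, 128, 128)).shape  # -> (1, 64, 128, 128)
```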
Then, the efficient channel attention mechanism module (ECAM) is used to update the weight of each feature channel of the extracted multi-scale feature map according to its importance, so as to obtain more effective feature information. As shown in Fig. 4b, the ECAM first compresses the input feature map into a 1 × 1 × C vector through a global pooling layer, then applies a one-dimensional convolution layer with kernel size K to estimate the importance of each feature channel and generate a new weight vector, and finally multiplies the original input features by the generated weights to obtain more effective feature information. The kernel size K of the one-dimensional convolution layer is determined adaptively from the number of channels, as shown in formula (1).
K = \left| \frac{\log_2 C + b}{\sigma} \right|_{odd} \quad (1)

where |\cdot|_{odd} denotes taking the nearest odd number and C is the number of channels. Following [30], \sigma and b are set to 2 and 1, respectively.
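A minimal PyTorch sketch of such an ECAM, following ECA-Net [30], is given below; the module applies global average pooling, a one-dimensional convolution whose kernel size is derived from formula (1) with σ = 2 and b = 1, and a sigmoid gate that reweights the input channels. The class name and interface are illustrative.

```python
# Hedged sketch of the efficient channel attention module (ECAM).
import math
import torch
import torch.nn as nn

class ECAM(nn.Module):
    def __init__(self, channels: int, sigma: int = 2, b: int = 1):
        super().__init__()
        k = int(abs((math.log2(channels) + b) / sigma))   # formula (1)
        k = k if k % 2 == 1 else k + 1                    # |.|_odd: nearest odd number
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                                 # x: (N, C, H, W)
        y = x.mean(dim=(2, 3))                            # global average pooling -> (N, C)
        y = self.conv(y.unsqueeze(1)).squeeze(1)          # 1-D conv across channels
        w = self.sigmoid(y).unsqueeze(-1).unsqueeze(-1)   # per-channel weights
        return x * w                                      # reweighted feature map

# Example: ECAM(256)(torch.randn(2, 256, 32, 32)).shape  # -> (2, 256, 32, 32)
```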
Finally, the feature map output by the ECAM is down-sampled by a max pooling layer to further reduce its dimension. Five cascaded deep separable convolution modules (DSCM) and five cascaded asymmetric atrous convolution modules (AACM) then extract and fuse deep feature information of different scales along two paths, producing a feature map with rich feature information and completing the feature encoding of the original input image. As shown in Fig. 4c, assuming the input feature map has C channels, the deep separable convolution module (DSCM) applies a 3 × 3 convolution to the feature map of each channel to obtain C feature maps, which are then fused by a 1 × 1 convolutional layer. As shown in Fig. 4d, the asymmetric atrous convolution module (AACM) consists of 1 × 5 convolutional layers, 5 × 1 convolutional layers, 1 × 1 convolutional layers and 3 × 3 atrous convolutions with a dilation rate of 2. Features are extracted from the input through the 1 × 5 convolution layer, the 5 × 1 convolution layer and the 3 × 3 atrous convolution layer (dilation rate 2), and the three resulting feature maps are fused according to formulas (2)–(4). Because the convolution kernels have different sizes, asymmetric convolution reduces the information redundancy of ordinary convolution, and nonlinear activation functions are introduced between the asymmetric convolutions, which strengthens the complementarity of the extracted features and improves the model's ability to fit asymmetric features.

F'^{(j)} = \frac{\gamma_j}{\sigma_j} F^{(j)} + \frac{\bar{\gamma}_j}{\bar{\sigma}_j} \bar{F}^{(j)} + \frac{\hat{\gamma}_j}{\hat{\sigma}_j} \hat{F}^{(j)} \quad (2)

b_j = -\frac{\mu_j \gamma_j}{\sigma_j} - \frac{\bar{\mu}_j \bar{\gamma}_j}{\bar{\sigma}_j} - \frac{\hat{\mu}_j \hat{\gamma}_j}{\hat{\sigma}_j} + \beta_j + \bar{\beta}_j + \hat{\beta}_j \quad (3)

O_{:,:,j} + \bar{O}_{:,:,j} + \hat{O}_{:,:,j} = \sum_{k=1}^{C} M_{:,:,k} \ast F'^{(j)}_{:,:,k} + b_j \quad (4)

where j denotes the j-th convolution kernel; \mu_j and \sigma_j denote the batch-normalized channel mean and standard deviation; \gamma_j and \beta_j denote the scaling factor and offset; F'^{(j)} denotes the fused convolution kernel and b_j the fused bias; F^{(j)}, \bar{F}^{(j)} and \hat{F}^{(j)} denote the kernels of the 3 × 3 atrous convolution layer, the 1 × 5 convolution layer and the 5 × 1 convolution layer, respectively; O_{:,:,j}, \bar{O}_{:,:,j} and \hat{O}_{:,:,j} denote the corresponding outputs of these three layers; and M_{:,:,k} denotes the k-th channel of the input feature map.
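The sketch below illustrates, under stated assumptions, how the two deep-feature branches could look in PyTorch: DSCM as a depthwise 3 × 3 convolution followed by a 1 × 1 pointwise fusion, and AACM as the sum of a 1 × 5, a 5 × 1 and a 3 × 3 atrous (dilation 2) branch, which the additive fusion of formulas (2)–(4) folds into a single kernel and bias at inference. Channel widths and the concatenation used to fuse the two paths are assumptions.

```python
# Hedged sketch of the DSCM and AACM building blocks.
import torch
import torch.nn as nn

class DSCM(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.depthwise = nn.Conv2d(ch, ch, 3, padding=1, groups=ch, bias=False)
        self.pointwise = nn.Conv2d(ch, ch, 1, bias=False)   # fuses the per-channel maps
        self.bn = nn.BatchNorm2d(ch)
        self.act = nn.PReLU(ch)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

class AACM(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        def conv_bn(k, pad, dil=1):
            return nn.Sequential(
                nn.Conv2d(ch, ch, k, padding=pad, dilation=dil, bias=False),
                nn.BatchNorm2d(ch))
        self.branch_1x5 = conv_bn((1, 5), (0, 2))
        self.branch_5x1 = conv_bn((5, 1), (2, 0))
        self.branch_atrous = conv_bn(3, 2, dil=2)            # 3x3, dilation rate 2
        self.act = nn.PReLU(ch)

    def forward(self, x):
        # Summing the three batch-normalised branches corresponds to the
        # additive kernel/bias fusion of formulas (2)-(4).
        return self.act(self.branch_1x5(x) + self.branch_5x1(x) + self.branch_atrous(x))

# Two-path extraction and fusion (illustrative):
# x = torch.randn(1, 128, 64, 64)
# f1 = nn.Sequential(*[DSCM(128) for _ in range(5)])(x)
# f2 = nn.Sequential(*[AACM(128) for _ in range(5)])(x)
# fused = torch.cat([f1, f2], dim=1)   # (1, 256, 64, 64)
```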
3.2 Decoder structure
The decoding network of our proposed TFFNet model is mainly composed of two upsampling bottleneck modules (UBM) and four common bottleneck modules (CBM) arranged alternately, as shown in Fig. 3. The feature map extracted by the encoder is upsampled twice by the upsampling bottleneck modules (UBM) to restore it to the size of the original image, and the four common bottleneck modules (CBM) decode the feature map to obtain the field road segmentation result. The common bottleneck module (CBM) consists of two 1 × 1 convolutional layers and one 3 × 3 convolutional layer, interspersed with batch normalization (BN) and PReLU layers, as shown in Fig. 5a. The upsampling bottleneck module (UBM) has a similar structure and is mainly composed of three 1 × 1 convolutional layers and one 2 × 2 deconvolutional layer, again interspersed with batch normalization (BN) and PReLU layers, as shown in Fig. 5b.
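A simplified PyTorch sketch of the two decoder blocks is shown below; the exact channel ratios are assumptions, and the residual skip path of ENet-style bottlenecks (which accounts for the third 1 × 1 convolution of the UBM) is omitted for brevity.

```python
# Hedged sketch of the common bottleneck module (CBM) and the upsampling
# bottleneck module (UBM) used in the decoder.
import torch
import torch.nn as nn

def _cbr(in_ch, out_ch, k):
    """Convolution + BatchNorm + PReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(out_ch), nn.PReLU(out_ch))

class CBM(nn.Module):
    def __init__(self, ch: int, mid: int):
        super().__init__()
        self.body = nn.Sequential(_cbr(ch, mid, 1), _cbr(mid, mid, 3), _cbr(mid, ch, 1))

    def forward(self, x):
        return self.body(x)

class UBM(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, mid: int):
        super().__init__()
        self.reduce = _cbr(in_ch, mid, 1)
        self.deconv = nn.Sequential(                      # 2x upsampling stage
            nn.ConvTranspose2d(mid, mid, 2, stride=2, bias=False),
            nn.BatchNorm2d(mid), nn.PReLU(mid))
        self.expand = _cbr(mid, out_ch, 1)

    def forward(self, x):
        return self.expand(self.deconv(self.reduce(x)))

# Example: UBM(256, 64, 64)(torch.randn(1, 256, 64, 64)).shape  # -> (1, 64, 128, 128)
```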
3.3 TFFNet structure
Our proposed TFFNet model adopts an asymmetric encoder-decoder structure, which con-
sists of a large encoding network and a small decoding network, as shown in Fig. 3.
Firstly, the TFFNet model down-samples the field road image to compress the feature
dimension, and extracts the feature information of multiple scales through the atrous spa-
tial pyramid pooling module (ASPPM) to obtain more effective global scene context fea-
ture information.
Secondly, the efficient channel attention mechanism module (ECAM) calibrates the
weight of each feature channel of the extracted multi-scale feature map to screen the fea-
ture layers that are beneficial to improve the performance of model segmentation.
Then, the filtered feature map is down-sampled a second time to further reduce its dimension and the amount of computation. The deep feature information at two different scales is then extracted along the two paths by the five cascaded deep separable convolution modules (DSCM) and asymmetric atrous convolution modules (AACM), and the features from the two scales are fused to obtain a feature map with rich feature information.
Finally, the extracted feature map is decoded through the two upsampling bottleneck modules (UBM) and four common bottleneck modules (CBM) to segment the drivable area of field roads.
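To make the overall data flow concrete, the following is a highly simplified, hedged skeleton of the pipeline just described; every named stage is replaced by a lightweight stand-in convolution block, so it illustrates only the wiring (stem and first downsampling, ASPPM, ECAM, second downsampling, two five-module paths, fusion, and the UBM/CBM decoder), not the real module internals or channel widths.

```python
# Hedged structural skeleton of the TFFNet data flow (stand-in modules only).
import torch
import torch.nn as nn

def stage(in_ch, out_ch):            # stand-in for ASPPM/ECAM/DSCM/AACM/CBM
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False),
                         nn.BatchNorm2d(out_ch), nn.PReLU(out_ch))

class TFFNetSkeleton(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(3, 32, 1), nn.MaxPool2d(2))      # 1x1 conv + downsampling
        self.asppm = stage(32, 64)                                           # multi-scale pooling features
        self.ecam1 = stage(64, 64)                                           # channel reweighting
        self.down2 = nn.MaxPool2d(2)                                         # second downsampling
        self.path_dscm = nn.Sequential(*[stage(64, 64) for _ in range(5)])   # DSCM path
        self.path_aacm = nn.Sequential(*[stage(64, 64) for _ in range(5)])   # AACM path
        self.fuse = stage(128, 128)                                          # two-path fusion
        self.decoder = nn.Sequential(                                        # UBM/CBM-style decoder
            nn.ConvTranspose2d(128, 64, 2, stride=2), stage(64, 64),
            nn.ConvTranspose2d(64, 32, 2, stride=2), stage(32, 32),
            nn.Conv2d(32, num_classes, 1))

    def forward(self, x):
        x = self.down2(self.ecam1(self.asppm(self.stem(x))))
        fused = self.fuse(torch.cat([self.path_dscm(x), self.path_aacm(x)], dim=1))
        return self.decoder(fused)

# Example: TFFNetSkeleton()(torch.randn(1, 3, 512, 512)).shape  # -> (1, 2, 512, 512)
```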
4 Experiments
In this section, to demonstrate the effectiveness of the proposed TFFNet model, we conduct road segmentation experiments on the constructed MFR dataset. We first introduce the experimental details and evaluation metrics, and then compare and analyze the results of the TFFNet model and the existing PSPNet, ENet and DeepLabV3+ models on the MFR dataset.
4.1 Experimental details
To verify that the proposed TFFNet model segments the drivable regions of field roads well, we conduct experiments on the MFR dataset. Our experimental platform is a desktop computer with an Intel Core i5-9400 CPU, an NVIDIA Tesla K80 GPU and 8 GB of memory, running PyTorch 1.8.1, CUDA 11.1 and cuDNN v11.1.74.
To ensure the reliability of the experiments, the performance indicators of all network models compared with our TFFNet model are obtained by training and testing under the same experimental parameter settings. When training the TFFNet model on the MFR dataset, we use stochastic gradient descent (SGD) for end-to-end training with a batch size of 16, a momentum factor of 0.9, a weight decay of 0.0001 and an initial learning rate of 0.0001; the learning rate schedule follows the poly policy, the number of epochs is 300 and the total number of iterations is 506,400. To help training reach a stable learning state as soon as possible, the learning rate is warmed up at the beginning of training: it is increased linearly from 0 to 0.025 over the first 1000 batches and then decays exponentially as the number of iterations increases. In addition, during training the model is saved every 2 epochs to avoid losses caused by power failure or abnormal exits during long training runs.
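The snippet below sketches one way to realise this schedule in PyTorch: SGD with momentum 0.9 and weight decay 0.0001, a linear warm-up to 0.025 over the first 1000 batches, and a poly-style decay over the remaining iterations. The decay power of 0.9 is an assumption, since the text specifies the poly policy without giving the exponent.

```python
# Hedged sketch of the optimizer and learning-rate schedule described above.
import torch

def build_optimizer_and_scheduler(model, peak_lr=0.025, warmup_iters=1000,
                                  total_iters=506400, power=0.9):
    optimizer = torch.optim.SGD(model.parameters(), lr=peak_lr,
                                momentum=0.9, weight_decay=1e-4)

    def lr_lambda(it):
        if it < warmup_iters:                         # linear warm-up from 0 to peak_lr
            return it / warmup_iters
        # poly decay over the remaining iterations (power 0.9 assumed)
        progress = (it - warmup_iters) / max(1, total_iters - warmup_iters)
        return (1.0 - min(progress, 1.0)) ** power

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler

# Per-batch usage inside the training loop (illustrative):
# loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```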
4.2 Evaluation metrics
To evaluate the performance of the TFFNet model for field road segmentation more intuitively, we use the following indicators: pixel accuracy (PA), mean pixel accuracy (mPA), mean intersection over union (mIoU), the number of model parameters and the road segmentation time. The pixel accuracy (PA) is the ratio of the number of correctly predicted pixels to the total number of image pixels. The mean pixel accuracy (mPA) is, for each category, the ratio of the number of correctly predicted pixels to the total number of pixels of that category, averaged over all categories. The mean intersection over union (mIoU) is the ratio of the intersection to the union of the predicted and ground-truth pixel sets of each class, averaged over all classes. Assuming the dataset has K categories, let p_{ij} denote the number of pixels whose true class is i but which are predicted as class j, so that p_{ii} is the number of correctly predicted pixels of class i. The pixel accuracy (PA), mean pixel accuracy (mPA) and mean intersection over union (mIoU) are then calculated as shown in formulas (5)–(7):

PA = \frac{\sum_{i=0}^{K-1} p_{ii}}{\sum_{i=0}^{K-1} \sum_{j=0}^{K-1} p_{ij}} \quad (5)

mPA = \frac{1}{K} \sum_{i=0}^{K-1} \frac{p_{ii}}{\sum_{j=0}^{K-1} p_{ij}} \quad (6)

mIoU = \frac{1}{K} \sum_{i=0}^{K-1} \frac{p_{ii}}{\sum_{j=0}^{K-1} p_{ij} + \sum_{j=0}^{K-1} p_{ji} - p_{ii}} \quad (7)
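For reference, formulas (5)–(7) can be computed from a K × K confusion matrix as in the following NumPy sketch; the function and variable names are illustrative.

```python
# Hedged sketch of PA, mPA and mIoU computed from a KxK confusion matrix,
# whose entry p[i, j] counts pixels of true class i predicted as class j.
import numpy as np

def confusion_matrix(pred: np.ndarray, target: np.ndarray, num_classes: int) -> np.ndarray:
    mask = (target >= 0) & (target < num_classes)
    idx = num_classes * target[mask].astype(int) + pred[mask].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def pa_mpa_miou(p: np.ndarray):
    diag = np.diag(p).astype(float)
    pa = diag.sum() / p.sum()                               # formula (5)
    mpa = np.mean(diag / np.maximum(p.sum(axis=1), 1))      # formula (6)
    union = p.sum(axis=1) + p.sum(axis=0) - diag            # formula (7)
    miou = np.mean(diag / np.maximum(union, 1))
    return pa, mpa, miou

# Example with K = 2 (road / background), hypothetical predictions:
# pred = np.array([[0, 1], [1, 1]]); target = np.array([[0, 1], [0, 1]])
# print(pa_mpa_miou(confusion_matrix(pred, target, 2)))
```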
We test the existing PSPNet, ENet and DeepLabV3+ models and the proposed TFFNet model on the MFR dataset; the results for each performance indicator are shown in Table 2. According to Table 2, for pixel accuracy (PA) our TFFNet model reaches 94.3%, which is 2.2%, 8.1% and 1.6% higher than the PSPNet, ENet and DeepLabV3+ models, respectively; for mean pixel accuracy (mPA) it reaches 93.7%, which is 4.6%, 8.9% and 3.4% higher, respectively; and for mean intersection over union (mIoU) it reaches 88.4%, which is 6.2%, 11.5% and 2.6% higher, respectively. This shows that the proposed TFFNet model can extract and fuse feature information at different scales through its two feature paths to obtain features with rich context information. At the same time, the efficient channel attention mechanism module (ECAM) screens the features that are effective for improving segmentation performance, so that the TFFNet model adapts well to field road scenes and segments the drivable area of field roads more accurately.
In order to more intuitively show the field road segmentation performance of the proposed TFFNet model, we randomly select three images from the MFR dataset, use DeepLabV3+, PSPNet, ENet and the proposed TFFNet model to segment them, and visualize the segmentation results, as shown in Fig. 7 (schematic diagram of the field road segmentation results of each model). It can be seen from Fig. 7c that
our proposed TFFNet model segments the drivable area of the field road with high accuracy under the three climatic conditions of sunny, cloudy and rainy weather. By fusing multi-scale contextual feature information and using the channel attention mechanism to screen effective features, the TFFNet model effectively avoids the influence of factors such as standing water, light and weeds during road segmentation. From Fig. 7d-e, it can be seen that when segmenting the drivable area of field roads, the DeepLabV3+ and PSPNet models are prone to mis-detecting "pedestrians" and "rice fields" as "roads" and to missing small targets; in addition, they are easily disturbed by environmental factors such as light, standing water and weeds. It can be seen from Fig. 7f that the ENet model also misidentifies "mountains", "rice fields" and "pedestrians" as "roads", and its detections are incomplete. This is mainly because the ENet model does not achieve a balance between speed and accuracy, resulting in poor adaptability to environmental factors such as standing water and weeds, which makes it less effective for segmenting drivable areas on field roads. In contrast, the proposed TFFNet model can still accurately identify the drivable areas of field roads in sunny, cloudy and rainy weather and shows good resistance to interference.
To test the speed of the proposed TFFNet model in detecting drivable areas on field roads, we randomly select 10 road images from the test set of the MFR dataset, use DeepLabV3+, PSPNet, ENet and the proposed TFFNet model to segment the drivable area in each of the 10 images, and record the time each model takes to segment one image; the average of the 10 recorded times is then used to represent the time the model needs to detect the drivable area of a single field road image, as shown in Table 3. From Table 3, the detection times of the DeepLabV3+, PSPNet and ENet models for a single road image are 205 ms, 175 ms and 126 ms, respectively, while our proposed TFFNet model needs only 121 ms, which is 41%, 30.8% and 3.9% shorter than DeepLabV3+, PSPNet and ENet, respectively. This shows that, compared with these three existing models, the proposed TFFNet model can detect drivable areas in field road images faster, achieves a balance between segmentation accuracy and speed, and meets the need for real-time detection of the drivable area while intelligent agricultural machinery drives on field roads.
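The per-image timing protocol described above can be reproduced with a simple measurement loop such as the hedged sketch below; `model` and `images` are assumed to be supplied by the caller, and GPU synchronisation is included so the timing reflects actual computation.

```python
# Hedged sketch of averaging per-image inference time over a set of test images.
import time
import torch

@torch.no_grad()
def mean_inference_time_ms(model, images, device="cuda"):
    model = model.to(device).eval()
    times = []
    for img in images:                       # each img: (3, H, W) tensor
        x = img.unsqueeze(0).to(device)
        if device.startswith("cuda"):
            torch.cuda.synchronize()
        start = time.perf_counter()
        model(x)
        if device.startswith("cuda"):
            torch.cuda.synchronize()
        times.append((time.perf_counter() - start) * 1000.0)
    return sum(times) / len(times)

# Example (hypothetical): mean_inference_time_ms(tffnet, test_images[:10])
```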
5 Conclusions
This paper proposes a field road segmentation model (TFFNet) based on two channel feature fusion. First, the atrous spatial pyramid pooling module is used to extract and fuse multi-scale pooling features of the input image; second, multiple deep separable convolution modules and asymmetric atrous convolution modules (AACM) are used to extract and fuse deep feature information along two paths; then a channel attention module is introduced to retain the features carrying important feature information; finally, the extracted feature map with rich feature information is decoded to obtain the segmentation result of the drivable area of the field road image. Experiments show that, compared with the existing PSPNet, ENet and DeepLabV3+ models, the performance of the proposed TFFNet model is improved in all respects. The TFFNet model has only 0.21 M parameters; its pixel accuracy (PA) and mean intersection over union (mIoU) reach 94.3% and 88.4%, respectively; and its detection time for a single road image is only 121 ms, which is 41%, 30.8% and 3.9% shorter than the DeepLabV3+, PSPNet and ENet models, respectively. In summary, the proposed road segmentation model (TFFNet) performs well on field road images under various weather conditions and can meet the needs of intelligent agricultural machinery for real-time field road detection.
Funding The authors did not receive support from any organization for the submitted work.
Data availability Data sharing not applicable to this article as no datasets were generated or analysed during
the current study.
Declarations
Ethics approval Not applicable.
Competing interests The authors declare that they have no known competing financial interests or personal
relationships that could have appeared to influence the work reported in this paper.
References
1. Qingkuan Meng, Xiaoxia Yang, Man Zhang et al (2021) Recognition of unstructured field road scene
based on semantic segmentation model[J]. Trans Chinese Soc Agri Eng (Transactions of the CSAE)
37(22):152–160.in Chinese with English abstract. https://doi.org/10.11975/j.issn.1002-6819.2021.22.
017 http://www.tcsae.org
2. Chengliang L, Hongzhen L, Yanming Li et al (2020) Analysis on status and development trend of
intelligent control technology for agricultural equipment[J]. Trans Chinese Agric Machinery 51(1):1–
18 (in Chinese with English abstract)
3. Chattha HS, Zaman QU, Chang YK et al (2014) Variable rate spreader for real-time spot-application of
granular fertilizer in wild blueberry[J]. Comput Electron Agric 100:70–78
4. Onishi Y, Yoshida T, Kurita H, et al. An automated fruit harvesting robot by using deep learning[C]//
Tokyo: The Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec), 2018:
6–13.
5. Jianguo C, Yanming Li, Chengjin Q et al (2018) Design and test of capacitive detection system for
wheat seeding quantity[J]. Trans Chinese Soc Agri Eng (Transactions of the CSAE) 34(18):51–58 (in
Chinese with English abstract)
6. Zhang M, Ji Y, Li S, Cao R, Xu H, Zhang Z (2020) Research progress of agricultural machinery navigation technology [J]. Trans Chinese Soc Agric Machinery 51(04):1–18
7. Scharwachter T, Franke U (2015) Low-level fusion of color, texture and depth for robust road scene
understanding[C]//. IEEE In Intelligent Vehicles Symposium (IV) 2015:599–604
8. Das S, Mirnalinee TT, Varghese K (2011) Use of salient Features for the design of a multistage
framework to extract roads from high-resolution multispectral satellite images[J]. IEEE Trans Geosci
Remote Sens 49(10):3906–3931
9. Siran T (2019) Road segmentation of high-spatial resolution remote sensing images by considering
gradient and color information [J]. Sci Technol Eng 19(31):263–269
10. Cheng G, Zhu F, Xiang S, Pan C (2016) Road centerline extraction via semi-supervised segmentation
and multi direction nonmaximum suppression. IEEE Geosci Remote Sens Lett 13(4):545–549
11. Thenmozhi K, Reddy US (2019) Crop pest classification based on deep convolutional neural network and transfer learning[J]. Comput Electron Agric 164:104906
12. Liu S, Huang S, Xu X, et al. Efficient visual tracking based on fuzzy inference for intelligent transpor-
tation systems[J]. IEEE Transactions on Intelligent Transportation Systems, 2023.
13. Duong LT, Nguyen PT, Sipio CD et al (2020) Automated fruit recognition using EfficientNet and
MixNet[J]. Comput Electron Agric 171:105326
14. Jiang H, Zhang C, Qiao Y et al (2020) CNN feature based graph convolutional network for weed and
crop recognition in smart farming[J]. Comput Electron Agric 174:105450
15. Liu X, Hou S, Liu S et al (2023) Attention-based multimodal glioma segmentation with multi-attention
layers for small-intensity dissimilarity[J]. J King Saud Univ Comput Inform Sci 35(4):183–195
16. Gómez O, Mesejo P, Ibáez O et al. (2020) Deep architectures for high-resolution multi-organ chest
X-ray image segmentation. Neural Comput Appl 32(2)
17. Zhang M , Li X , Xu M , et al. (2020) Automated Semantic Segmentation of Red Blood Cells for
Sickle Cell Disease. IEEE J Biomed Health Inform (99):1–1
18. Liu S, Wang S, Liu X et al (2021) Human memory update strategy: a multi-layer template update
mechanism for remote visual monitoring[J]. IEEE Trans Multimedia 23:2188–2198
19. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation[J]. IEEE Trans Pattern Anal Mach Intell 39(4):640–651
20. Wang J, Kim J (2017) Semantic segmentation of urban scenes with a location prior map using lidar
measurements[C]// IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
Vancouver, BC, Canada 661–666
21. Zhang Z, Liu Q, Wang Y (2017) Road extraction by deep residual U-Net[J]. IEEE Geosci Remote Sens
Lett 32(99):1–5
22. He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition[C]// Conference on
Computer Vision and Pattern Recognition, IEEE 5410–5418
23. Zhe C , Chen Z (2017) RBNet: A Deep Neural Network for Unified Road and Road Boundary
Detection[C]// International Conference on Neural Information Processing
24. Chen LC, Papandreou G, Kokkinos I et al (2016) DeepLab: Semantic image segmentation with deep
convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Trans Pattern Anal Mach
Intell 40(4):834–848
25. Chen L C, Papandreou G, Schroff F, et al. (2017) Rethinking atrous convolution for semantic image
segmentation[C]// Computer Vision and Pattern Recognition, IEEE 3061–3070.
26. Chen L C, Zhu Y, Papandreou G, et al. (2018) Encoder-decoder with atrous separable convolution for
semantic image segmentation[C]//Computer Vision and Pattern Recognition, IEEE 4040–4048
27. Zhang Z, Xu C, Yang J et al (2018) Deep hierarchical guidance and regularization learning for end-to-
end depth estimation[J]. Pattern Recogn 83:430–442
28. Li Y, Wang H, Dang LM et al (2020) Crop pest recognition in natural scenes using convolutional neu-
ral networks[J]. Comput Electron Agri 169:105174
29. Wang J, Li Y, Feng H et al (2020) Common pests image recognition based on deep convolutional neu-
ral network[J]. Comput Electron Agric 179(1):105834
30. Wang Q , Wu B , Zhu P , et al. (2020) ECA-Net: Efficient Channel Attention for Deep Convolutional
Neural Networks[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition
(CVPR). IEEE