Semantic segmentation of road surface
BT20CME103 SURAJ YADAV
BT20MEC032 DARLA JOHNSON
BT20MEC053 KASULA ANIRUDH
BT20MME021 CHAUDHARY DHIRAJ GHANSHYAM
BT19ECE045 JAYANT RAHATE
Literature
1. SegNet: A deep convolutional encoder-decoder architecture for image segmentation[1].
2. Fully convolutional networks for semantic segmentation[2].
3. A new performance measure and evaluation benchmark for road detection algorithms[3].
Problem Description
● In semantic road segmentation, we are given an input image and must classify every pixel in it into object classes such as roads, vehicles, pavements and potholes.
● In the first image, a human can tell the different objects apart without any difficulty.
● A computer, however, cannot classify the objects directly from the raw input image; it has to segment the image into classes in order to interpret it. The result, shown in the second image, is called the segmented image.
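To make "classify every pixel" concrete, here is a minimal sketch, assuming a model (not shown) has already produced one score per class for every pixel. The class list, the function name `scores_to_label_map` and the tiny 2 × 3 score array are hypothetical illustration values. Taking the highest-scoring class at each pixel yields the segmented image as a label map.

```python
import numpy as np

# Hypothetical class list for illustration; the list index is the label value.
CLASSES = ["road", "vehicle", "pavement", "pothole"]

def scores_to_label_map(scores: np.ndarray) -> np.ndarray:
    """Convert per-pixel class scores of shape (H, W, C) into an (H, W) label map.

    Each output pixel holds the index of its highest-scoring class.
    """
    if scores.ndim != 3 or scores.shape[2] != len(CLASSES):
        raise ValueError("expected scores of shape (H, W, num_classes)")
    return np.argmax(scores, axis=2)

# Made-up scores for a 2 x 3 image (4 class scores per pixel).
scores = np.array([
    [[0.9, 0.05, 0.03, 0.02], [0.8, 0.1, 0.05, 0.05], [0.1, 0.7, 0.1, 0.1]],
    [[0.2, 0.1, 0.6, 0.1], [0.3, 0.1, 0.1, 0.5], [0.7, 0.1, 0.1, 0.1]],
])
label_map = scores_to_label_map(scores)
print(label_map)
print([[CLASSES[i] for i in row] for row in label_map])
```

Here the label map is [[0, 0, 1], [2, 3, 0]], i.e. road, road, vehicle in the first row and pavement, pothole, road in the second.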
Dataset
Abbreviation  Train  Test  Description
UU            98     100   Urban unmarked
UM            95     96    Urban marked two-way road
UMM           96     94    Urban marked multi-lane road
URBAN         289    290   All three urban subsets
The dataset consists of about 600 frames (289 training and 290 test), each 375 × 1242 px, extracted from the KITTI dataset at a minimum spatial distance of 20 m.
The recordings stem from five different days and contain relatively low
traffic density, i.e., the road is often completely visible.
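The road ground truth for these frames is distributed as colour images. A minimal sketch of turning one into a binary road mask follows, assuming the commonly used convention that road pixels are magenta (255, 0, 255) and non-road pixels are red; treat those colours, and the helper name `gt_to_road_mask`, as assumptions to check against the KITTI devkit documentation. The 2 × 2 array stands in for a real 375 × 1242 frame.

```python
import numpy as np

# ASSUMED colour coding of the ground-truth images (verify against the KITTI devkit).
ROAD_RGB = (255, 0, 255)  # road area (magenta)

def gt_to_road_mask(gt_rgb: np.ndarray, road_rgb=ROAD_RGB) -> np.ndarray:
    """Return a boolean (H, W) mask that is True where the pixel colour equals road_rgb."""
    if gt_rgb.ndim != 3 or gt_rgb.shape[2] != 3:
        raise ValueError("expected an RGB image of shape (H, W, 3)")
    # Broadcast the (3,) colour against every pixel and require all channels to match.
    return np.all(gt_rgb == np.asarray(road_rgb, dtype=gt_rgb.dtype), axis=2)

# Tiny stand-in for a ground-truth frame: magenta = road, red = non-road (assumed).
gt = np.array([
    [[255, 0, 255], [255, 0, 0]],
    [[255, 0, 255], [255, 0, 255]],
], dtype=np.uint8)
mask = gt_to_road_mask(gt)
print(mask.astype(int))
print("road fraction:", mask.mean())
```

Three of the four toy pixels are road, so the road fraction is 0.75.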
Methodology
Semantic segmentation follows three steps:
1. Classifying: identifying a certain object in the image.
2. Localising: finding the object and drawing a bounding box around it.
3. Segmentation: grouping the pixels in the localised region by creating a segmentation mask.
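The output of each of the three steps can be read off a finished label map, such as the one a segmentation network produces. This is only a sketch, assuming such a label map is already available; the function `describe_class` and the toy 4 × 5 map are hypothetical. It reports whether a class is present (classifying), the bounding box around its pixels (localising) and its binary mask (segmentation).

```python
import numpy as np

def describe_class(label_map: np.ndarray, class_id: int):
    """Return (present, bbox, mask) for one class in an (H, W) label map.

    bbox is (row_min, col_min, row_max, col_max), inclusive, or None if absent.
    """
    mask = label_map == class_id          # segmentation: per-pixel membership
    present = bool(mask.any())            # classifying: is the class in the image?
    if not present:
        return False, None, mask
    rows, cols = np.nonzero(mask)         # localising: extent of the class pixels
    bbox = (int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max()))
    return present, bbox, mask

# Toy label map: 0 = road, 1 = vehicle.
label_map = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
])
present, bbox, mask = describe_class(label_map, 1)
print(present, bbox, int(mask.sum()))
```

For the vehicle class this prints `True (1, 1, 2, 2) 4`; asking for a class that does not occur returns `False` and no bounding box.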
● Essentially, semantic segmentation assigns each pixel of the image to a class and separates each class from the rest of the image by overlaying it with a segmentation mask.
● Semantic segmentation requires extracting features and representations that capture meaningful correlations in the input image while suppressing noise.
● Convolutional neural networks (CNNs) perform this feature extraction and are used in most computer vision tasks.
● In semantic segmentation, our aim is to extract features before using them to separate the image into multiple
segments.
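A single encoder stage can be sketched in a few lines, assuming a single-channel input and one hand-picked 3 × 3 vertical-edge kernel (a trained CNN learns many kernels instead). A "valid" convolution, implemented as cross-correlation as in most deep learning libraries, extracts a feature map; ReLU keeps the positive responses; and 2 × 2 max-pooling halves the resolution. The helper names are hypothetical.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 2D cross-correlation, the operation CNN convolution layers compute."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x: np.ndarray) -> np.ndarray:
    """Zero out negative responses."""
    return np.maximum(x, 0.0)

def max_pool2x2(x: np.ndarray) -> np.ndarray:
    """2 x 2 max-pooling with stride 2 (an odd trailing row/column is dropped)."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Toy 6 x 6 grey image: dark left half, bright right half (a vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-picked vertical-edge kernel; a trained CNN would learn its kernels.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

features = relu(conv2d_valid(image, kernel))  # shape (4, 4)
pooled = max_pool2x2(features)                # shape (2, 2)
print(features)
print(pooled)
```

Every row of the feature map is [0, 3, 3, 0]: the kernel responds only where the brightness changes, and pooling keeps that response at half the resolution.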
Methodology
● To separate the image into multiple segments, the downsampled feature maps must be upsampled back to the input resolution. This is done with deconvolutional (transposed convolution) layers, which act as a learnable interpolation.
● In deep learning terminology, the convolutional network that extracts features is called the encoder; it also downsamples the image. The network that upsamples the features back to a full-resolution segmentation is called the decoder.
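The decoder's upsampling step can be sketched as a single-channel transposed convolution without padding. This is an illustrative sketch, not the implementation of any particular library; the function name `transposed_conv2d` is hypothetical. Each input value scatters a kernel-weighted patch into the output, overlapping contributions are summed, and the output size is (H - 1) × stride + kernel size.

```python
import numpy as np

def transposed_conv2d(x: np.ndarray, kernel: np.ndarray, stride: int = 2) -> np.ndarray:
    """Single-channel transposed ('deconvolution') layer without padding.

    Each input value scatters a copy of the kernel, scaled by that value, into the
    output; overlapping contributions are summed. Output size: (H - 1) * stride + kh.
    """
    h, w = x.shape
    kh, kw = kernel.shape
    out = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + kh, j * stride:j * stride + kw] += x[i, j] * kernel
    return out

# A coarse 2 x 2 feature map, e.g. the output of an encoder stage.
coarse = np.array([[1.0, 2.0],
                   [3.0, 4.0]])

# With a 2 x 2 kernel of ones and stride 2, each value is copied into a 2 x 2 block,
# i.e. nearest-neighbour upsampling; in a real decoder the kernel weights are learned.
upsampled = transposed_conv2d(coarse, np.ones((2, 2)), stride=2)
print(upsampled.shape)
print(upsampled)
```

The 2 × 2 map becomes 4 × 4. Choosing a kernel with bilinear weights instead would interpolate smoothly between neighbouring values, which is why this layer is described as a learnable interpolation.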
References
1. T. Rateke, K. A. Justen, and A. von Wangenheim, “Road Surface Classification with Images Captured From Low-cost Camera - Road Traversing Knowledge (RTK) Dataset,” RITA, vol. 26, no. 3, pp. 50–64, Nov. 2019, DOI: https://doi.org/10.22456/2175-2745.91522.
2. T. Pham, “Semantic Road Segmentation using Deep Learning,” 2020 Applying New Technology in Green Buildings (ATiGB), 2021, pp. 45–48, DOI: 10.1109/ATiGB50996.2021.9423307.