Vehicle Detection and Counting
TRIBHUVAN UNIVERSITY
INSTITUTE OF ENGINEERING
PULCHOWK CAMPUS
A
PROJECT PROPOSAL
ON
VEHICLE DETECTION AND COUNTING
SUBMITTED BY:
NAYAN PANDEY (PUL077BCT049)
NIRMAL RANA (PUL077BCT051)
PRASUN SITAULA (PUL077BCT057)
SUBMITTED TO:
DEPARTMENT OF ELECTRONICS & COMPUTER ENGINEERING
December, 2023
Acknowledgments
First and foremost, our special gratitude goes to the IC Chair, Asst. Prof. Santosh Giri,
for providing guidelines for preparing this proposal.
We cannot express enough thanks to the project management team, Asst. Prof. Bibha
Sthapit and Asst. Prof. Santosh Giri, for their continued support and encouragement. We
are deeply indebted for the learning opportunities provided by the team.
We would also like to show our appreciation to the Department of Electronics and Computer
Engineering (DOECE), IOE, Pulchowk Campus for supporting us in undertaking the minor
project. Furthermore, the valuable feedback of the department had an influential impact in
shaping our project.
We also thank everyone who has helped us directly or indirectly in our analysis, including
our friends, who provided us with their valuable comments and suggestions regarding
the proposal. The generosity and expertise of one and all have improved this proposal in
innumerable ways and saved us from many errors.
Abstract
Kathmandu, like urban areas around the globe, suffers from traffic management issues.
Our proposed system aims to address traffic management problems such as traffic jams,
congestion and traffic rule violations. To overcome these issues, we propose a YOLO-based
vehicle detection system that can detect and count the number of vehicles crossing a marked
intersection. The system will also classify the detected vehicles to determine the type of
vehicles commonly driven in a particular area.
To optimize the system for local conditions, it will be trained and validated on primary
data collected from different areas within Kathmandu. The collected data will be stored
and analyzed to identify congestion and to help divert traffic so that traffic flow can be
managed effectively.
Each iteration of the system will be evaluated and optimized to ensure that the final system
provides the best results. We anticipate that the final system will foster safer traffic by
reducing traffic congestion, especially during peak hours. The proposed system aims to
promote efficient traffic management and improve road safety.
Contents
Acknowledgments
Abstract
Contents
List of Figures
1 Introduction
1.1 Background
1.2 Problem statements
1.3 Objectives
1.4 Scope
2 Literature Review
2.1 Related work
2.2 Related theory
3 Proposed Methodology
3.1 Feasibility Study
3.1.1 Technical Feasibility
3.1.2 Economic Feasibility
3.2 Data Collection and Pre-processing
3.3 Object Detection Model
3.3.1 YOLO
3.4 Model Training
3.5 Vehicle Detection and Counting
3.6 Performance Evaluation and Optimization
6 Timeline
References
List of Figures
2.1 Architecture of the YOLO network [1]
2.2 Working of the YOLO model [1]
List of Tables
List of Abbreviations
YOLO You Only Look Once
FCN Fully Convolutional Network
CNN Convolutional Neural Network
API Application Programming Interface
1. Introduction
Vehicle Detection is a process that involves the identification and classification of vehicles
within an image or video frame. This process is applicable in various domains, including
traffic management, road safety, and autonomous vehicles. However, due to the project’s
constraints on time and resources, the primary focus will be on detecting vehicles in a junc-
tion for traffic management purposes. The proposed system will utilize computer vision
and deep learning methods to extract information from video data obtained from stationary
cameras installed at the junction of roads. The goal of this system is to accurately estimate
the number of vehicles present in the captured frames and provide valuable insights for road
planning, optimizing traffic networks, and reducing congestion. The successful implementa-
tion of this system is expected to enhance traffic surveillance and law enforcement, ultimately
improving road safety.
1.1 Background
Urban areas worldwide face traffic management problems due to the increasing number
of vehicles on roads. These problems include traffic jams, congestion, and compromised
safety. Kathmandu is no exception to this issue. Traffic problems occur frequently in the
city, especially during peak hours. The traditional method of monitoring traffic is
labour-intensive and error-prone, which contributes to congestion and violations of traffic
rules. Computer vision and deep learning offer a promising way to tackle this issue.
1.3 Objectives
The primary objectives of the proposed system are as follows:
1. To develop an accurate vehicle detection and counting system in the context of Kath-
mandu.
2. To collect traffic data that can be analyzed to help manage traffic flow.
1.4 Scope
The system will focus on the utilization of computer vision technologies and deep learning
algorithms to develop a reliable and accurate vehicle detection system. The scope of such a
system encompasses:
1. Data Collection: The initial focus is on collecting first-hand data in the form of
images and videos from various intersections around the city. The collected data will
be labeled, pre-processed and fed to a YOLO-based model. The final system should be
able to count the number of vehicles passing through a cross-section and record data
that can be analyzed further to support traffic management.
2. Model Development: The model will be a YOLO-based object detection model trained on
the pre-processed data from first-hand sources.
3. Vehicle Detection and Counting: The final system should be able to detect vehicles,
count them and save the related data to storage.
4. Real-Time Processing: The system aims to detect and count vehicles in real time or
near real time to aid efficient traffic management decision-making.
5. Accuracy and Reliability in Traffic Data: The data collected in real time should be
reliable and accurate enough to help minimize congestion, especially during peak
hours.
2. Literature Review
One of the earliest methods of detecting objects was background subtraction, which is
achieved by taking the absolute difference between each incoming frame and a background
model of the scene [2]. CNNs rose to prominence in the mid-2010s as they outperformed
previous object classification methods. For object detection, sliding-window and region
proposal-based techniques were used on top of CNNs. The object detection task involves
three steps: object localization, feature extraction and object classification. In these
earlier CNN-based approaches, the steps were handled by separate stages. YOLO, one of the
most popular object detection algorithms, uses a single neural network to perform all
three. YOLO is fast and simple, and because it sees the entire image during training and
testing, it can also encode contextual information.
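The background subtraction step described above can be illustrated with a minimal sketch.
It assumes grayscale frames stored as nested lists of 0-255 intensities and a fixed
background model; the function name foreground_mask and the threshold of 30 are
illustrative choices, not values taken from [2].

```python
def foreground_mask(frame, background, threshold=30):
    """Return a binary mask: 1 where |frame - background| exceeds the threshold."""
    return [
        [1 if abs(f - b) > threshold else 0 for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# Tiny 3 x 4 grayscale example (hypothetical intensity values).
background = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
frame = [
    [12, 11, 10, 9],
    [10, 200, 210, 10],
    [10, 205, 198, 11],
]

mask = foreground_mask(frame, background)
moving_pixels = sum(map(sum, mask))  # number of pixels flagged as foreground
```

Small differences caused by noise stay below the threshold, so only the bright 2 x 2 region
is marked as foreground.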
recognizes the image. A CNN typically has three types of layers, namely convolution layers,
pooling layers, and fully connected layers. A convolution layer slides a kernel, whose
values change as the network learns, over the image and, at each position, computes the
sum of element-wise products between the kernel and the portion of the image under it.
Pooling layers are applied after convolution layers to reduce the size of the feature maps,
which makes the computation cheaper. Feature maps are the two-dimensional outputs produced
by applying a kernel across the input. In a fully connected layer, every neuron is connected
to every neuron in the previous layer. This layer lies at the end of the network and is
responsible for classifying the image. YOLO utilizes a convolutional neural network and
performs the complete task of object detection and classification using a single network.
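As a rough illustration of the convolution and pooling layers described above, the
following sketch applies one hand-picked kernel and a 2 x 2 max pooling step to a small
image. It assumes no padding and a stride of 1 for the convolution; the helper names
conv2d and max_pool2d, the image and the kernel values are hypothetical, and in a real
CNN the kernel values would be learned.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    return [
        [
            max(feature_map[i + a][j + b]
                for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# 5 x 5 image with a dark left part and a bright right part.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Kernel that responds to a dark-to-bright vertical edge.
kernel = [[-1, 1], [-1, 1]]

feature_map = conv2d(image, kernel)  # 4 x 4 feature map
pooled = max_pool2d(feature_map)     # 2 x 2 after pooling
```

The feature map is non-zero only where the edge lies, and pooling halves each dimension
while keeping that response.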
The goal of any object detection algorithm is to determine a bounding box that contains
the object to be detected. The YOLO algorithm divides the image into an S × S grid, and
each cell outputs a prediction with a corresponding bounding box [1]. The cell that contains
the center of the object is taken as the reference, and (x, y, w, h) are calculated with the
top-left corner of that cell as the origin. Here x and y represent the center of the
object, while w and h represent its width and height. A confidence score is calculated for
each predicted box as Pr(Object) × IOU [5]. If no object exists in that cell, the confidence
score should be zero; otherwise it should equal the intersection over union (IOU) between
the predicted box and the ground truth.
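The grid assignment and confidence score can be sketched as follows. This is a minimal
illustration, assuming boxes given as (x, y, w, h) with the center in absolute pixel
coordinates, a 448 × 448 image and S = 7; the function names and the sample boxes are
hypothetical.

```python
def center_to_corners(x, y, w, h):
    """Convert a center-format box to (x1, y1, x2, y2) corners."""
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax1, ay1, ax2, ay2 = center_to_corners(*box_a)
    bx1, by1, bx2, by2 = center_to_corners(*box_b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - intersection
    return intersection / union if union > 0 else 0.0


def confidence(pr_object, predicted_box, truth_box):
    """Confidence score: Pr(Object) x IOU between prediction and ground truth."""
    return pr_object * iou(predicted_box, truth_box)


def responsible_cell(x, y, image_w, image_h, s=7):
    """Return (row, col) of the grid cell that contains the object's center."""
    col = min(int(x / image_w * s), s - 1)
    row = min(int(y / image_h * s), s - 1)
    return row, col


predicted = (50, 50, 40, 40)
truth = (60, 50, 40, 40)
score = confidence(1.0, predicted, truth)       # object present, so score equals the IOU
cell = responsible_cell(300, 200, 448, 448)     # cell holding a center at (300, 200)
```

When Pr(Object) is 0, the score is 0 regardless of the overlap, which matches the case of
a cell that contains no object.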
The initial YOLO network consisted of 24 convolutional layers followed by two fully
connected layers. Its fast version, however, uses 9 convolutional layers instead of 24 and
fewer filters in those layers.
YOLO is less suitable for detecting small objects that appear close together, since the
number of grid cells would have to be increased to keep the centers of two objects from
falling in the same cell.
3. Proposed Methodology
The vehicle detection and counting system will be based on the YOLO object detection model,
which will be trained and implemented using PyTorch. The feasibility study is presented
below, followed by the expected development procedure.
(a) Camera: Any device that can capture images and videos in digital format, such as a
digital camera or a smartphone, fulfills the camera requirement. The proposed system
can be trained on, and should be able to detect and count vehicles in, images and
videos that have already been captured.
(b) Computing Device: We expect to train our vehicle detection and counting model on
Google Colab and, if possible, on an external GPU. After training, any computing
infrastructure with internet connectivity that is able to run our system fulfills
the computing requirement.
2. Data Collection and Labeling: We expect the training dataset to consist of first-hand
data, collected and labeled by the team. However, if a larger dataset is required, it
will be composed of data from both primary and secondary sources.
2. Operational Cost: The operating cost of the system should depend mainly on how often
the hardware components need maintenance. For real-time processing, the deployment
environment can be an issue; for small-scale use, however, this may not be the case.
3.3.1 YOLO
YOLO, which stands for You Only Look Once, is an object detection system that uses a deep
convolutional neural network to detect objects. In its later versions (for example,
YOLOv3), YOLO is a fully convolutional network (FCN): it uses only convolutional layers,
75 of them, together with skip connections and upsampling layers. It does not use any form
of pooling; instead, a convolutional layer with stride 2 downsamples the feature maps. This
helps prevent the loss of low-level features often attributed to pooling. Being an FCN,
YOLO is invariant to the size of the input image. In practice, however, we might want to
stick to a constant input size because of various issues that arise when implementing the
algorithm. One of the main
issues is that, to process images in batches, all images must have the same height and
width, which is necessary to concatenate multiple images into one batch. The network
downsamples the image by a factor called the stride of the network. For example, if the
stride of the network is 32, then an input image of size 416 × 416 will yield an output of
size 13 × 13. In general, the stride of any layer in the network is the factor by which
the output of that layer is smaller than the input image to the network.
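The stride arithmetic above can be checked with a short sketch. The helper names are
hypothetical, and the five stride-2 layers are used only as an illustration of how a
network stride of 32 can arise.

```python
def cumulative_stride(layer_strides):
    """Multiply per-layer strides to get the stride at the last layer."""
    total = 1
    for stride in layer_strides:
        total *= stride
    return total


def output_grid_size(input_size, network_stride=32):
    """Side length of the output feature map for a square input image."""
    if input_size % network_stride != 0:
        raise ValueError("input size should be a multiple of the network stride")
    return input_size // network_stride


network_stride = cumulative_stride([2, 2, 2, 2, 2])  # five stride-2 downsampling layers
grid = output_grid_size(416, network_stride)         # 416 / 32
```

With a fixed network stride, the input size directly determines the output grid, which is
one more reason to keep the input size constant across a batch.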
2. Capture images and videos of vehicles passing along the road using a mobile camera.
3. Set up the model development and training environment with the deep learning framework
PyTorch.
4. Preprocess the data to filter out duplicate, corrupted and irrelevant samples from the
dataset.
5. Split the data into training, validation and testing sets, with 80% for training, 10% for
validation and 10% for testing.
8. Iterate the training process to refine the model, and evaluate the final model on the
test set to assess generalization.
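The 80/10/10 split in step 5 could be done along the following lines. This is a minimal
sketch using only the Python standard library; the function name split_dataset, the fixed
seed and the file names are assumptions made for illustration.

```python
import random


def split_dataset(items, train=0.8, val=0.1, seed=42):
    """Shuffle reproducibly, then split into training, validation and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train)
    n_val = round(len(items) * val)
    train_set = items[:n_train]
    val_set = items[n_train:n_train + n_val]
    test_set = items[n_train + n_val:]  # whatever remains, about 10%
    return train_set, val_set, test_set


# Hypothetical file names standing in for the collected images.
image_paths = [f"frame_{i:04d}.jpg" for i in range(1000)]
train_set, val_set, test_set = split_dataset(image_paths)
```

Shuffling before splitting avoids placing all frames from one location or time of day in
the same subset, and the fixed seed keeps the split repeatable between training runs.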
Multiple frames are first extracted from the video input, and each frame is passed through
the trained object detection model. The model uses a convolutional neural network (CNN) to
extract relevant features from each frame. These features are then used to predict the
bounding boxes, class probabilities, and confidence scores for all vehicles present in the
image. The detected vehicles are subsequently classified into specific vehicle types and
counted. This data is then stored and can be used for traffic flow analysis.
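The classification and counting step for a single frame can be sketched as below. The
detection format (a dictionary with label, confidence and box), the 0.5 confidence
threshold and the sample values are assumptions for illustration; the actual output format
depends on the detection model that is used.

```python
from collections import Counter


def count_by_type(detections, min_confidence=0.5):
    """Tally detections per vehicle type, ignoring low-confidence predictions."""
    counts = Counter()
    for detection in detections:
        if detection["confidence"] >= min_confidence:
            counts[detection["label"]] += 1
    return counts


# Hypothetical detections for one frame; boxes are (x, y, w, h).
frame_detections = [
    {"label": "car", "confidence": 0.91, "box": (120, 200, 80, 60)},
    {"label": "motorcycle", "confidence": 0.78, "box": (300, 220, 30, 50)},
    {"label": "car", "confidence": 0.32, "box": (400, 180, 70, 55)},
    {"label": "bus", "confidence": 0.88, "box": (520, 150, 140, 110)},
]

counts = count_by_type(frame_detections)
total_vehicles = sum(counts.values())
```

This tallies one frame only. Counting each vehicle once across consecutive frames would
need an additional step that links detections between frames, which is outside this
sketch.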
Once the system is initiated, the trained AI model will be accessed through an API. The
trained model will detect the vehicles in each video frame, classify them into nine
categories and count them. The vehicle counts will be stored in a database for future
reference. The results will be displayed to the user through a user-friendly interface. The
administrator will be responsible for managing the users and the overall system.
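Storing the counts for later reference might look like the following sketch. The proposal
does not name a database, so SQLite (through Python's built-in sqlite3 module) is used
here purely as an assumption; the table name, column names, location label and timestamps
are all hypothetical.

```python
import sqlite3


def init_db(conn):
    """Create the table that holds one row per vehicle type per recording."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS vehicle_counts (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               recorded_at TEXT NOT NULL,
               location TEXT NOT NULL,
               vehicle_type TEXT NOT NULL,
               vehicle_count INTEGER NOT NULL
           )"""
    )


def save_counts(conn, recorded_at, location, counts):
    """Insert the per-type counts observed at one location and time."""
    conn.executemany(
        "INSERT INTO vehicle_counts (recorded_at, location, vehicle_type, vehicle_count) "
        "VALUES (?, ?, ?, ?)",
        [(recorded_at, location, vtype, n) for vtype, n in counts.items()],
    )
    conn.commit()


def totals_by_type(conn, location):
    """Return the total count per vehicle type for a location."""
    rows = conn.execute(
        "SELECT vehicle_type, SUM(vehicle_count) FROM vehicle_counts "
        "WHERE location = ? GROUP BY vehicle_type ORDER BY vehicle_type",
        (location,),
    )
    return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")  # in-memory database for demonstration
init_db(conn)
save_counts(conn, "2023-12-01T08:00:00", "junction_a", {"car": 12, "bus": 3})
save_counts(conn, "2023-12-01T08:05:00", "junction_a", {"car": 9, "motorcycle": 20})
totals = totals_by_type(conn, "junction_a")
```

Keeping a timestamp and location with every row allows the stored counts to be grouped by
time of day or by junction during later analysis.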
6. Timeline
The project is planned to be completed in 3 months. The basic overview of the tasks involved
during the project is given below.
3. Development: Development and training of models from the collected data. This will
take place concurrently with data collection.
4. Integration and Testing: The model will be integrated with a web application which
will then be tested with users.
5. Evaluation and Improvement: The feedback from the users will be used to evaluate
the system. If possible, improvements to the system will also be carried out.
6. Documentation: The system will be documented from the beginning of the develop-
ment phase.
7. Final Presentation and Submission: The system will be demonstrated to the depart-
ment.
References
[1] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once:
Unified, real-time object detection. In 2016 IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), pages 779–788, 2016.
[3] H. Song, H. Liang, H. Li, et al. Vision-based vehicle detection and counting system using
deep learning in highway scenes. Eur. Transp. Res. Rev., 11(51), 2019.
[4] Biplav Regmi, Ramesh Thapa, and Biplove Pokhrel. Comparative study of CCTV based
vehicle identification and classification models during adverse conditions in Pokhara. In
Proceedings of 9th IOE Graduate Conference, volume 9, 2021.
[5] J.D. Wu, B.Y. Chen, W.J. Shyr, and F.Y. Shih. Vehicle classification and counting system
using YOLO object detection technology. Traitement du Signal, 38(4):1087–1093, 2021.