
TRIBHUVAN UNIVERSITY
INSTITUTE OF ENGINEERING
PULCHOWK CAMPUS

A
PROJECT PROPOSAL
ON
VEHICLE DETECTION AND COUNTING

SUBMITTED BY:
NAYAN PANDEY (PUL077BCT049)
NIRMAL RANA (PUL077BCT051)
PRASUN SITAULA (PUL077BCT057)

SUBMITTED TO:
DEPARTMENT OF ELECTRONICS & COMPUTER ENGINEERING

December, 2023


Acknowledgments
First and foremost, our special gratitude goes to the IC Chair, Asst. Prof. Santosh Giri, for providing guidelines for preparing this proposal.
We cannot express enough thanks to the project management team, Asst. Prof. Bibha Sthapit and Asst. Prof. Santosh Giri, for their continued support and encouragement. We are deeply indebted for the learning opportunities provided by the team.
We would also like to show our appreciation to the Department of Electronics and Computer Engineering (DOECE), IOE, Pulchowk Campus for their support in undertaking the minor project. Furthermore, the department's valuable feedback had an influential impact in shaping our project.
We also thank everyone who has helped us directly or indirectly in our analysis, including our friends, who provided us with their valuable comments and suggestions regarding the proposal. The generosity and expertise of one and all have improved this proposal in innumerable ways and saved us from many errors.


Abstract
Kathmandu, like urban areas around the globe, suffers from traffic management issues. Our proposed system aims to address traffic management problems such as traffic jams, congestion, and traffic rule violations. To overcome these issues, we propose a YOLO-based vehicle detection system that can detect and count the number of vehicles crossing a marked intersection. The system will also classify the detected vehicles to determine the types of vehicles commonly driven in a particular area.
To optimize the system for local conditions, it will be trained and validated using primary data collected from different areas within Kathmandu. The collected data will be analyzed and stored to identify congestion and divert traffic so that traffic flow can be managed effectively.
Each iteration of the system will be evaluated and optimized to ensure that the final system
provides the best results. We anticipate that the final system will foster safer traffic by
reducing traffic congestion, especially during peak hours. The proposed system aims to
promote efficient traffic management and improve road safety.


Contents

Acknowledgments
Abstract
Contents
List of Figures
List of Tables
List of Abbreviations

1 Introduction
1.1 Background
1.2 Problem statements
1.3 Objectives
1.4 Scope

2 Literature Review
2.1 Related work
2.2 Related theory

3 Proposed Methodology
3.1 Feasibility Study
3.1.1 Technical Feasibility
3.1.2 Economic Feasibility
3.2 Data Collection and Pre-processing
3.3 Object Detection Model
3.3.1 YOLO
3.4 Model Training
3.5 Vehicle Detection and Counting
3.6 Performance Evaluation and Optimization

4 Proposed Experimental Setup

5 Proposed System Design
5.1 Block Diagram
5.2 Use Case Diagram

6 Timeline
References


List of Figures

2.1 Architecture of the YOLO network [1]
2.2 Working of the YOLO model [1]
5.1 Block Diagram of Proposed System
5.2 Use Case Diagram


List of Tables


List of Abbreviations
YOLO You Only Look Once
FCN Fully Convolutional Network
CNN Convolutional Neural Network
API Application Programming Interface


1. Introduction
Vehicle Detection is a process that involves the identification and classification of vehicles
within an image or video frame. This process is applicable in various domains, including
traffic management, road safety, and autonomous vehicles. However, due to the project’s
constraints on time and resources, the primary focus will be on detecting vehicles in a junc-
tion for traffic management purposes. The proposed system will utilize computer vision
and deep learning methods to extract information from video data obtained from stationary
cameras installed at the junction of roads. The goal of this system is to accurately estimate
the number of vehicles present in the captured frames and provide valuable insights for road
planning, optimizing traffic networks, and reducing congestion. The successful implementa-
tion of this system is expected to enhance traffic surveillance and law enforcement, ultimately
improving road safety.

1.1 Background
Urban areas worldwide face traffic management problems due to the increasing number
of vehicles on roads. These problems include traffic jams, congestion, and compromised
safety. Kathmandu is no exception to this issue. Traffic problems occur frequently in the
city, especially during peak hours. The traditional method of monitoring traffic is labour-
intensive and error-prone, resulting in congestion and violations of traffic rules. To tackle this issue, leveraging computer vision and deep learning is a promising solution.

1.2 Problem statements


Despite the advancements in traffic management systems worldwide, Kathmandu and Nepal
continue to rely on traditional manual approaches to manage traffic in most parts. The
lack of real-time traffic monitoring and control systems has resulted in traffic congestion
and violations. This, in turn, has led to endless traffic jams and congestion, hampering the
flow of traffic. Furthermore, research regarding traffic management in Nepal typically involves using tally marks to count the number of vehicles. This method is extremely cumbersome and wastes a great deal of time and other resources. Due to this constraint, data on the number of vehicles passing a certain section of road is collected only once or twice a year. Plans for upgrading road infrastructure are therefore based on samples that do not represent the actual traffic population.


1.3 Objectives
The primary objectives of the proposed system are as follows:

1. To develop an accurate vehicle detection and counting system in the context of Kath-
mandu.

2. To collect traffic data which can be analyzed to manage the proper traffic flow.

3. To provide continuous surveillance of traffic to minimize traffic violations and enforce traffic laws.

1.4 Scope
The system will focus on the utilization of computer vision technologies and deep learning
algorithms to develop a reliable and accurate vehicle detection system. The scope of such a
system encompasses:

1. Data Collection: The initial focus is on data collection from first-hand sources, i.e., images and videos collected from various intersections around the city. The collected data will be labeled, pre-processed, and fed to a YOLO-based model. The final system should be able to count the number of vehicles passing through a cross-section and record data that can be further analyzed to support traffic management.

2. Model Development: The model will be based on the YOLO object detection architecture and will be trained on the pre-processed data from first-hand sources.

3. Vehicle Detection and Counting: The final system should be able to detect vehicles, count the number of vehicles, and save the related data to storage.

4. Real-Time Processing: The system aims to detect and count vehicles in real or near-
real-time to aid efficient traffic management decision-making.

5. Accuracy and Reliability of Traffic Data: The data collected in real time should provide more reliable and accurate traffic statistics, helping to minimize congestion, especially during peak hours.


2. Literature Review
One of the earliest methods of detecting objects was background subtraction, which is achieved by taking the absolute difference between each incoming frame and a background model of the scene [2]. CNNs rose to prominence in the mid-2010s as they outperformed previous object classification methods. For object detection, sliding window and region proposal-based techniques were used on top of CNNs. The object detection task requires three separate tasks: object localization, feature extraction, and object classification. When using CNNs, each of these tasks requires a different neural network. The most popular object detection algorithm, YOLO, uses a single neural network to perform all three activities. YOLO is fast and simple, and because it sees the entire image during training and testing, it can encode contextual information as well.
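For illustration, a minimal background-subtraction sketch is shown below; the use of OpenCV, the video file name, and the parameter values are our assumptions for the example, not details of the reviewed work.

    import cv2

    # Maintain a per-pixel Gaussian-mixture background model and subtract it
    # from each incoming frame to obtain a foreground (moving-object) mask.
    cap = cv2.VideoCapture("traffic.mp4")   # hypothetical input video
    subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)                               # foreground mask
        _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)   # drop shadow pixels
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        vehicles = [c for c in contours if cv2.contourArea(c) > 500] # large moving blobs

    cap.release()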

2.1 Related work


Vision-based vehicle detection has been used extensively throughout the world. Its applications lie in unmanned driving systems as well as in the design of Intelligent Transportation Systems. Traditionally, vehicle counting for designing transportation infrastructure was done manually. This led to data being collected only once or twice a year to estimate the traffic flow of a certain area. To overcome this issue, hardware-based solutions were proposed, but these were costly and unreliable in certain cases. Vision-based vehicle detection systems succeeded them, and today such systems are being implemented throughout the world. In Europe, a vision-based vehicle detection system was trained using a high-definition dataset from the perspective of surveillance cameras [3]. In that study, the YOLOv3 object detection algorithm was used, and vehicle trajectories were obtained by tracking the ORB features of multiple objects. Vehicle identification and classification have been carried out in Nepal as well [4]. In that study, vehicle data was collected using a single camera on the road linking Kaski and Syangja. The study focused on comparing vehicle detection models in adverse conditions and showed that YOLOv5 outperformed the contemporary algorithms for object detection.

2.2 Related theory


The basis of object detection is the CNN. A CNN is a class of neural network that specializes in processing data with a grid-like structure, such as an image. The CNN detects increasingly complex patterns in the image with each layer. For example, it might detect horizontal edges in the first layer, recognize corners in the next layer, and subsequently it recognizes the image. A CNN typically has three layers, namely a convolution layer, a pooling
layer, and a fully connected layer. The convolution layer performs a matrix multiplication
between a kernel, which changes its value as the network learns, and a certain portion of the
image. The pooling layers are applied after the convolution layer to reduce the size of the
feature maps, which makes the computation easier. Feature maps are the two-dimensional representations of the neurons. In the fully connected layer, each neuron from the previous layer is connected to the present layer. It is responsible for classifying the image, and this layer lies at the end of the network. YOLO utilizes a convolutional neural network and performs the complete task of object detection and classification using a single network.
The goal of any object detection algorithm is to determine a bounding box which contains the object to be detected. The YOLO algorithm divides the image into an S×S grid, and each cell outputs a prediction with a corresponding bounding box [1]. The cell which contains the center of the object is taken as a reference, and (x, y, w, h) are calculated with the top-left corner of the cell considered to be the origin. Here x and y represent the center of the object, while w and h represent the width and height of the object. A confidence score is calculated for each cell as Pr(object) × IoU [5]. If an object exists in that cell, the confidence score should equal the intersection over union (IoU) between the predicted box and the ground truth.
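To make the confidence term concrete, a small sketch is given below; the corner-coordinate box format and the numeric values are our assumptions for illustration.

    def iou(box_a, box_b):
        # Intersection over Union of two boxes given as (x1, y1, x2, y2) corners.
        x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / float(area_a + area_b - inter)

    # Confidence of a predicted box: Pr(object) * IoU(predicted, ground truth).
    pred, truth = (50, 60, 200, 180), (55, 65, 210, 190)
    confidence = 0.9 * iou(pred, truth)   # 0.9 is an illustrative Pr(object)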

Figure 2.1: Architecture of the YOLO network [1]

The initial YOLO network consisted of 24 convolutional layers followed by two fully
connected layers. However, its fast version uses 9 layers instead of 24 and uses fewer filters
in those layers.
YOLO is less suitable for detecting small objects, as the number of grid cells would have to be increased to prevent the centers of two objects from falling in the same cell.


Figure 2.2: Working of the YOLO model [1]


The image is divided into an S×S grid, and each cell predicts a bounding box and its corresponding confidence.


3. Proposed Methodology
The vehicle detection and counting system will be based on the YOLO object detection model, which will be trained and implemented using PyTorch. The feasibility study and the expected development procedure are described below.

3.1 Feasibility Study


The feasibility of the system is assessed through two studies: technical feasibility and economic feasibility.

3.1.1 Technical Feasibility


In assessing the technical feasibility of the development of a vehicle detection and counting
system, the following considerations were made:

1. Hardware Availability: We found the hardware requirements for such a system to be minimal. The required hardware includes:

(a) Camera: To fulfill the requirement of a camera, any device that can capture images and videos in digital format can be used; this includes digital cameras, smartphones, etc. The proposed system can be trained on, and should be able to detect and count vehicles in, images and videos that have already been captured.
(b) Computing Device: For training, we expect to train our vehicle detection and counting model on Google Colab and, if possible, on an external GPU. After training, the requirement of a computing device can be fulfilled by any computing infrastructure with internet connectivity that is able to run our system.

2. Data Collection and Labeling: We expect the training dataset to consist of first-hand data. The data will be collected and labeled by the team, and the model will be trained on it. However, if a larger dataset is required, it will be composed of data from both primary and secondary sources.

3.1.2 Economic Feasibility


We expect the development of the system to fit within minimal economic constraints. The implementation of the final system should also come at a low cost.


1. Development: From an economic point of view, the development cost of the system is mainly a matter of hardware availability. As discussed in the technical feasibility section, this is achievable with cloud computing on Google Colab, along with computing devices and cameras, which should incur a very low cost. Besides these, since the data is collected by our own team, data collection should fall within daily expenses.

2. Operational Cost: Operating the system should incur costs based mainly on how frequently the hardware components need maintenance. For real-time processing, the deployment environment can be a concern; however, for small-scale use this is unlikely to be an issue.

3.2 Data Collection and Pre-processing


Data collection will be done through first-hand sources, with our team collecting traffic images and videos from diverse settings within Kathmandu, most likely using our smartphones. To ensure the local context is maintained, the collected data shall predominantly contain the types of vehicles commonly driven in Kathmandu. The collected data will be cleaned to avoid inconsistencies and normalized to ensure consistent quality across the dataset. The data will then be annotated with bounding-box labels for vehicles, favoring local vehicles and traffic scenarios. The final dataset shall be split into training, validation, and testing sets to ensure balanced performance across different scenarios; a sketch of this split step is given below. The validation and testing sets may also include data from secondary sources, and if the collected data is not sufficient to produce the expected results, secondary data sources may be used more broadly.
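A minimal sketch of the split step, assuming the 80/10/10 ratio proposed in Chapter 4 and a simple directory layout of labeled images (the paths are placeholders):

    import random
    import shutil
    from pathlib import Path

    random.seed(42)
    images = sorted(Path("data/images").glob("*.jpg"))   # assumed location of labeled frames
    random.shuffle(images)

    n = len(images)
    splits = {
        "train": images[: int(0.8 * n)],
        "val": images[int(0.8 * n): int(0.9 * n)],
        "test": images[int(0.9 * n):],
    }

    for name, files in splits.items():
        out = Path("data") / name
        out.mkdir(parents=True, exist_ok=True)
        for f in files:
            shutil.copy(f, out / f.name)   # the matching label file would be copied alongside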

3.3 Object Detection Model


YOLO is a state-of-the-art object detection model which can also be used for vehicle detection.

3.3.1 YOLO
YOLO, which stands for You Only Look Once, is an object detection system that uses a deep convolutional neural network to detect objects. YOLOv3, for instance, is a fully convolutional network (FCN) that uses only convolutional layers; it has 75 convolutional layers, with skip connections and upsampling layers. It doesn't use any form of pooling and instead uses a convolutional layer with stride 2 to downsample the feature maps, which helps prevent the loss of low-level features often attributed to pooling. Being an FCN, YOLO is invariant to the size of the input image. However, in practice, we might want to stick to a constant input size due to various issues that arise when implementing the algorithm. One of the main
issues is that if we want to process images in batches, we need to have all images of fixed
height and width. This is necessary to concatenate multiple images into a large batch. The
network downsamples the image by a factor called the stride of the network. For example,
if the stride of the network is 32, then an input image of size 416 x 416 will yield an output
of size 13 x 13. Generally, the stride of any layer in the network is equal to the factor by
which the output of the layer is smaller than the input image to the network.
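The relationship between input size, stride, and output grid can be checked with a trivial computation (the sizes are those used in the example above):

    def output_grid(input_size, stride):
        # The network downsamples the input by its stride, so a 416 x 416 image
        # with stride 32 yields a 13 x 13 grid of predictions.
        return input_size // stride

    assert output_grid(416, 32) == 13
    print(output_grid(416, 32), output_grid(608, 32))   # 13 19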

3.4 Model Training


Our YOLO-based vehicle detection model will be trained using PyTorch on Google Colab. The model will be trained on the training dataset, which consists of data from first-hand sources. As already discussed, the data in this dataset will be annotated to represent different classes of vehicles. Adjustments will be made to the model or the training set based on its performance. The fine-tuned model's accuracy and robustness will be evaluated using performance metrics on the validation and testing sets, and training will be repeated iteratively to refine the model.
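As an indication of how this training step could look, the sketch below uses the Ultralytics YOLO package, which wraps PyTorch; the package choice, starting checkpoint, dataset configuration file, and hyperparameter values are assumptions, since the proposal only commits to a YOLO-based model trained with PyTorch.

    from ultralytics import YOLO

    # Start from pre-trained weights and fine-tune on the locally collected dataset.
    model = YOLO("yolov8n.pt")                  # assumed starting checkpoint
    model.train(
        data="kathmandu_vehicles.yaml",         # hypothetical dataset config (paths + class names)
        epochs=100,
        imgsz=640,
        batch=16,
    )

    # Evaluate on the validation split defined in the dataset config.
    metrics = model.val()
    print(metrics.box.map)                      # mAP averaged over IoU thresholds 0.5-0.95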

3.5 Vehicle Detection and Counting


Using the trained model, vehicle detection will be performed on the video input given to the system. Each detected vehicle is classified into a class of similar vehicles and counted. The classified vehicle data and the corresponding counts will be stored in a record.
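A minimal sketch of the classification-and-counting step is shown below; it tallies the classes detected in a single frame, and we note as a design assumption that counting unique vehicles across frames would additionally require object tracking to avoid double counting.

    from collections import Counter

    # Class labels of the vehicles detected in one frame (illustrative values).
    detected_classes = ["car", "bike", "car", "bus", "scooter", "car"]

    counts = Counter(detected_classes)
    print(counts)   # Counter({'car': 3, 'bike': 1, 'bus': 1, 'scooter': 1})

    # Each per-class count would then be appended to the record, e.g. a CSV row
    # of (timestamp, location, vehicle class, count), for later analysis.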

3.6 Performance Evaluation and Optimization


The validation and testing sets will again be used to check whether the model accurately detects, classifies, counts, and records the vehicle data. The performance of the system will be evaluated by comparing the model's predictions against the ground truth. During these evaluations, the confidence threshold for detection will be fine-tuned to minimize the error margin. False positives and negatives, errors in low light and bad weather, and other limitations will be addressed through model adjustments.
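The threshold-tuning step can be illustrated with a small sketch that sweeps the confidence threshold and reports precision and recall; the prediction list and ground-truth count below are illustrative placeholders, not measured results.

    # Each prediction is (confidence, matches_a_ground_truth_vehicle); values are illustrative.
    predictions = [(0.95, True), (0.90, True), (0.80, False), (0.65, True),
                   (0.55, False), (0.40, True), (0.30, False)]
    num_ground_truth = 5

    for threshold in (0.3, 0.5, 0.7):
        kept = [p for p in predictions if p[0] >= threshold]
        tp = sum(1 for _, correct in kept if correct)
        precision = tp / len(kept) if kept else 0.0
        recall = tp / num_ground_truth
        print(f"conf >= {threshold:.1f}: precision={precision:.2f}, recall={recall:.2f}")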


4. Proposed Experimental Setup


Though there is an extensive presence of surveillance cameras throughout the valley, traffic data is rarely publicly released due to security and privacy concerns. The required dataset for this project will be obtained by capturing videos at different intersections throughout the valley. Using those videos, we will extract images of the vehicles. Tools like Label Studio can be used to efficiently annotate the different vehicles into one of nine categories: bike, car, bus, truck, scooter, micro-bus, tempo, tractor, and lorry.

1. Selection of different locations to get a diverse range of traffic scenarios at different times of day and varying weather conditions.

2. Capture images and videos of the vehicles passing by the road using a mobile camera.

3. Set up the model development and training environment with the deep learning framework PyTorch.

4. Preprocess the data to filter out duplicate, corrupted and irrelevant data in the dataset.

5. Split the data into training, validation and testing sets with 80% for training, 10% for
validation and 10% for testing.

6. Train the model using the collected dataset.

7. Optimize hyperparameters to achieve better model performance, and assess the model's performance on the validation set using metrics such as mean average precision (mAP) or F1 score.

8. Iterate the training process to refine the model, and evaluate the final model on the test set to assess generalization.


5. Proposed System Design


5.1 Block Diagram

Figure 5.1: Block Diagram of Proposed System

The video input is first subjected to a process of extracting multiple frames, each of which is then passed through a pre-trained object detection model. The model makes use of convolutional neural networks (CNN) to extract relevant features from each frame. These features are then analyzed to predict the bounding boxes, class probabilities, and confidence scores for all vehicles present in the image. The detected vehicles are subsequently classified into specific vehicle types and counted. This data is then stored and can be used for later traffic flow analysis.
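A compact sketch of this pipeline is given below, assuming OpenCV for frame extraction and the Ultralytics YOLO package for inference; the model path and video file name are placeholders.

    import cv2
    from ultralytics import YOLO

    model = YOLO("best.pt")                      # hypothetical trained weights
    cap = cv2.VideoCapture("intersection.mp4")   # hypothetical input video

    per_frame_labels = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Each result carries bounding boxes, class indices, and confidence scores.
        result = model(frame, verbose=False)[0]
        labels = [model.names[int(c)] for c in result.boxes.cls]
        per_frame_labels.append(labels)

    cap.release()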

5.2 Use Case Diagram


The Use-Case diagram shows the interaction between the external users and the system. The
user will interact with the system through a web interface. The user will capture the video
and upload it to the system. The user will then initiate the system for vehicle detection.


Figure 5.2: Use Case Diagram

Once the system is initiated, the trained AI model will be accessed through an API. The trained model will detect the vehicles in the video frames, classify them into nine categories, and count them. The vehicle counts will be stored in a database for future reference. The results will be displayed to the user through a user-friendly interface. The administrator will be responsible for managing the users and the overall system.


6. Timeline

The project is planned to be completed in 3 months. The basic overview of the tasks involved
during the project is given below.

1. Project Planning and Research: Detailed study of the methodologies and technologies to be used.

2. Data Collection: Collection of data to be used for training models.

3. Development: Development and training of models from the collected data. This will
take place concurrently with data collection.

4. Integration and Testing: The model will be integrated with a web application which
will then be tested with users.

5. Evaluation and Improvement: The feedback from the users will be used for evaluation of the system. If possible, improvements to the system will also be carried out.

6. Documentation: The system will be documented from the beginning of the develop-
ment phase.

7. Final Presentation and Submission: The system will be demonstrated to the department.


References
[1] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once:
Unified, real-time object detection. In 2016 IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), pages 779–788, 2016.

[2] R. Manikandan and R. Ramakrishnan. Video object extraction by using background subtraction techniques for sports applications. CiiT International Journal of Digital Image Processing, 2013.

[3] H. Song, H. Liang, H. Li, et al. Vision-based vehicle detection and counting system using deep learning in highway scenes. European Transport Research Review, 11(51), 2019.

[4] Biplav Regmi, Ramesh Thapa, and Biplove Pokhrel. Comparative study of CCTV-based vehicle identification and classification models during adverse conditions in Pokhara. In Proceedings of 9th IOE Graduate Conference, volume 9, 2021.

[5] J.D. Wu, B.Y. Chen, W.J. Shyr, and F.Y. Shih. Vehicle classification and counting system using YOLO object detection technology. Traitement du Signal, 38(4):1087–1093, 2021.
