Detection Tracking and Classification of UAVs
Nikhil Asogekar
Department of Aerospace,
Punjab Engineering College,
Chandigarh, India
Sudarshan Rathi
Department of Aerospace,
Indian Institute of Science,
Bengaluru, Karnataka, India
Abstract—The increase in the volume of UAVs has been rapid in the past few years. The utilization of drones has increased considerably in military and commercial setups, with UAVs of all sizes, shapes, and types being used for various applications, from recreational flying to purpose-driven missions. This development has come with challenges and has been identified as a potential source of operational disruptions leading to various security complications, including threats to Critical Infrastructures (CI). Thus, the need for developing fully autonomous anti-UAV Defense Systems (AUDS) has never been more pressing than today. To attenuate and nullify the threat posed by UAVs, whether deliberate or otherwise, this paper presents the holistic design and an operational prototype of a drone detection technology based on visual detection, using Digital Image Processing (DIP) and Machine Learning (ML) to detect, track, and classify drones accurately. The proposed system uses a background-subtracted frame difference technique for detecting moving objects, partnered with a Raspberry Pi-powered pan-tilt tracking system to follow the moving object. The identification of moving objects is performed by a Convolutional Neural Network (CNN), the YOLO v4-tiny algorithm. The novelty of the proposed system lies in its accuracy, its effectiveness with low-cost sensing equipment, and its better performance compared to other alternatives. Along with ease of operation, combining the system with other systems such as RADAR could be a real game-changer in detection technology. The proposed technology was experimentally validated in various tests in an uncontrolled outdoor environment (in the presence of clouds, birds, trees, rain, etc.), proving to be equally effective in all situations and yielding high-quality results.

Keywords—Anti-UAV Defense Systems, Digital Image Processing, Convolutional Neural Network, YOLO, PoT.

I. INTRODUCTION

The development of Unmanned Aerial Vehicles (UAVs), also known as drones, has been rapid in recent years. Both commercial and military applications have increased considerably. Various companies, such as Uber and Amazon, are pushing to use drones as delivery providers for packages and food in a commercial setup. At the same time, military applications include warfare, identifying vulnerable areas prone to risk, and their mitigation.

Despite attracting wide attention in diverse civil and commercial applications, UAVs pose several threats to airspace safety that may endanger people and property. While such threats can be highly diverse regarding the attackers' intentions and sophistication, ranging from pilot unskillfulness to deliberate attacks with unmanned aerial vehicles, they all can produce severe disruption and menace.

The first drone attack on an Indian Air Force base happened in Jammu, India, on 27th June 2021, when two drones dropped IEDs packed with high explosives. The US carried out many drone strikes in Pakistan, Yemen, and Somalia between 2010 and 2020: the reported data shows a minimum of 14,040 confirmed strikes, with 8,858 to 16,901 total people killed.
International Journal of Engineering Applied Sciences and Technology, 2022
Vol. 7, Issue 3, ISSN No. 2455-2143, Pages 11-19
Published Online July 2022 in IJEAST (http://www.ijeast.com)
About 910 to 2,200 civilians and 283 to 454 children were killed in these attacks [1].

To mitigate and neutralize the threat posed by the misuse of UAVs, whether through deliberate malicious or inadvertent activities, this paper presents the complete design of a long-range, fully autonomous drone detection platform. Considering the requirements of a system that will automatically detect, track, and classify UAVs in a critical environment, the research in this paper presents a complete system consisting of both hardware and software components. The originality and elegance of the proposed approach lie in the efficient and effective convergence of hardware and software components for the accurate localization of intruder objects.

The rest of the paper is organized as follows. The literature review is presented in section II. The proposed method is explained in section III. Experimental results are presented in section IV. Concluding remarks are given in section V.

II. LITERATURE REVIEW

With precise computer-controlled movements, variable speed, and maneuvering capabilities, it has become essential to predict the path of RPAS (remotely piloted aircraft systems) and APAS. Their small size (actual, and apparent in-camera) and their resemblance to other aerial objects such as aeroplanes and birds create challenges for automatic detection and accurate localization in real coordinates in space. Various methods have been proposed and implemented to solve this auto-detection and tracking problem, such as RF, GPS, radar, acoustic, and vision-based methods. While these systems can identify moving objects, tracking and classification are the acute problems they face. Since it is crucial to recognize malicious and harmful drones, vision-based systems are the focus of this literature review.

Since the advent of fighter and commercial aviation, detection has been central to safety and to discerning enemy intentions. With the power of flight moving into remote control, and with powerful UAVs becoming an ever-increasing part of the modern world, especially in the past five years, actively detecting, classifying, and tracking them has become significant.

Various techniques have been employed to detect flying aerial vehicles, such as RADAR, acoustic, computer vision, and RF-based detection. But never has there been a greater need to deploy these techniques locally and at scale than now. With the boom in the number of drones being produced and used, the need for robust, efficient, and cost-effective solutions for the detection and tracking of drones is at its peak.

With the ever-improving sophistication of camera hardware, the potential of vision-based detection (VBD) is immense. Better algorithms paired with high-quality hardware have made vision-based detection of objects one of the founding pillars of autonomous systems. From self-driving vehicles using sense-and-avoid to face detection, VBD has become, and will continue to be, an inseparable part of technology.

Much research is ongoing to enhance the performance of RADAR, RF, visual, acoustic, and other methods [2]. Tiwari et al. [3] have presented a review of moving object detection and tracking methods. The significant challenges in detecting moving objects are dynamic backgrounds, noise in the video, illumination changes, etc. Various classical models have been proposed, such as Gaussian mixture background modelling [4] [5] [6], which performs well in some cases. Algorithms such as ViBe [7] are fast but prone to noise in the video. Pixel-based [8], region-based [9], and texture-based [10] methods are used to model dynamic backgrounds.

Zhang et al. [11] have proposed a new algorithm using Canny edge detection to detect camouflaged moving objects; they used constant zoom to track an object detected in a frame using a PTZ camera. Huang et al. [12] have proposed an artificial neural network (ANN) model to detect moving objects in dynamic backgrounds. Cao et al. [13] proposed an algorithm for dynamic backgrounds and irregular object movements. Sheu et al. [14] have proposed moving object detection using a frame difference algorithm, with tracking achieved by converting pixel coordinates into the required pan-tilt angles in 3-D space.

State-of-the-art methods such as region-based CNN, Fast R-CNN [15], and Faster R-CNN [16] are two-stage object detectors and classifiers. YOLO [17] and SSD [18] are one-stage object detectors and classifiers.

III. PROPOSED ALGORITHM

A. Proposed method

Fig. 1. Proposed System Architecture

Figure 1 shows the proposed drone defence method using a two-camera system. A wide-angle static camera is mounted on a static frame. A pan-tilt setup is created using two servo motors: the pan servo motor is fixed on the frame, whereas the tilt servo motor is attached to the rotating head of the pan servo motor. One camera with zoom capability is attached to the rotating shaft of the tilt servo motor; this camera is referred to as the dynamic camera throughout the paper. Both motors are controlled by a Raspberry Pi. All primary algorithms run on the main computer, which displays the output results. The static camera acts as the detection system, whereas
the dynamic camera tracks the detected object. The flowchart of the work is shown in figure 2.

The detection algorithm detects moving objects in the frames of the static camera using a modified background subtraction and frame difference method. It returns the position of every detected object as a tuple (x, y) denoting the centre of the object's bounding box. This data is transferred over a LAN cable to the Raspberry Pi, which converts the pixel information into the corresponding pan-tilt activation and drives the pan-tilt servo motors so that the camera attached to them points toward the object.

B. Detection

Detection of fast-moving objects against a complex background is a challenging task. A combination of background subtraction and frame difference is used to detect all moving objects present in the frame of the static camera.

When it comes to complex backgrounds containing objects such as trees, grass, etc., many algorithms fail to detect the actual moving objects, as shown in figure 3, where a red circle denotes the moving objects detected by the corresponding algorithm. Figure 3a shows the original frame of the video. Taking the first video frame as the background frame, the background subtraction method is applied to detect the moving objects, as shown in figure 3b. Figure 3c shows the Gaussian mixture model applied to classify pixels as foreground or background. Figure 3d shows the combination of simple background subtraction and the Gaussian mixture model; it reduces false alarms to a certain level, but not enough.

Fig. 4. Flow chart of proposed moving object detection method
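The detection steps above can be sketched in a few lines of NumPy (an illustrative reconstruction, not the authors' code; a real implementation would typically use OpenCV functions such as cv2.absdiff, cv2.threshold, and cv2.morphologyEx):

```python
import numpy as np

def to_gray(rgb):
    # Luma conversion used by the paper: 0.299 R + 0.587 G + 0.114 B
    return rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114

def moving_object_mask(background, prev_frame, frame, thresh=50):
    # Background-subtracted frame difference: subtract the background
    # (first) frame from two consecutive grey frames, difference the two
    # results, then binarize.  Quasi-static motion (grass, leaves, clouds)
    # largely cancels out because it occurs at almost the same location
    # in both background-subtracted frames.
    d_prev = np.abs(prev_frame - background)
    d_curr = np.abs(frame - background)
    return (np.abs(d_curr - d_prev) > thresh).astype(np.uint8)
```

In the full pipeline the binary mask would additionally be closed (cv2.morphologyEx with MORPH_CLOSE) and contour centres extracted as the (x, y) tuples sent to the Raspberry Pi.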
All frame operations are carried out on grey frames to reduce computation time. Figure 5b shows the colour image of the drone, which is converted to grey as shown in figure 5c. The RGB frame of the static camera is first converted into a grey frame as per the following equation:

Image[gray] = 0.299 * R + 0.587 * G + 0.114 * B

The background frame is the static camera's first frame, as shown in figure 5a. Subtracting the background frame from each k-th frame gives a background-subtracted k-th frame. Two such consecutive background-subtracted frames are then subtracted from each other to isolate the actual moving object, as shown in figure 5d. This step reduces the false alarms generated by slight movements of grass, tree leaves, clouds, etc., since these generate movement at almost the same location in all frames. After this, thresholding is used to convert the image into a binary image, as shown in figure 5e.

Frame differencing leaves broken gaps in the detected moving object, as shown in figure 5e. The morphological closing operation, a dilation followed by an erosion, can be used to fill these gaps, as shown in figure 5f. Morphological closing is defined by the following equation:

A • B = (A ⊕ B) ⊖ B

A bounding circle is drawn around each detected object for better visualization, along with its path.

C. Tracking

Object tracking is the process of estimating or predicting the positions of moving objects in a video sequence using tracking algorithms. The Raspberry Pi-based object tracking program takes the initial set of object detections as bounding boxes and assigns a unique ID to each box. The algorithm then tracks the moving objects as they move through subsequent frames of the video sequence. These tracking methods are readily applied to real-time video streams from the camera over a USB or IP-based protocol; each frame is fed into the tracking algorithm for subsequent detection and tracking, and as a result, high-performance tracking of the object of interest is obtained.

Object tracking is an important part of the overall system and is essential for localizing the trespassing drone. The tracking method developed here uses OpenCV tracking-by-detection, which tracks the object with the help of detection. A bounding box is created around the detected object, showing the user where the object is in the frame. A self-fabricated pan-tilt camera model operated by a Raspberry Pi and a personal computer (laptop) is used for tracking the object of interest and testing the algorithm.
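Tracking-by-detection of this kind can be illustrated with a minimal centroid tracker that gives each detection a persistent ID, matches detections across frames by smallest Euclidean distance, and drops IDs whose objects leave the frame (a simplified sketch of the idea, not the OpenCV-based tracker the system actually uses):

```python
import math

class CentroidTracker:
    """Assigns a persistent ID to each detected centroid and matches
    detections across frames by smallest Euclidean distance."""

    def __init__(self):
        self.next_id = 0
        self.objects = {}          # id -> (x, y) bounding-box centre

    def update(self, detections):
        # detections: list of (x, y) centres produced by the detector
        if not self.objects:
            for pt in detections:
                self.objects[self.next_id] = pt
                self.next_id += 1
            return self.objects
        matched = {}
        unused = list(detections)
        for oid, (ox, oy) in self.objects.items():
            if not unused:
                break
            # the closest new detection keeps the old ID
            nearest = min(unused, key=lambda p: math.hypot(p[0] - ox, p[1] - oy))
            matched[oid] = nearest
            unused.remove(nearest)
        for pt in unused:          # brand-new objects get fresh IDs
            matched[self.next_id] = pt
            self.next_id += 1
        self.objects = matched     # IDs of objects that left the frame are dropped
        return self.objects
```

A production tracker would also gate matches by a maximum distance and tolerate short dropouts, but the ID-assignment logic is the same as described in the multiple-object section later in the paper.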
Datasets
Several open-source drone datasets are available. We have reorganized these datasets into the following categories –
1. Drone close to the camera – Images from the different available datasets are combined such that all drones are very close to the camera.
2. Drone far from the camera – The dataset from the Drone-vs-Bird detection challenge [20] is used, which includes drone videos with a large distance between camera and drone.
Pan_angle = ((Pan_High − Pan_Low) / frame_width) * Pixel_x + Pan_Low
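Assuming the pan servo sweeps between Pan_Low and Pan_High degrees across the static camera's frame width in pixels, this conversion can be sketched as follows (the 0-180 degree limits are illustrative defaults, not the paper's calibration values):

```python
def pixel_to_pan_angle(pixel_x, frame_width, pan_low=0.0, pan_high=180.0):
    # Linear map from a horizontal pixel coordinate in the static camera's
    # frame to a pan-servo angle, per the equation above.  An analogous
    # function maps pixel_y and frame_height to the tilt angle.
    return (pan_high - pan_low) / frame_width * pixel_x + pan_low
```

The Raspberry Pi would then convert the returned angle into the corresponding servo PWM duty cycle.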
Quantitative Comparisons

The detection algorithm is quantitatively compared with different parameters on different videos of the dataset [20]. The indexes for evaluating the performance of these techniques are as follows.

Recall = TP / (TP + FN)

Video name [20]    Binarization threshold = 50    Binarization threshold = 20
                   Precision      Recall          Precision      Recall
GOPR5842_005       0.796          0.865           0.597          0.983
GOPR5844_002       0.80           0.987           0.585          1
GOPR5847_003       0.753          0.976           0.821          0.997
GOPR5848_004       0.760          0.924           0.364          1

Table 1 – Quantitative comparisons with binarization threshold

The results show that to detect an object at a far distance, we must reduce the binarization threshold, which reduces false negatives and improves recall. However, this also increases false positives, making precision worse in some cases.

Method                                 Precision    Recall
Method proposed by Chen et al. [19]    0.275        1
Proposed method                        0.796        0.865

Table 2 – Quantitative comparison with the method proposed by Chen et al. [19]

Fig. 14. Result-Two cameras detection and tracking

Fig. 15. Result-Two cameras classification

Zooming towards the object is implemented to obtain the required pixels on target (60x40 pixels). The image from the dynamic camera is then fed to the YOLO algorithm for classification. The frame number and pixels on target are denoted in figure 15. At frame number 408 we get 2,484 PoT; this frame is sent to YOLO for classification. If the object is a drone, it is marked on the frame in red.

C. Single (PTZ) camera system

A single-camera detection, tracking, and classification system is built using a camera attached to the PT system. Initially, the system works in detection mode, using the frame of the dynamic camera. Once an object is detected, its location is sent to the Raspberry Pi, which drives the PT servo motors to keep the object at the centre of the camera, as shown in figure 16.
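Both evaluation indexes follow the standard definitions, Recall = TP/(TP + FN) and Precision = TP/(TP + FP). A minimal helper makes the threshold trade-off concrete (the counts in the usage note below are illustrative, not the paper's data):

```python
def precision(tp, fp):
    # Precision = TP / (TP + FP): fraction of detections that are real objects
    return tp / (tp + fp)

def recall(tp, fn):
    # Recall = TP / (TP + FN): fraction of real objects that were detected
    return tp / (tp + fn)
```

For example, a video yielding TP=80, FP=20, FN=15 at threshold 50 might yield TP=94, FP=60, FN=1 at threshold 20: misses become detections (recall rises) while more noise passes the binarization step (precision falls), matching the trend in Table 1.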
Fig. 16. Result-Single camera detection and tracking

Zooming is applied to get enough PoT, after which the frame of the dynamic camera is sent to the YOLO algorithm for classification. If the object is a drone, it is denoted with a red circle, as shown in figure 17.

Fig. 17. Result-Single camera classification

D. Multiple objects detection, tracking and classification

When the static camera detects multiple moving objects, a unique ID is assigned to each object individually: unique IDs are assigned to every object detected in the first frame, and the Euclidean distance between objects in two consecutive frames is calculated. The same ID is assigned to the closest pair of objects, whereas new objects get the next ID. If an object moves out of the frame, its ID is erased from the system.

Figure 18 shows four objects detected by the static camera: two birds with IDs 0 and 4, and two drones with IDs 1 and 2. The dynamic camera tracks only the ID with the highest PoT, assuming it is nearest to the system. Here, the drone with ID 2 is followed by the dynamic camera, and the other objects are ignored until the ID 2 object has been classified.

V. CONCLUSION

This paper illustrates a holistic design and operational prototype of a drone detection technology based on image processing to solve the problem of illicit rogue drones trespassing in classified and protected locations. The proposed system accurately detects any moving object in complex backgrounds and varied environments up to, but not limited to, a distance of 300 m, tracking it in real time with near-zero latency and classifying it as a potential threat (i.e., a drone) or otherwise. The seamless confluence and convergence of smartly designed hardware with tailored (proprietary) software has resulted in a high-performance, low-cost alternative to other detection methods. This system aims toward fully automated surveillance and protection solutions for critical infrastructure, emphasizing precision, accuracy, and reduced response time. The combination of background subtraction and frame difference has raised the standard considerably compared to other similar works, as outlined and validated in the experimental results section.

The proposed system is an excellent starting point for future work continuing the investigation in the same direction to yield an industry-grade robust system. A fusion of this technology with existing technologies, viz. RADAR or RF detection, would result in a sound and robust system with high-range, high-accuracy capability that would be practically impenetrable.

VI. REFERENCES

[1] The Bureau of Investigative Journalism. Retrieved from https://www.thebureauinvestigates.com/ (accessed January 2022).
[2] Taha B., and Shoufan A. (2019). Machine learning-based drone detection and classification: State-of-the-art in research. IEEE Access 7 (2019): 138669-138682.
[3] Tiwari M., and Singhai R. (2017). A review of detection and tracking of objects from image and video sequences. Int. J. Comput. Intell. Res. 13.5 (2017): 745-765.
[4] Lee D., Hull J., and Erol B. (2003). A Bayesian framework for Gaussian mixture background modeling. Proceedings of the 2003 International Conference on Image Processing (Cat. No. 03CH37429), Vol. 3. IEEE, 2003.
[5] Reynolds D. A., Quatieri T. F., and Dunn R. B. (2000). Speaker verification using adapted
Gaussian mixture models. Digital Signal Processing 10.1-3 (2000): 19-41.
[6] Zivkovic Z. (2004). Improved adaptive Gaussian mixture model for background subtraction. Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), Vol. 2. IEEE, 2004.
[7] Barnich O., and Van Droogenbroeck M. (2009). ViBe: a powerful random technique to estimate the background in video sequences. 2009 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2009.
[8] Harville M. (2002). A framework for high-level feedback to adaptive, per-pixel, mixture-of-Gaussian background models. European Conference on Computer Vision. Springer, Berlin, Heidelberg, 2002.
[9] Eng H.-L., and Wang J. (2004). Novel region-based modeling for human detection within highly dynamic aquatic environment. Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), Vol. 2. IEEE, 2004.
[10] Zhong J. (2003). Segmenting foreground objects from a dynamic textured background via a robust Kalman filter. Proceedings of the Ninth IEEE International Conference on Computer Vision. IEEE, 2003.
[11] Zhang X., and Kusrini K. (2021). Autonomous long-range drone detection system for critical infrastructure safety. Multimedia Tools and Applications 80.15 (2021): 23723-23743.
[12] Huang S.-C., and Do B.-H. (2013). Radial basis function based neural network for motion detection in dynamic scenes. IEEE Transactions on Cybernetics 44.1 (2013): 114-125.
[13] Cao X., Yang L., and Guo X. (2015). Total variation regularized RPCA for irregularly moving object detection under dynamic background. IEEE Transactions on Cybernetics 46.4 (2015): 1014-1027.
[14] Sheu B.-H., Chiu C., Lu W., Huang C., and Chen W. (2019). Development of UAV tracing and coordinate detection method using a dual-axis rotary platform for an anti-UAV system. Applied Sciences 9.13 (2019): 2583.
[15] Girshick R. (2015). Fast R-CNN. IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1440-1448. doi: 10.1109/ICCV.2015.169.
[16] Ren S., He K., Girshick R., and Sun J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems 28 (2015).
[17] Redmon J., Divvala S., Girshick R., and Farhadi A. (2016). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[18] Liu W., Anguelov D., Erhan D., Szegedy C., Reed S., Fu C.-Y., and Berg A. (2016). SSD: Single shot multibox detector. European Conference on Computer Vision. Springer, Cham, 2016.
[19] Chen S., Xu T., Li D., Zhang J., and Jiang S. (2016). Moving object detection using scanning camera on a high-precision intelligent holder. Sensors 16.10 (2016): 1758.
[20] Coluccia A., Fascista A., Schumann A., Sommer L., Dimou A., Zarpalas D., Méndez M., de la Iglesia D., González I., Mercier J.-P., Gagné G., Mitra A., and Rajashekar S. (2021). Drone vs. Bird Detection: Deep Learning Algorithms and Results from a Grand Challenge. Sensors 2021, 21, 2824. https://doi.org/10.3390/s21082824