International Journal of Scientific Research in Engineering and Management (IJSREM)
Volume: 05 Issue: 05 | May - 2021 ISSN: 2582-3930
Object Detection Using OpenCV

Pooja Kumbhar, Onkar Patil, Rushikesh Takale, Rohit Kolhapure, Prof. R. N. Patil
Department of Electronics Engineering, DKTE Institute, Ichalkaranji, Maharashtra, India.
Abstract: The aim of this thesis is to explore different methods for helping computers interpret the real world visually, investigate solutions to those methods offered by the open-source computer vision library OpenCV, and implement some of these in a Raspberry Pi based application for detecting and keeping track of objects. The main focus rests on the practical side of the project. The result of this thesis is a GNU/Linux based C/C++ application that is able to detect and keep track of objects by reading the pixel values of frames captured by the Raspberry Pi camera module. The application also transmits some useful information, such as coordinates and size, to other computers on the network that send an appropriate query. The source code of the program is documented and can be developed further.

Introduction
Object detection and recognition are important problems in computer vision. Since these problems are meta-heuristic, despite a lot of research, practically usable, intelligent, real-time and dynamic object detection/recognition methods are still unavailable. We propose a new object detection/recognition method which improves over existing methods in every stage of the object detection/recognition process. In addition to the usual features, we propose to use geometric shapes, such as linear cues, ellipses and quadrangles, as additional features. Two famous robots are the C-3PO and R2-D2 machines of the 1977 Star Wars film; nowadays, advances in hardware have produced many capable robots, a good example being Honda's ASIMO.

Problem Statement:
• To implement object detection and recognition with the help of a Raspberry Pi 4 and an 8MP camera using OpenCV.

Project Objective:
• To build an object detection model with the Raspberry Pi.

Project Scope:
1. To research and understand the methods of detecting objects in OpenCV.
2. To learn how OpenCV works with Haar cascade files.
3. To implement and test the model as a functional prototype.

METHODOLOGY:
The whole concept of detection and recognition of objects in our project lies in the Python library OpenCV and the COCO dataset, which provides a total of 91 object classes that can be used in our code to recognize objects in video or image input; a minimal usage sketch is given after the hardware list below.

Taking into account the relatively high performance requirements of image processing in general and the equipment currently available to the faculty, the Raspberry Pi was an obvious choice as a relatively inexpensive and powerful embedded platform. Training and testing of the custom model will be done on images of people in winter landscapes. The idea is that the mainstream pretrained models are trained on people in a variety of landscapes, so it would be interesting to see whether the model can be fine-tuned with simple tools to increase accuracy.

Hardware Design:
Hardware design includes the selection/design of the following hardware modules:
1. Raspberry Pi 4
2. 8MP standard RPi camera
3. 32 GB memory card
4. 3.1 A power adapter
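As a rough sketch of this pipeline, the Python example below runs a pretrained SSD-MobileNet COCO model through OpenCV's dnn module and overlays the detected boxes on camera frames. The file names (coco.names, frozen_inference_graph.pb, ssd_mobilenet_v3_large_coco.pbtxt) and the input parameters are assumptions for a typical pretrained TensorFlow detection model, not the exact files used in this project.

```python
# Hypothetical sketch: pretrained SSD-MobileNet (COCO, 91 classes) via OpenCV's dnn module.
# Model/label file names and input parameters are assumed, not the project's exact files.
import cv2

with open("coco.names") as f:                       # assumed label file, one class name per line
    class_names = [line.strip() for line in f]

net = cv2.dnn_DetectionModel("frozen_inference_graph.pb",
                             "ssd_mobilenet_v3_large_coco.pbtxt")
net.setInputSize(320, 320)                          # network input resolution
net.setInputScale(1.0 / 127.5)                      # scale pixel values to roughly [-1, 1]
net.setInputMean((127.5, 127.5, 127.5))
net.setInputSwapRB(True)                            # OpenCV gives BGR, the model expects RGB

cap = cv2.VideoCapture(0)                           # Pi camera exposed as a video device
while True:
    ok, frame = cap.read()
    if not ok:
        break
    class_ids, confidences, boxes = net.detect(frame, confThreshold=0.5)
    if len(boxes):
        for cid, conf, (x, y, w, h) in zip(class_ids.flatten(), confidences.flatten(), boxes):
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            # assumes 1-based class ids matching the order of the label file
            cv2.putText(frame, f"{class_names[cid - 1]} {conf:.2f}", (x, y - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imshow("detections", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):           # press q to quit
        break
cap.release()
cv2.destroyAllWindows()
```

Running a loop like this on the Raspberry Pi is comparatively slow, which is consistent with the speed observations discussed in the conclusion.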
OVERCLOCKING:
The easiest way to overclock the Raspberry Pi Model B is to do it via the Raspberry Pi configuration interface, which appears on every start-up or can be opened using the command sudo raspi-config. Overclocking is recommended, since image processing operations consume fairly large amounts of CPU power (if not optimized to harness the GPU instead), and as long as the Raspberry Pi's airflow is above minimal (heat disperses easily) it will not damage the SoC. The only problems that may arise are instability when remotely accessing a Pi on which a graphical user interface server is running; in that case, the Raspberry Pi can quickly shut down all network access and may even freeze completely.

OPEN CV INTRODUCTION:
• The application written for this thesis relies heavily on computer vision, image processing and pixel manipulation, for which there exists an open-source library named OpenCV (Open Source Computer Vision Library), consisting of more than 2500 optimized algorithms. Uses range from facial recognition and object identification to classification of human actions in videos, achieved with filters, edge mapping, image transformations, detailed feature analysis and more. Having Linux support, it is the perfect choice for developing an application specifically for a Raspberry Pi based system. Another positive aspect of this library is that it is written natively in C++ and therefore can be very smoothly integrated into a C/C++ application.
• While there are numerous methods and algorithms contained within OpenCV, the most important benefits of this library for the purposes of this thesis are its basic data structures, such as Mat, which can be used to store the pixel values of an image in an n-dimensional array, and Scalar and Point, which respectively contain pixel values and coordinates of up to three dimensions.
The functions provided by this library are also necessary in the development process of the object tracking application. There are numerous options, but following the scope of this thesis, the focus is set on grabbing frames from a live camera feed [8], image thresholding using HSV colour space ranges, finding blobs and using their detected contours in a binary image, and, in case a graphical user interface is enabled, displaying image frames and a control panel for changing parameters during run-time.
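The thesis application itself is written in C/C++, but the same OpenCV calls are available from Python; the sketch below illustrates the frame-grab, HSV-threshold and contour steps described above. The HSV range and the minimum blob area are arbitrary example values, not the project's tuned thresholds.

```python
# Sketch of the frame-grab / HSV-threshold / contour pipeline described above.
# The HSV range is an arbitrary example (a red-ish hue), not the project's tuned values.
import cv2
import numpy as np

lower = np.array([0, 120, 70])        # assumed lower HSV bound
upper = np.array([10, 255, 255])      # assumed upper HSV bound

cap = cv2.VideoCapture(0)             # live camera feed
while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, lower, upper)               # binary image of in-range pixels
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)  # OpenCV 4.x signature
    for c in contours:
        if cv2.contourArea(c) < 500:                    # ignore tiny blobs (noise)
            continue
        x, y, w, h = cv2.boundingRect(c)                # coordinates and size of the blob
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
    cv2.imshow("tracking", frame)                       # only when a GUI is enabled
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```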
OBJECT DETECTION:
• Image classification: The task of object classification requires binary labels indicating whether objects are present in an image. Early datasets of this type comprised images containing a single object with blank backgrounds, such as the MNIST handwritten digits or COIL household objects. Caltech 101 and Caltech 256 marked the transition to more realistic object images retrieved from the internet, while also increasing the number of object categories to 101 and 256, respectively. CIFAR-10 and CIFAR-100, popular in the machine learning community due to their larger number of training examples, offered 10 and 100 categories from a dataset of tiny 32×32 images. While these datasets contained up to 60,000 images and hundreds of categories, they still only captured a small fraction of our visual world. Recently, ImageNet moved to a much larger scale, with millions of images spread over thousands of object categories.
• Object detection: Detecting an object entails both stating that an object belonging to a specified class is present and localizing it in the image. The location of an object is typically represented by a bounding box. Early algorithms focused on face detection using various ad hoc datasets; later, more realistic and challenging face detection datasets were created. Another popular challenge is the detection of pedestrians, for which several datasets have been created. The Caltech Pedestrian Dataset contains 350,000 labelled instances with bounding boxes.
• For the detection of basic object categories, a multi-year effort from 2005 to 2012 was devoted to the creation and maintenance of a series of benchmark datasets that were widely adopted. The PASCAL VOC datasets contained 20 object categories spread over 11,000 images. Over 27,000 object instance bounding boxes were labelled, of which almost 7,000 had detailed segmentations. Recently, a detection challenge has been created from 200 object categories using a subset of 400,000 images from ImageNet, in which an impressive 350,000 objects have been labelled using bounding boxes.

Algorithmic Analysis:
• Bounding-box detection: For the following experiments we take a subset of 55,000 images from our dataset and obtain tight-fitting bounding boxes from the annotated segmentation masks. We evaluate models tested on both MS COCO and PASCAL, see Table 1. We evaluate two different models. DPMv5-P: the latest implementation of [44] (release 5 [45]), trained on PASCAL VOC 2012. DPMv5-C: the same implementation trained on COCO (5000 positive and 10000 negative images). We use the default parameter settings for training COCO models.
• These preliminary experiments were performed before the final split of the dataset into train, val and test sets; baselines on the actual test set will be added once the evaluation server is complete.

Mobile Net Classifier:
Now it is time for our MobileNet to label each proposed image forwarded by the region proposal system. We pre-process the current frame and generate multiple images by zeroing out all but one object each time, using the input box coordinates. As a result, we get the same number of images as detected boxes for each input frame; we then feed all the images to the classifier to generate labels. Finally, we take the classification outputs and combine them with the bounding boxes, overlaying them on the original frame.
Readers may be curious about why we do not use the cropped bounding box directly as input to the classifier. The reason is that the classifier is trained on fixed-size images, while in practice small objects appear very often; if we upsample the objects, the resulting images might have too low a resolution to be recognized correctly. Another reason is that we assume the background information is helpful for prediction, hence we keep it.
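A minimal sketch of this per-box masking step is given below; `classify` stands in for the MobileNet classifier and is not defined here, so this is an illustration of the idea rather than the project's implementation.

```python
# Hypothetical sketch of the per-box masking described above: for each proposed box,
# everything outside the box is zeroed out and the masked frame goes to the classifier.
import numpy as np

def label_proposals(frame, boxes, classify):
    """frame: HxWx3 image array; boxes: list of (x, y, w, h); classify: image -> label."""
    labels = []
    for (x, y, w, h) in boxes:
        masked = np.zeros_like(frame)                        # black image, same size as the frame
        masked[y:y + h, x:x + w] = frame[y:y + h, x:x + w]   # keep only the proposed region
        labels.append(classify(masked))                      # one classifier pass per proposal
    return list(zip(boxes, labels))                          # pair each box with its label for overlay
```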
Dataset pre-processing:
Our image data is supplied by the PASCAL VOC2012 detection dataset [5]. The dataset consists of 20 classes, including objects most commonly captured by traffic cameras, such as bus, car, bicycle, motorcycle and person. The data has been split into 50% for training/validation and 50% for testing, with the distributions of images and objects by class approximately equal across the training/validation and test sets. This comes to a training set of 5717 images and a validation set of 5823 images.
Mobile net:
• We also implement a detector version of MobileNet, namely Mobile-Det, by combining the MobileNet classifier with the Single Shot MultiBox Detector (SSD) framework [14]. The reason we want to do this is to further analyse the benefit of the MobileNet model and to have a fair comparison with state-of-the-art detection models such as VGG-based SSD and YOLO. The details of SSD are extensive and beyond the scope of this project, so we only give a brief introduction to how it works. In short, the SSD framework uses multiple feature layers as classifiers, where each feature map is evaluated by a set of default boxes of different aspect ratios at each location in a convolutional manner, and each classifier predicts class scores and shape offsets relative to those boxes.
• At training time, a default box is considered to be predicting correctly if its Jaccard overlap with the ground-truth box is larger than the threshold (0.5); the loss is then measured by both the confidence score and the localization score. A small sketch of the default boxes and this matching rule follows below.
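As a worked illustration of the two points above, the sketch below generates SSD-style default boxes for one feature map and applies the Jaccard-overlap (IoU) matching rule with the 0.5 threshold. The scale and aspect ratios are illustrative values only, not those of any particular SSD configuration.

```python
# Illustrative sketch of SSD-style default boxes and the Jaccard (IoU) matching rule.
# Scale and aspect ratios are example values only.

def default_boxes(f, scale=0.2, ratios=(1.0, 2.0, 0.5)):
    """One default box per aspect ratio, centred on every cell of an f x f feature map.
    Boxes are (x_min, y_min, x_max, y_max) in [0, 1] image coordinates."""
    boxes = []
    for i in range(f):
        for j in range(f):
            cx, cy = (j + 0.5) / f, (i + 0.5) / f
            for r in ratios:
                w, h = scale * r ** 0.5, scale / r ** 0.5   # wider box for r > 1, taller for r < 1
                boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

def jaccard_overlap(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

# A default box counts as a positive match when its overlap with a ground-truth box exceeds 0.5.
ground_truth = (0.35, 0.35, 0.55, 0.55)
positives = [box for box in default_boxes(8) if jaccard_overlap(box, ground_truth) > 0.5]
```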
Face attributes:
• Another use-case for MobileNet is compressing large systems with unknown or esoteric training procedures. In a face attribute classification task, we demonstrate a synergistic relationship between MobileNet and distillation [9], a knowledge transfer technique for deep networks. We seek to reduce a large face attribute classifier with 75 million parameters and 1600 million Mult-Adds.
• MobileNet can also be deployed as an effective base network in modern object detection systems. We report results for MobileNet trained for object detection on COCO data, based on the recent work that won the 2016 COCO challenge [10]. In Table 13, MobileNet is compared to VGG and Inception V2 [13] under both the Faster-RCNN [23] and SSD [21] frameworks. In our experiments, SSD is evaluated with a 300 input resolution (SSD 300) and Faster-RCNN is compared at both 300 and 600 input resolutions (Faster-RCNN 300, Faster-RCNN 600). The Faster-RCNN model evaluates 300 RPN proposal boxes per image.
Conclusion:
The aim of this thesis was to investigate the suitability of running a real-time object detection system on a Raspberry Pi. Two models, SSD and YOLO, were implemented and tested for accuracy and speed at different input sizes. The results showed that both models are very slow and that only in applications that do not require high speed would it be viable to use the Raspberry Pi as the hardware. There is a trade-off with accuracy to be made if higher speeds are to be achieved, since there is not enough computational power to have both. This leads to the conclusion that it is important to choose a proper input size to obtain the balance of speed and accuracy needed for a particular application. This study could be of help to others looking to implement object detection on similar hardware and needing to find that balance.

Future Scope:
Due to time constraints, many things that could have strengthened the results of this study were left out, such as more testing with different objects, distances and input sizes. One thing that would be interesting is to train our own models on different datasets with lower resolutions and see if that could improve accuracy for smaller objects. Another thing that was left out of this project was looking at the impact that lighting has on a model's ability to detect objects; this is something that could be explored in future work.

References:
Y. Amit and P. Felzenszwalb, "Object Detection", Computer Vision, pp. 537-542, 2014.
"What is a Raspberry Pi?", Raspberry Pi, 2019. [Online]. Available: https://www.raspberrypi.org/help/what-%20is-a-raspberry-pi/. [Accessed: 06-Mar-2019].
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and Á. Rodríguez-Vázquez, "Performance analysis of real-time DNN inference on Raspberry Pi", Real-Time Image and Video Processing 2018, 2018.
W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu and A. Berg, "SSD: Single Shot MultiBox Detector", Computer Vision – ECCV 2016, pp. 21-37, 2016.
T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár and C. L. Zitnick, "Microsoft COCO: Common Objects in Context", Computer Vision – ECCV 2014, Lecture Notes in Computer Science, pp. 740-755, 2014.