Image Processing
Slot: G1+TG1
submitted by
UTKARSH VERMA(19BCE0078)
ALKA RANI(19BCI0004)
ADVIKA SRIVASTAVA(19BCE2217)
On
Keywords:
Deep Learning; Computer Vision; OpenCV; Haar Cascade
Introduction
India witnessed a surge in COVID-19 infections from mid-July 2020; the number of new
cases rose to about 97,900 per day by late September and then declined to a lower number of
new cases per day by November. This was termed the first wave of COVID-19. However,
with the arrival of the new year and other festivals, restrictions loosened, people became
careless, and face masks were no longer worn even in large crowds. The pandemic punished
this carelessness when it hit again, raising the new infection count to as high as 400,000 per
day, as recorded on 6-8 May 2021. Now, as infection rates subside, people have become
careless yet again. However, taking lessons from the second wave, public places such as malls
and cinemas have enforced the wearing of masks. Deploying a large workforce for this purpose
is infeasible, and this is where our project comes in: it helps ensure everyone is wearing a mask
with little to no human intervention.
LITERATURE SURVEY
Each entry below lists the title, methodology, results and drawback of the surveyed work.

1. Title: Face Mask Detection Using Transfer Learning, 2020 IEEE 2nd International Conference on Electronics, Control, Optimization and Computer Science (ICECOCS).
Methodology: This paper used different deep Convolutional Neural Networks (CNNs) to extract deep features from images of faces. The extracted features were further processed using machine learning classifiers such as Support Vector Machine (SVM) and K-Nearest Neighbors (K-NN), and different metrics such as accuracy and precision were examined to compare the performance of all models.
Results: The experimental results show that, despite a small dataset size (1,376 faces), transfer learning is a good approach to classifying faces with masks and faces without masks; the combination of the MobileNet-V2 model with SVM achieved excellent performance (97.11%).
Drawback: The dataset used is not large enough to deal with complex problems.
2. Title: Social Distance Monitoring and Face Mask Detection Using Deep Neural Network, MSc Internet of Things with Data Analytics, Bournemouth University, United Kingdom (2020).
Methodology: A robust model to initiate the basic mandatory steps to be taken by society to control COVID transmission. A deep neural network is used to check that social distancing is being maintained at public sites and to make sure people are wearing a mask while entering a premise.
Results: An image-processing-based social distancing framework with a deep learning neural network for analysis is evaluated. The proposed idea suggests a global display using the AWS IoT platform. The research would be further extended by implementing the robust prediction system and real-time monitoring of the face mask detection system with a larger data size.
Drawback: The proposed model is not implemented over a large coverage area, which is required to improve the accuracy of the prediction model and the training model.
3. Title: An Improved Neural Network Cascade for Face Detection in Large Scene Surveillance. Received: 30 September 2018; Accepted: 8 November 2018; Published: 11 November 2018.
Methodology: The authors proposed a technique for masked face detection using four steps: estimating distance from the camera, eye line detection, facial part detection and eye detection. They improve popular cascade algorithms by proposing a novel multi-resolution framework that utilises parallel convolutional neural network cascades for detecting faces in large scenes. Compared with popular cascade algorithms, this method outperforms them by a large margin.
Results: When testing, detection windows of different sizes are normalised to fit the input size of each algorithm. The method outperforms the two compared approaches by a large margin in the number of true positive and false positive detections when tuned over different threshold values. When the threshold value is smaller, more faces are accepted, but so are more false alarms; yet, overall, the approach performs relatively well over different thresholds.
Drawback: Nil.
4. Title: Face Mask Detection Using CNN for Covid-19. Vol. 10, Issue 6, June 2021.
Methodology: The project implementation is divided into two phases: training and deployment. In the training phase, the GitHub dataset named 'Face Mask Detection Dataset' is loaded and used to train the face mask classifier using a CNN; the VGG-16 (Visual Geometry Group) architecture is used to train the classifier, which is then stored on disk. In the deployment phase, the trained face mask classifier is first loaded.
Results: Detected people not wearing face masks and thus decreased the risk of getting infected by COVID-19.
Drawback: Nil.
5. Title: "Facial Mask Detection using Semantic Segmentation," 2019 4th International Conference on Computing, Communications and Security (ICCCS), 2019.
Methodology: This paper proposes the twin objectives of creating a binary face classifier that can detect faces in any orientation irrespective of alignment, and training it in an appropriate neural network to get accurate results. The model takes an RGB image of arbitrary size as input; its basic functions are feature extraction and class prediction.
Results: Results are demonstrated on the Multi Human Parsing Dataset with mean pixel-level accuracy. The problem of erroneous predictions has been solved and a proper bounding box is drawn around the segmented region. The proposed network can detect non-frontal faces and multiple faces in a single image. The method can find applications in advanced tasks such as facial part detection.
Drawback: Nil.
6. Title: An Efficient Ant Colony System for Edge Detection in Image Processing.
Methodology: This paper uses the Ant Colony System (ACS) variant of Dorigo and Gambardella (1997) to solve the problem of edge detection. ACS applies two rules for the pheromone update, local and global, in order to achieve a better search of the problem space. The authors propose two modified ACS algorithms, developed to obtain efficient edge detection with minimized complexity.
Results: The proposed algorithms were compared with the ACS system presented in Baterina and Oppus (2010) without filtering. The two algorithms proposed in the paper outperformed the traditional ACS (without filtering), and higher quality solutions were obtained in significantly shorter time, without the need for additional noise-filtering processes.
Drawback: Nil.
9. Title: Chowdary, P. R. V., Babu, M. N., Subbareddy, T. V., Reddy, B. M., & Elamaran, V. (2014). Image processing algorithms for gesture recognition using MATLAB.
Methodology: Gesture recognition is a fast-growing field in image processing and artificial intelligence. Gesture recognition is a process in which gestures or postures of human body parts are identified and used to control computers and other electronic appliances. The most important reason for the emergence of gesture recognition is that it creates a simple communication path between human and computer, called HCI. MATLAB code can be converted to HDL or VHDL code and embedded in an FPGA for hardware execution; this future work can be implemented using the Xilinx System Generator software, which is linked with Xilinx FPGAs through hardware co-simulation.
Results: The authors propose four simple MATLAB algorithms for gesture recognition which are invariant to rotation. Of all the algorithms, the scanning method is the most robust, delivering accurate results for 82.47% of images. The results obtained from the above algorithms can be used to control any electronic appliance; some of them can control the VLC media player or a PowerPoint presentation without any physical contact with the computer, thus establishing better human-computer interaction.
Drawback: It is not implemented using the Xilinx System Generator software, which is linked with Xilinx FPGAs through hardware co-simulation.
10. Title: Stefan van der Walt, Johannes L. Schönberger, Juan Nunez-Iglesias, François Boulogne and Neil Yager. Scikit-image: Image processing in Python.
Methodology: Scikit-image is an image processing library that implements algorithms and utilities for use in research, education and industry applications. It provides easy access to a powerful array of image processing functionality. Over the past few years, it has seen significant growth in both adoption and contribution. Due to the breadth and maturity of its code base, as well as its commercially friendly licence, scikit-image is well suited for industrial application.
Results: scikit-image provides easy access to a powerful array of image processing functionality. Over the past few years it has seen significant growth in both adoption and contribution, and the team is excited to collaborate with others to see it grow even further and to establish it as the de facto library for image processing in Python.
Drawback: Nil.
11. Title: H. Adusumalli, D. Kalyani, R. K. Sri, M. Pratapteja and P. V. R. D. P. Rao, "Face Mask Detection Using OpenCV," 2021 Third International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), 2021, pp. 1304-1309, doi: 10.1109/ICICV50876.2021.9388375.
Methodology: The methodology consists of four stages. The first stage is dataset collection: the dataset was collected from Kaggle and divided into training and testing data after analysis. The second stage is training the model to detect face masks: the OpenCV module is used to obtain faces, followed by training a Keras model to identify face masks. The third stage is detecting a person not wearing a mask, and the final stage is sending them an email or notification about it.
Results: Due to increasing COVID cases, a system to replace humans checking for masks on people's faces is greatly needed. The system proposed in the paper can be installed in public places like airports, malls, etc. to ensure the safety of workers and the public. It can also be used in MNCs with hundreds of workers, as the system can store employee data and notify employees if they are not wearing a mask.
Drawback: Nil.
12. Title: H. Jiang and E. Learned-Miller, "Face Detection with the Faster R-CNN," 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), 2017, pp. 650-657, doi: 10.1109/FG.2017.82.
Methodology: R-CNN is essentially a region-based convolutional neural network used for object recognition. The pipeline has two stages. First, a set of category-independent object proposals is generated by selective search. In the second refinement step, the image area in each proposal is warped to a fixed size and then mapped to a 4096-dimensional feature vector. This feature vector is sent to the classifier and the regressor to narrow down the detection position. R-CNN brings the high accuracy of CNNs on classification problems to object detection. However, R-CNN requires a forward pass through the convolutional neural network for each object proposal to extract its features, which requires large computation and creates a computational burden. To solve this problem, Fast R-CNN was introduced, which runs through the network exactly once for an entire input image.
Results: Although the Faster R-CNN is designed for generic object detection, it demonstrates impressive face detection performance when retrained on a suitable face detection training set. It may be possible to further boost its performance by considering the special patterns of human faces.
Drawback: MultiresHPM is a better detector than Faster R-CNN.
14. Title: G. Deore, R. Bodhula, V. Udpikar and V. More, "Study of masked face detection approach in video analytics," 2016 Conference on Advances in Signal Processing (CASP), 2016, pp. 196-200, doi: 10.1109/CASP.2016.7746164.
Methodology: Inputs: estimated distance from the camera, eye line detection, facial part detection and eye detection. The authors propose a technique of masked face detection using the four inputs mentioned above. Distance from the camera can be calculated using the pinhole camera model. Eyes and eyebrows correspond to lower grey levels compared to other parts of the face; these locations correspond to the local valleys of the horizontal projection histogram. For face detection, the Viola-Jones algorithm is used. Eye detection based on the Viola-Jones algorithm, the last step, is performed to check whether a person has a mask: if both eyes and face are detected, there is no mask; if only eyes are detected, the person is wearing a mask.
Results: The distance-from-camera method is used to see whether a person is approaching the camera or moving away, in order to decide whether to trigger face detection. Eye line detection finds the valley in the histogram projection. If the eye line is detected correctly, face detection can be applied to check for a mask. (Eye line and eye detection have the highest false rates because the algorithm detects a very small part of the image in low resolution.)
Drawback: The algorithm detects a very small part of the image in low resolution.
15. Title: Mandal, B., Okeukwu, A. and Theis, Y., 2021. Masked face recognition using ResNet-50. arXiv preprint arXiv:2104.08997.
Methodology: Input: a pre-trained ResNet-50 architecture trained on human faces. The problem is identifying the identity of an individual wearing a mask, because the features available for prediction are reduced to the eyes and forehead. The paper proposes a framework to solve the problem of recognising masked individuals using the ResNet-50 architecture. Transfer learning is applied to adapt a pre-trained ResNet-50 model to images of people without masks. The model is then subjected to hyperparameter tuning to identify the identity of individuals with a mask (images of the same individuals without masks were used before). The accuracy obtained in the experiment is 89%. The dataset used for the experiment is the Real World Masked Face Recognition Dataset (RMFRD), which contains three different parts: a real-world masked face recognition dataset, simulated masked face recognition datasets, and real-world masked face verification. Additionally, the dataset has 5,000 masked and 90,000 unmasked faces.
Results: Precision, recall and F1-score are used to observe how well the model works. Precision was 0.8993, recall was 0.8970 and the F1 score was 0.897.
Drawback: NA.
Modules:
1. Image Enhancement
a. Grayscale Conversion
b. Histogram Equalization
c. Blurring
Methodology:
➔ The image is converted into grayscale, because grayscale stores only luminance
(brightness) information rather than colour information: maximum luminance is white, zero
luminance is black, and anything in between is a shade of grey. As a result, grayscale
photographs contain only shades of grey and no colour. This makes the image easier to
process, as there is only one colour channel.
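A minimal sketch of this step with OpenCV (the sample file name here is only illustrative, not part of our dataset):

import cv2

# Read an illustrative frame and keep only the luminance channel;
# the result is a single-channel 8-bit image.
img = cv2.imread('face.jpg')              # hypothetical sample image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
print(img.shape, gray.shape)              # e.g. (480, 640, 3) -> (480, 640)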
➔ After conversion to grayscale, the image undergoes histogram equalization. Histogram
equalization, also known as histogram flattening, is one of the most essential nonlinear point
operations. Its premise is similar to that of FSHS: an image should not only cover the
available grayscale range but also be distributed evenly across it. This can be seen in the
results we obtained.
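Continuing the sketch, OpenCV's global histogram equalization can be applied to the grayscale frame as below; cv2.equalizeHist operates on single-channel 8-bit images, and the file name remains illustrative:

import cv2

gray = cv2.imread('face.jpg', cv2.IMREAD_GRAYSCALE)  # hypothetical sample image
# Spread the intensity histogram across the full 0-255 range so dark or
# washed-out frames use the whole grayscale range.
equalized = cv2.equalizeHist(gray)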
➔ After the equalization is done, the image is subjected to blurring. Blurring an image
reduces the deviation of intensity values; that is, it reduces sudden changes in intensity. The
reasoning behind blurring is that it reduces the number of unwanted edges (e.g. the folds of
the eyes, face or clothing).
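A sketch of the blurring step; a Gaussian kernel is assumed here, and the 5x5 kernel size is an illustrative choice rather than a tuned value:

import cv2

equalized = cv2.imread('face.jpg', cv2.IMREAD_GRAYSCALE)  # stand-in for the equalized frame
# Smooth abrupt intensity changes so minor edges (skin folds, clothing
# creases) do not survive into the edge-detection stage.
blurred = cv2.GaussianBlur(equalized, (5, 5), 0)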
➔ At last, the image goes through the edge detection algorithm, which is Canny edge
detection. Canny edge detection is a popular edge detection algorithm developed by John F.
Canny. It is a multi-stage algorithm comprising noise reduction, intensity gradient detection,
non-maximum suppression and hysteresis thresholding.
➔ Finally, the important features are extracted from the image and fed into the Haar
classifier, and the result is displayed on the original image.
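A sketch of these final steps on the preprocessed frame from above; the Canny thresholds (100, 200) and the cascade parameters are illustrative, and the cascade path uses the copy bundled with opencv-python rather than our local data\xml folder:

import cv2

img = cv2.imread('face.jpg')                              # hypothetical sample image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(cv2.equalizeHist(gray), (5, 5), 0)

# Canny edge detection: noise reduction, gradient computation,
# non-maximum suppression and hysteresis thresholding in one call.
edges = cv2.Canny(blurred, 100, 200)

# Haar cascade on the preprocessed frame; detections are drawn back
# onto the original colour image.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
for (x, y, w, h) in face_cascade.detectMultiScale(blurred, 1.1, 4):
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)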
Architecture Diagram:
Code:
import numpy as np
import cv2

# Load the pre-trained Haar cascade classifiers
face_cascade = cv2.CascadeClassifier('data\\xml\\haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('data\\xml\\haarcascade_eye.xml')
mouth_cascade = cv2.CascadeClassifier('data\\xml\\haarcascade_mcs_mouth.xml')
upper_body = cv2.CascadeClassifier('data\\xml\\haarcascade_upperbody.xml')

# User message
font = cv2.FONT_HERSHEY_SIMPLEX
org = (30, 30)
weared_mask_font_color = (255, 255, 255)
not_weared_mask_font_color = (0, 0, 255)
thickness = 1
font_scale = 1
weared_mask = "Thank You for wearing mask"
not_weared_mask = "Please wear mask to prevent Corona"

# Read video
cap = cv2.VideoCapture(0)  # access webcam

while True:
    # Get individual frame
    ret, img = cap.read()
    if not ret:
        break
    # img = cv2.flip(img, 1)

    # Convert the frame to grayscale before running the cascades
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.1, 4)
    mouth_rects = mouth_cascade.detectMultiScale(gray, 1.5, 5)

    # Face detected but lips not detected, which means the person is wearing a mask
    if len(mouth_rects) == 0:
        cv2.putText(img, weared_mask, org, font, font_scale,
                    weared_mask_font_color, thickness, cv2.LINE_AA)
    else:
        for (mx, my, mw, mh) in mouth_rects:
            # Mouth is visible: highlight it and ask the person to wear a mask
            cv2.rectangle(img, (mx, my), (mx + mw, my + mh),
                          not_weared_mask_font_color, 2)
            cv2.putText(img, not_weared_mask, org, font, font_scale,
                        not_weared_mask_font_color, thickness, cv2.LINE_AA)

    # Draw the detected face regions
    for (x, y, w, h) in faces:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

    cv2.imshow('Mask Detection', img)
    if cv2.waitKey(1) & 0xFF == 27:  # press Esc to stop
        break

# Release video
cap.release()
cv2.destroyAllWindows()
Experimental Results:
Output Screenshots (images omitted here): one True Positive, one False Negative, one True
Negative, and three False Positive cases.
Evaluation Metrics:
Accuracy = (TP + TN) / (TP + TN + FP + FN) = 3/6
Recall = TP / (TP + FN) = 1/2
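These figures can be reproduced with a short calculation; the confusion-matrix counts in the example call are hypothetical placeholders for illustration only, not the exact tallies of our test runs:

def accuracy(tp, tn, fp, fn):
    # Fraction of all classified frames that were labelled correctly.
    return (tp + tn) / (tp + tn + fp + fn)

def recall(tp, fn):
    # Fraction of actual mask-wearers that were detected as such.
    return tp / (tp + fn)

# Hypothetical counts, for illustration only:
print(accuracy(tp=1, tn=2, fp=2, fn=1))  # 0.5
print(recall(tp=1, fn=1))                # 0.5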
Conclusion:
Hence with the help of OpenCV and Haar Cascade Classifier, we were able to implement
mask detection.
The outputs were accurate up to a certain extent. In low lighting, the model showed some
inaccuracy, and there were also false positives, as shown in the output section.
This proposed architecture can be implemented in malls and other densely populated places
and can be connected to an alarm that notifies owners or administrators when there are
people without masks, thereby helping ensure the safety and well-being of other people in the
surrounding area.
Future Work:
Future enhancement focuses on reducing false positives, i.e. predicting that a person is
wearing a mask even though they are not, which can be done by training a custom classifier,
as well as reducing false negatives, i.e. wrongly predicting that a person is not wearing a
mask, thus improving the overall accuracy.