
An Efficient Way of Emotion and Gesture Recognition using Deep Learning Algorithms

2023 Fifth International Conference on Electrical, Computer and Communication Technologies (ICECCT) | 978-1-6654-9360-4/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICECCT56650.2023.10179652

Periyanayaki M
Department of Computer Science and Engineering
KPR Institute of Engineering and Technology
Coimbatore, India
[email protected]

M Duraisamy
Department of Computer Science
Government Arts and Science College
Tirupattur, India
[email protected]

Premkumar Duraisamy
Department of Computer Science and Engineering
KPR Institute of Engineering and Technology
Coimbatore, India
[email protected]

Yuvaraj Natarajan
Department of Computer Science and Engineering
KPR Institute of Engineering and Technology
Coimbatore, India
[email protected]

Abstract—Researchers in psychology, computer science, linguistics, neurology, and allied fields have become more interested in human-computer interface systems for autonomous face recognition or facial expression recognition. An Automatic Facial Expression Recognition System (AFERS) is suggested in this study. The proposed methodology comprises three stages: face detection, feature extraction, and recognition of facial expressions and gestures. Face detection's initial steps include skin-colour detection in the YCbCr colour space, illumination correction for uniformity across the face, and morphological procedures for retaining the necessary face region. Using the Active Appearance Model (AAM) approach, the first stage's output is utilised to extract facial features such as the mouth, nose, and eyes. The third stage, automatic recognition of facial expressions and gestures, then operates on these extracted features.

Index Terms—Facial Expression Recognition, Machine Learning, Deep Learning, Active Appearance Model

I. INTRODUCTION

The limitations of computer vision have been overcome via machine learning techniques. We can now forecast people's activities and follow-up actions thanks to a new method researchers and developers have created for identifying emotions [1]. Since machine learning techniques extensively use GPU computing capacity, the image processing capabilities of these models are ideally suited to real-world problems. From being a specialised discipline, computer vision has expanded into many others, including behavioural science. Several real-world applications employ these methods or models, including safety, driver safety, self-driving cars, human-computer interaction, and healthcare. These models are continually evolving with the introduction of GPUs, a kind of technology capable of carrying out millions of computations in seconds or minutes [2]. Machine technologies such as virtual and augmented reality are also becoming increasingly popular. The most attractive applications include robot vision and interactive robot communication.

Both verbal and visual modalities can be used to perceive human emotions. Understanding a person's emotional state from facial expressions is very useful [3]. This article describes real-time techniques for recognising emotions and gestures. The central concept is to generate essential keypoints using the MediaPipe architecture based on real-time deep learning [4]. With the help of developments in allied disciplines, particularly machine learning, image processing, and human cognition, facial expression recognition (FER) has seen a significant evolution in recent years. As a result, FER has a more substantial influence and possible usage in automation, particularly in areas like driver condition monitoring, human-computer interfaces, and robot control. However, reliable extraction of important emotional information has proven to be a tough challenge, making robust facial expression identification from photos and videos difficult thus far. These characteristics are frequently represented in various ways, including static, dynamic, point-based geometry, and region-based appearances [5]. Orientational movement characteristics, such as adjustments to feature location and form, are frequently brought on by the skeletal and muscular movements of the face during emotional expression.

II. LITERATURE SURVEY

Principal component analysis (PCA)-based eigenfaces and Fisherfaces are examples of holistic approaches. Local descriptors have attracted interest due to their resilience to lighting and position changes, and these techniques have been extensively investigated. Heisele et al. [6] showed the feasibility of component-based strategies and their advantages over holistic approaches. Descriptors are computed using a local feature algorithm over different regions of the face, and the data is compiled into one descriptor. The latter is a face detection application of the LBP algorithm, which was first created for texture description [7]. LBP has grown in popularity and has been the subject of in-depth research due to its superiority to earlier techniques. Both techniques work around noise and illumination issues by using information other than intensity [8].
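For concreteness, the basic LBP operator thresholds each pixel's 3x3 neighbourhood against its centre and packs the comparison bits into an 8-bit code; histograms of these codes over face regions form the descriptor. Below is a minimal numpy sketch of the basic 8-neighbour variant only; it makes no claim about the exact configuration used in [7].

```python
import numpy as np

def lbp_8neighbour(gray):
    """Basic 3x3 LBP: threshold each pixel's 8 neighbours against the
    centre pixel and pack the results into an 8-bit code.
    Assumes `gray` is a 2D uint8 numpy array."""
    h, w = gray.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # Clockwise neighbour offsets starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    centre = gray[1:-1, 1:-1]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy : h - 1 + dy, 1 + dx : w - 1 + dx]
        codes |= (neighbour >= centre).astype(np.uint8) << bit
    return codes

# A face descriptor is typically the concatenated histogram of these
# codes computed over a grid of face regions.
```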
III. METHODOLOGY

A. Existing System

To recognise hand gestures, a variety of hardware and sensors are available, but these methods are often neither precise enough nor appropriate for the job. Such a system is also complex because it uses many sensors and technologies [9]. Stereo cameras, which are more costly and resource-intensive, are a further system requirement. Today's hand-identification software is often neither responsive nor accurate enough, and this is what needs to be improved over the current system; the software that people use frequently causes them issues [10]. Traditional emotion recognition systems classify photos using straightforward image processing alone, whereas research instead uses facial feature extraction and neural networks to identify facial expressions (happy, sad, angry, fearful, surprised, neutral, etc.) [11].
B. Proposed Methodology

Finding face views, extracting facial characteristics and features, coping with varying analysis speeds, and the various categories for interpreting facial emotions are among the challenges of human facial expression recognition (Figure 1), drawn from three problem areas (e.g., locating faces, classifying facial muscle movements, and interpreting the emotional information found in facial regions) [12]. The scene may contain clutter, and faces should be handled under non-rigid motion independently of illumination conditions (face localisation, face recognition). Face form, colour, and texture are quite variable, which complicates the task, and several methods address it. The primary challenge of extracting face characteristics from input photos can be separated into at least three dimensions [14].

Fig. 1. Detailed Process of Facial Expression

The fundamental strategy is to outline the processing needed for the face-feature-extraction phase: smooth textures, the dense flow of information over the face, and so on. This method can convey whole facial expressions whether or not individual face components, such as the cheek and forehead areas, are present; as a result, the apparent optical-flow speed is represented in a graph [15]. Until recently, techniques based on facial feature points and shapes, blemish monitoring, and other approaches sensitive to noise, occlusion, clutter, and variations in lighting were perhaps the most widely utilised standard optical-flow technologies. To deal with optical flow technology's inherent issues, sequential state-estimation algorithms (such as the particle filter and the Kalman filter) use image sequences to track facial feature points for automatic facial expression identification [15].
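The abstract describes the face-detection stage as YCbCr skin-colour detection, illumination correction, and morphological clean-up. A minimal OpenCV sketch of that stage follows; the Cr/Cb threshold bounds and the histogram-equalisation step are typical literature choices assumed for illustration, not values reported in the paper.

```python
import cv2
import numpy as np

def detect_face_region(bgr):
    """Rough face-region mask: skin-colour thresholding in YCbCr
    followed by morphological clean-up, as outlined in the abstract."""
    # Illumination correction (assumption: equalise the luma channel).
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
    # Common skin-colour bounds from the literature (assumed):
    # Cr in [133, 173], Cb in [77, 127]; note OpenCV's Y, Cr, Cb order.
    lower = np.array([0, 133, 77], dtype=np.uint8)
    upper = np.array([255, 173, 127], dtype=np.uint8)
    mask = cv2.inRange(ycrcb, lower, upper)
    # Morphological opening/closing keeps the face blob, removes specks.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    return mask
```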
1) CNN Learning Algorithm: With the help of the Inception v3, VGG19, and ResNet50 model backbones, we have created several CNN architectures and compare their performance on the two datasets and across the various design models independently. The CNN architecture primarily consists of three layers. The foundational layer is a convolutional layer, which is responsible for extracting the image features from the input picture; to acquire a large number of visual feature maps, a convolutional layer often uses many convolution kernels as filters [16]. To avoid overfitting and boost the model's fault tolerance, a component known as a pooling layer, also called a downsampling layer, conducts dimensionality reduction and data compression on the extracted features. The algorithm's outcome is produced by a fully connected dense layer following max pooling [16]. To categorise the picture, this layer integrates the feature data from each neuron in the preceding layer. The layers of the CNN architecture are kept in non-trainable mode, except the last weighted layer, for maximum performance.

Fig. 2. Data Flow diagram
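As an illustration of the frozen-backbone setup just described (backbone layers non-trainable, only the final weighted layer trained, with max pooling before the dense output), a minimal Keras sketch follows. The 224x224 input size, the seven emotion classes, and the optimizer are assumptions for illustration, not values reported in the paper; ResNet50 could be swapped for InceptionV3 or VGG19.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 7  # assumed emotion classes (happy, sad, angry, ...)

# Pretrained backbone with all layers kept non-trainable.
backbone = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
backbone.trainable = False

model = models.Sequential([
    backbone,
    layers.GlobalMaxPooling2D(),  # pooling/downsampling step
    layers.Dense(NUM_CLASSES, activation="softmax"),  # trainable head
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```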
Face Landmark Model: We utilised transfer learning to train a network with several 3D face-landmarking objectives. The network forecasts 2D semantic contours for annotated real-world data and 3D landmark coordinates for artificially produced material. The resultant network made accurate 3D landmark predictions for both simulated and actual data [17]. The 3D landmark network receives the cropped video frames without extra depth information. For a given input, the model outputs the 3D point coordinates and the likelihood that a face is present and facing in the appropriate direction. The traditional option is to forecast a 2D heatmap for each landmark, but this method has high processing costs for many landmarks and is unsuitable for depth prediction; instead, the predictions are produced and then refined iteratively [17].

Attention Mesh Model: In addition to the facial landmark model, we also propose a model that focuses attention on semantically meaningful facial regions and therefore predicts landmarks around the lips, eyes, and iris more accurately, at the expense of additional computation. This enables applications such as augmented-reality puppetry and augmented-reality cosmetics [18].
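In practice, the face landmark and attention mesh models described above are exposed through MediaPipe's Face Mesh solution. A minimal usage sketch is below; the file name is hypothetical, and `refine_landmarks=True` is the option that enables the attention-based lip/eye/iris refinement.

```python
import cv2
import mediapipe as mp

# Face Mesh bundles the 3D face landmark model; refine_landmarks=True
# switches on the attention-based refinement around lips, eyes, iris.
face_mesh = mp.solutions.face_mesh.FaceMesh(
    static_image_mode=True, max_num_faces=1, refine_landmarks=True)

frame = cv2.imread("face.jpg")  # hypothetical input image
results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
if results.multi_face_landmarks:
    for lm in results.multi_face_landmarks[0].landmark:
        print(lm.x, lm.y, lm.z)  # normalized 3D coordinates
```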
2) Palm Detection Model: To detect initial hand placement, we created a single-shot detector model designed for real-time mobile applications, similar to the face detection model of the MediaPipe Face Mesh. The task of hand detection is challenging: both the light model and the complete model should be able to recognise self-occluded and occluded hands, as well as a variety of hand sizes, over a significant scale span (~20x) relative to the image frame. Whereas the pattern of the face is highly contrasted, for instance around the eyes and mouth, hands lack comparable characteristics, making it more challenging to detect them from visual cues alone. The precise placement of the hand can instead be determined by adding more context, such as the subject's arms, torso, and facial features [19].

Our approach addresses the problems mentioned above in various ways. First, we train a palm detector instead of a hand detector, because it is far simpler to estimate the bounding box of a rigid item, like a palm or fist, than to recognise a hand with articulated fingers. Furthermore, because the palm is a small item, the non-maximum suppression technique remains effective in two-handed self-occlusion scenarios like handshakes. Again, 3-5x fewer anchors are needed by modelling palms with square bounding boxes (known as anchors in ML parlance), disregarding other aspect ratios. Then, encoder/decoder feature extractors are utilised to take into account the context of the broader image, even for small items (similar to the RetinaNet approach) [20]. Last but not least, we minimise the focal loss throughout training to support the many anchors that result from the high scale variance. With the method mentioned above, we can detect palms with an average precision of 95.7%; the baseline is only 86.22% when no decoder is used and an ordinary cross-entropy loss is applied.
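The loss referred to here appears to be the focal loss popularised by RetinaNet [20]; assuming that reading, a minimal TensorFlow sketch of its binary form follows. The gamma and alpha values are the common defaults, not values taken from the paper.

```python
import tensorflow as tf

def focal_loss(y_true, y_pred, gamma=2.0, alpha=0.25):
    """Binary focal loss (RetinaNet-style): down-weights easy examples
    so training can cope with many anchors. Expects y_pred to be
    probabilities in [0, 1]."""
    p_t = y_true * y_pred + (1.0 - y_true) * (1.0 - y_pred)
    alpha_t = y_true * alpha + (1.0 - y_true) * (1.0 - alpha)
    bce = -tf.math.log(tf.clip_by_value(p_t, 1e-7, 1.0))
    return tf.reduce_mean(alpha_t * tf.pow(1.0 - p_t, gamma) * bce)
```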
Landmark Model: After recognising palms across the picture, the subsequent hand-landmark model employs regression to carry out accurate keypoint localisation of 21 3D knuckle coordinates inside the detected hand areas. The model consistently captures the internal hand pose even when the hand is only partially visible or self-occluded [21]. To collect ground-truth information, 30,000 real photographs of natural environments were manually tagged with 21 3D coordinates (for each corresponding point, the Z value is taken from the image depth map, if present). To better cover the range of hand positions and offer more variety of hand geometry, a high-quality synthetic hand model is also rendered over various backdrops and mapped to its corresponding 3D coordinates.
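The palm detector and the 21-point landmark regressor described above are packaged together in MediaPipe's Hands solution. A minimal sketch of reading the 21 normalized 3D landmarks follows (the file name is hypothetical):

```python
import cv2
import mediapipe as mp

# Hands bundles palm detection plus the 21-point 3D landmark model.
hands = mp.solutions.hands.Hands(static_image_mode=True,
                                 max_num_hands=2,
                                 min_detection_confidence=0.5)

frame = cv2.imread("hand.jpg")  # hypothetical input image
results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
if results.multi_hand_landmarks:
    for hand in results.multi_hand_landmarks:
        for lm in hand.landmark:     # 21 knuckle/fingertip points
            print(lm.x, lm.y, lm.z)  # normalized 3D coordinates
```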
Dataset: Scientific datasets on various subjects are needed for practical work on facial expression recognition. It is wise to use a dataset that has many uncontrolled variables; some variables may then be varied, including position, occlusion, facial emotions, lighting, and facial variations. Many datasets have been published for testing facial expression recognition algorithms; both openly published and paid datasets exist. For some datasets, the learner is given preprocessed pictures, and subjects appear in various example photos in the collection. FERET, Extended YaleB, CMU-PIE, AR, Cohn-Kanade, ORL, the Indian face database, and JAFFE, a dataset of Japanese female facial expressions, are a few examples of these datasets. The FERET face database, as well as the CMU PIE database of poses, illumination, and expressions, are among them [22]. The FERET database is the de facto standard and openly addresses other problem areas. In contrast to the FERET database, there are other publicly available expression databases, including the Cohn-Kanade database, also known as the CMU-Pittsburgh AU-coded database, which excludes natural emotions and includes posed expressions. The AR Face Database and the Japanese Female Facial Expression Database (JAFFE) are comparable posed-expression datasets.

IV. RESULT AND DISCUSSION

In this research, we presented a unique hand gesture recognition method that combines deep learning methods with geometry algorithms to solve tasks like fingertip and gesture detection. This method demonstrated both its applicability to real-world use and exact gesture estimation. The suggested approach provides many benefits: for instance, it identifies hand motions correctly from a distance, under various lighting and backdrop conditions, and across many persons. According to experimental data, this strategy is a promising method for real-time hand-gesture-based interfaces. In future work on hand gesture recognition, the system will be expanded to accept different hand movements, and the technique will be applied in other, more valuable applications.

The following lists the suggested system's performance as determined using PCA analysis. To determine how the data is dispersed and how characteristics relate to one another, a few charts and graphs are produced (Figures 3-8).

1. Accuracy is most frequently a description of systematic errors, a gauge of statistical bias; these differences between results and "actual" values are what ISO refers to as "trueness".
2. Alternatively, ISO defines accuracy as the combination of the above systematic errors and random observational errors. Thus, high accuracy necessitates both high precision and high trueness.

The training accuracy shows what proportion of the photos used in the current training batch has the correct class labels applied. The proportion of correctly classified pictures among randomly chosen images from a separate collection is known as validation accuracy.

According to the demo output in Figure 9, the model is capable of predicting users' moods in real time.

Fig. 3. Classification Results
Fig. 4. Correlation HeatMap
Fig. 5. Confusion Matrix
Fig. 6. Comparison of Training and Validation Accuracy
Fig. 7. Model Training Loss
Fig. 8. Model Training Loss and Accuracy
Fig. 9. Emotions Predictions
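As a concrete illustration of the validation-accuracy definition given above and of the confusion matrix behind a figure like Fig. 5, a minimal scikit-learn sketch with hypothetical labels is:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels: y_val are true classes of a held-out set,
# y_pred are the model's predicted classes for the same images.
y_val = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 1, 2, 1, 1, 0])

# Validation accuracy: fraction of correctly classified held-out images.
print("validation accuracy:", accuracy_score(y_val, y_pred))
# Per-class error structure, as plotted in a confusion-matrix figure.
print(confusion_matrix(y_val, y_pred))
```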

V. CONCLUSION AND FUTURE WORK

This article focused on validating the framework for facial expression recognition. The various elements and steps necessary for facial expression recognition have been covered, and the method of recognising facial expressions was illustrated on multiple faces. Recent facial expression recognition research by several authors is featured. Finally, we discuss a collection of data that is crucial to this field of study and provide an array of facial expression datasets that are often utilised in research. This article also shows how to use hand tracking and extraction algorithms to identify unfamiliar input movements.

We use this method to identify individual motions. Assuming a steady background during our studies lets the algorithm search a smaller region; using this idea, we created a programme that lets a user operate the mouse by pointing a finger at the camera. This technology is also a novel method of recognising hand gestures: it uses deep learning approaches in conjunction with geometry-based algorithms to accomplish gesture and fingertip identification tasks. The method demonstrated both its applicability to real-world use and exact gesture estimation, and the suggested approach provides many benefits. In the future, interviewers will have resources to determine better how potential HR candidates would respond to inquiries in various contexts. This technology also enables the use of sign language by allowing real-time recognition of landmarks on the hand. Additionally, by utilising this project, ongoing initiatives such as those that detect driver intoxication or identify behavioural patterns may be deployed more quickly and effectively. In this research, we created a system that, using CNN algorithms, can classify photos in real time based on attributes taken from a training database. A graphical user interface (GUI) that the general public may quickly access and use in real time is implemented in Phase 2 using OpenCV.

Fig. 10. Demo model output plot
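The Phase 2 GUI is described only as an OpenCV front end. A minimal sketch of such a capture-classify-display loop follows, with `predict_emotion` as a hypothetical stand-in for the trained CNN classifier:

```python
import cv2

def predict_emotion(face_bgr):
    """Hypothetical stand-in for the trained CNN classifier."""
    return "neutral"

cap = cv2.VideoCapture(0)            # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    label = predict_emotion(frame)   # classify the current frame
    cv2.putText(frame, label, (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("Emotion", frame)     # live display, as in Fig. 9
    if cv2.waitKey(1) & 0xFF == ord("q"):  # quit on 'q'
        break
cap.release()
cv2.destroyAllWindows()
```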
REFERENCES

[1] D. Gandhi, K. Shah and M. Chandane, "Dynamic Sign Language Recognition and Emotion Detection using MediaPipe and Deep Learning," 2022 13th International Conference on Computing Communication and Networking Technologies (ICCCNT), Kharagpur, India, 2022, pp. 1-7, doi: 10.1109/ICCCNT54827.2022.9984592.
[2] Z. Li, X. Zhao, Y. Yang, Q. Gao and Y. Song, "HVFM: an Emotion Classification Model based on Horizontal and Vertical Flow Domain-adaptive," 2022 IEEE International Conference on Mechatronics and Automation (ICMA), Guilin, Guangxi, China, 2022, pp. 455-460, doi: 10.1109/ICMA54519.2022.9856194.
[3] Y.-X. Wang, Y.-K. Li, T.-H. Yang and Q.-H. Meng, "Multi-task Touch Gesture and Emotion Recognition Using Multiscale Spatiotemporal Convolutions With Attention Mechanism," in IEEE Sensors Journal, vol. 22, no. 16, pp. 16190-16201, 15 Aug. 2022, doi: 10.1109/JSEN.2022.3187776.
[4] P. K. Sidhu, A. Kapoor, Y. Solanki, P. Singh and D. Sehgal, "Deep Learning Based Emotion Detection in an Online Class," 2022 IEEE Delhi Section Conference (DELCON), New Delhi, India, 2022, pp. 1-6, doi: 10.1109/DELCON54057.2022.9752940.
[5] F. Xu and W.-B. Run, "Bi-modal Emotion Recognition via Broad Learning System," 2021 China Automation Congress (CAC), Beijing, China, 2021, pp. 2143-2148, doi: 10.1109/CAC53003.2021.9727610.
[6] E. Ghaleb, A. Mertens, S. Asteriadis and G. Weiss, "Skeleton-Based Explainable Bodily Expressed Emotion Recognition Through Graph Convolutional Networks," 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), Jodhpur, India, 2021, pp. 1-8, doi: 10.1109/FG52635.2021.9667052.
[7] A. V. Atanassov, D. I. Pilev, F. N. Tomova and V. D. Kuzmanova, "Hybrid System for Emotion Recognition Based on Facial Expressions and Body Gesture Recognition," 2021 International Conference Automatics and Informatics (ICAI), Varna, Bulgaria, 2021, pp. 135-140, doi: 10.1109/ICAI52893.2021.9639829.
[8] M. Mukhopadhyay, A. Dey, R. N. Shaw and A. Ghosh, "Facial emotion recognition based on Textural pattern and Convolutional Neural Network," 2021 IEEE 4th International Conference on Computing, Power and Communication Technologies (GUCON), Kuala Lumpur, Malaysia, 2021, pp. 1-6, doi: 10.1109/GUCON50781.2021.9573860.
[9] M. Li, L. Chen, M. Wu, W. Pedrycz and K. Hirota, "Multimodal Information-Based Broad and Deep Learning Model for Emotion Understanding," 2021 40th Chinese Control Conference (CCC), Shanghai, China, 2021, pp. 7410-7414, doi: 10.23919/CCC52363.2021.9549897.
[10] A. Espinel, N. Pérez, D. Riofrío, D. Benítez and R. F. Moyano, "Face Gesture Recognition Using Deep-Learning Models," 2021 IEEE Colombian Conference on Applications of Computational Intelligence (ColCACI), Cali, Colombia, 2021, pp. 1-6, doi: 10.1109/ColCACI52978.2021.9469528.
[11] M.-H. Hoang, S.-H. Kim, H.-J. Yang and G.-S. Lee, "Context-Aware Emotion Recognition Based on Visual Relationship Detection," in IEEE Access, vol. 9, pp. 90465-90474, 2021, doi: 10.1109/ACCESS.2021.3091169.
[12] N. Naik and M. A. Mehta, "Hand-over-Face Gesture based Facial Emotion Recognition using Deep Learning," 2018 International Conference on Circuits and Systems in Digital Enterprise Technology (ICCSDET), Kottayam, India, 2018, pp. 1-7, doi: 10.1109/ICCSDET.2018.8821186.
[13] P. Duraisamy, Y. Natarajan, E. N L and J. R. P, "Efficient Way of Heart Disease Prediction and Analysis using different Ensemble Algorithm: A Comparative Study," 2022 6th International Conference on Electronics, Communication and Aerospace Technology, Coimbatore, India, 2022, pp. 1425-1429, doi: 10.1109/ICECA55336.2022.10009569.
[14] Z. Shen, J. Cheng, X. Hu and Q. Dong, "Emotion Recognition Based on Multi-View Body Gestures," 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 2019, pp. 3317-3321, doi: 10.1109/ICIP.2019.8803460.
[15] A. Espinel, N. Pérez, D. Riofrío, D. Benítez and R. F. Moyano, "Face Gesture Recognition Using Deep-Learning Models," 2021 IEEE Colombian Conference on Applications of Computational Intelligence (ColCACI), Cali, Colombia, 2021, pp. 1-6, doi: 10.1109/ColCACI52978.2021.9469528.
[16] H. Kishan Kondaveeti and M. Vishal Goud, "Emotion Detection using Deep Facial Features," 2020 IEEE International Conference on Advent Trends in Multidisciplinary Research and Innovation (ICATMRI), Buldhana, India, 2020, pp. 1-8, doi: 10.1109/ICATMRI51801.2020.9398439.
[17] M. Karna, D. S. Juliet and R. C. Joy, "Deep learning based Text Emotion Recognition for Chatbot applications," 2020 4th International Conference on Trends in Electronics and Informatics (ICOEI)(48184), Tirunelveli, India, 2020, pp. 988-993, doi: 10.1109/ICOEI48184.2020.9142879.
[18] J. J. Deng, C. H. C. Leung, P. Mengoni and Y. Li, "Emotion Recognition from Human Behaviors Using Attention Model," 2018 IEEE First International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), Laguna Hills, CA, USA, 2018, pp. 249-253, doi: 10.1109/AIKE.2018.00056.
[19] G. Subramanian, N. Cholendiran, K. Prathyusha, N. Balasubramanain and J. Aravinth, "Multimodal Emotion Recognition Using Different Fusion Techniques," 2021 Seventh International conference on Bio Signals, Images, and Instrumentation (ICBSII), Chennai, India, 2021, pp. 1-6, doi: 10.1109/ICBSII51839.2021.9445146.
[20] M. Mohammadpour, H. Khaliliardali, S. M. R. Hashemi and M. M. AlyanNezhadi, "Facial emotion recognition using deep convolutional networks," 2017 IEEE 4th International Conference on Knowledge-Based Engineering and Innovation (KBEI), Tehran, Iran, 2017, pp. 0017-0021, doi: 10.1109/KBEI.2017.8324974.
[21] H. Ranganathan, S. Chakraborty and S. Panchanathan, "Multimodal emotion recognition using deep learning architectures," 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA, 2016, pp. 1-9, doi: 10.1109/WACV.2016.7477679.
[22] L. Chen, M. Wu, W. Su and K. Hirota, "Multi-Convolution Neural Networks-Based Deep Learning Model for Emotion Understanding," 2018 37th Chinese Control Conference (CCC), Wuhan, China, 2018, pp. 9545-9549, doi: 10.23919/ChiCC.2018.8483607.

