
Human Abnormality Classification using Combined CNN-RNN Approach

By
Md. Mohsin Kabir, ID:16172103218
Farisa Benta Safir, ID:16172103049
Saifullah Shahen, ID:16172103186
Jannatul Maua, ID:16172103291
Iffat Ara Binte Awlad, ID:16172103054

Submitted in partial fulfillment of the requirements of the degree of

Bachelor of Science in

Computer Science and Engineering

Department of Computer Science and Engineering

Bangladesh University of Business and Technology

May 2021
Declaration

We do hereby declare that the research work presented in this thesis entitled
"Human Abnormality Classification using Combined CNN-RNN Approach"
is the result of our own work. We further declare that the thesis has
been compiled and written by us. No part of this thesis has been submitted
elsewhere for the requirements of any degree, award or diploma, or any other
purpose except for publications. The materials obtained from other sources
are duly acknowledged in this thesis.

Md. Mohsin Kabir


ID: 16172103218 Signature

Farisa Benta Safir


ID: 16172103049 Signature

Saifullah Shahen
ID: 16172103186 Signature

Jannatul Maua
ID: 16172103291 Signature

Iffat Ara Binte Awlad


ID: 16172103054 Signature

Approval

We do hereby acknowledge that the research work presented in this thesis
entitled "Human Abnormality Classification Using Combined CNN-RNN
Approach" results from original work carried out under the supervision of
Dr. Muhammad Firoz Mridha, Chairman and Associate Professor, Department
of Computer Science and Engineering, Bangladesh University of Business and
Technology. We further declare that no part of this thesis has been submitted
elsewhere for the requirements of any degree, award or diploma, or any other
purpose except for publications. We further certify that the dissertation meets
the requirements and standard for the degree of Bachelor of Science in
Computer Science and Engineering.

Dr. Muhammad Firoz Mridha


Chairman and Associate Professor
Department of CSE

Acknowledgement

We would like to express our heartfelt gratitude to almighty Allah, who
bestowed kind care upon us and our families throughout this journey until
the fulfilment of this research.

We also express our sincere respect and gratitude to our supervisor Dr.
Muhammad Firoz Mridha, Chairman and Associate Professor, Department
of Computer Science and Engineering, Bangladesh University of Business
and Technology (BUBT). Without his guidance, this research work would
not exist. We are grateful to him for his excellent supervision and for putting
his utmost effort into developing this project. We owe him a lot for his
assistance, encouragement, and guidance, which have shaped our mentality
as researchers.

Finally, we are grateful to all the faculty members of the CSE department,
BUBT, for enabling us to complete this research work with proper guidance
and support throughout the last four years.

Abstract

Facial Expression Recognition (FER) has become a promising area in the
deep learning domain with the advent of big data. The facial expression
reflects our mental activities and provides valuable information on human
behaviour. With the increasing improvement of deep-learning-based
classification methods, particular demands for measuring human stability
using facial expressions have emerged. Recognizing human abnormalities
such as drug addiction, autism, criminal mentality, etc., is quite challenging
due to the limitations of existing FER systems. Besides, no existing
dataset consists of images that describe the genuine expressions of the
human face from which human abnormalities can be detected. To achieve the
best performance on human abnormality recognition, we have created a
Normal and Abnormal Humans Facial Expression (NAHFE) dataset. This
thesis proposes a new model that stacks a recurrent network on a
Convolutional Neural Network: the proposed combined method consists of
convolution layers followed by a recurrent network. The convolutional layers
extract the features within facial portions of the images, and the recurrent
network considers the temporal dependencies that exist in the images. The
proposed combined architecture has been evaluated on the NAHFE dataset
and has achieved state-of-the-art performance in detecting human
abnormalities.

List of Tables

4.1 Validation Accuracy, Precision, and Recall of the proposed CNN-RNN combined approach and the basic CNN architecture
4.2 Validation Accuracy, Precision, and Recall of the proposed CNN-RNN combined approach with different numbers of hidden units
4.3 Validation Accuracy, Precision, and Recall of the proposed CNN-RNN combined approach with different numbers of hidden layers

List of Figures

1.1 Flow of the work
3.1 Workflow of the proposed system
3.2 Convolutional neural network architecture
3.3 Proposed CNN-RNN combined approach
4.1 Sample images of the created dataset
5.1 Gantt chart of work execution

List of Abbreviations

FER Facial Expression Recognition
NAHFE Normal and Abnormal Humans Facial Expression
GPU Graphics Processing Unit
ANN Artificial Neural Network
DL Deep Learning
CNN Convolutional Neural Network
RNN Recurrent Neural Network
DCNN Deep Convolutional Neural Network
ReLU Rectified Linear Activation Function

Contents

Declaration
Approval
Acknowledgement
Abstract
List of Tables
List of Figures
List of Abbreviations

1 Introduction
1.1 Introduction
1.2 Problem Statement
1.3 Problem Background
1.4 Research Objectives
1.5 Motivations
1.6 Flow of the Research
1.7 Significance of the Research
1.8 Research Contribution
1.9 Thesis Organization
1.10 Summary

2 Background
2.1 Introduction
2.2 Literature Review
2.3 Problem Analysis
2.4 Summary

3 Proposed Model
3.1 Introduction
3.2 Feasibility Analysis
3.3 Requirement Analysis
3.4 Research Methodology
3.4.1 Data Pre-processing
3.4.2 Convolutional Neural Network
3.4.3 Recurrent Neural Network
3.4.4 Combined CNN-RNN Architecture
3.5 Design, Implementation, and Simulation
3.6 Summary

4 Implementation, Testing, and Result Analysis
4.1 Introduction
4.2 Dataset
4.3 System Setup
4.4 Evaluation
4.5 Results and Discussion
4.6 Summary

5 Standards, Constraints, Milestones
5.1 Standards (Sustainability)
5.2 Impacts (on Society)
5.3 Ethics
5.4 Challenges
5.5 Constraints
5.6 Timeline and Gantt Chart
5.7 Summary

6 Conclusion
6.1 Introduction
6.2 Future Works and Limitations

Introduction

1.1 Introduction

The human face is exceptionally expressive, able to convey innumerable
emotions without uttering a word. Facial Expression Recognition (FER) is a
fundamental problem in the field of computer vision and image processing
[1, 2]. Facial expressions differ from person to person and are also influenced
by gender, age, ethnicity, etc. As a whole, the facial expression constitutes
the real feelings of a human being, so FER has become more promising
because of its power to analyse human behaviour. Hence, using FER's
analytic capability, we have developed a combined method of Convolutional
Neural Network (CNN) [3] and Recurrent Neural Network (RNN) [4] to
classify human abnormalities. This approach analyses the human face and
finds irregularities such as drug addiction, autism, and criminality. The
analysis of facial expressions is a laborious task for Deep Learning (DL) [5]
approaches because human beings can vary significantly in the way they
express their faces [6, 7]. The convolutional neural network is a part of
deep learning, mostly used for image analysis and image processing tasks
[7, 8]. Deep CNNs have gained momentous success and have proven
explicitly well suited for image recognition tasks on massive datasets [9, 10].
Recent CNN architectures employ several ways to shorten training time
and enhance generalization over the input data, including data augmentation
[11], dropout regularization [12], ReLU activation functions, and GPU
acceleration [9]. RNNs have grown rapidly in popularity because an RNN
not only assesses its current input(s) but also bases its evaluation on past
input(s) [12, 13]. Thus, the result is generated from a composition of
information coming from the past and present. Hence, in this work, we have
developed an architecture that uses both CNN and RNN to classify human
abnormalities.

1.2 Problem Statement

Analysing facial expressions and the manifestation of actual human emotions
is one of computer vision's promising areas. Facial expression analysis
is particularly promising because of its power to analyse human behaviour.
Real-time facial expression analysis and finding facial patterns have remained
challenging and exciting problems in computer vision. Generating a useful
pattern of facial expressions is a complex problem for deep learning
techniques since people can vary significantly in the way they show their faces.
In this work, we design a combined model that generates significant human
face patterns to detect abnormalities, distinguishing drug addicts, criminals,
autistic people, and regular people. The datasets available on the internet are
unsuitable because they are annotated only with the basic expressions of the
human face; consequently, no such human abnormality detection architecture
has existed before.

2
1.3 Problem Background

Present facial emotion recognizer systems only classify human emotions
like sadness, happiness, anger, fear, disgust, etc. Identifying human
abnormalities is a real-world computer vision problem that is yet to be solved.
Recognizing human abnormalities such as drug addicts, autistic people, and
criminals is very challenging due to the limitations of existing FER systems.
Most importantly, there is no available dataset of autistic, criminal, and
drug-addicted human faces. The lack of a proper dataset is the critical
challenge of our research, so we look forward to working with a newly created
dataset of our own.

1.4 Research Objectives

The objectives of our research work are as follows:

• Identifying present difficulties in human abnormality classification.

• Creating a new dataset named Normal and Abnormal Humans Facial Expression (NAHFE).

• Reducing the computational complexity of human abnormality classification.

• Developing a new combined CNN-RNN approach to classify human abnormalities accurately.

• Comparing the existing architectures with the proposed one on the abnormality classification task.

3
1.5 Motivations

Facial expression analysis, or emotion estimation, has attracted significant
attention from computer vision researchers during the past decade since it
lies at the intersection of many critical applications, such as crowd analytics,
human-computer interaction, surveillance, etc. Modern researchers are
trying to develop the best approaches to address these problems. Enormous
research work has already been done that gives reasonable solutions for a few
FER problems. However, no specific architecture has yet been developed that
identifies human abnormalities, which would significantly impact our society,
as described in Section 5.2. To overcome this situation, we built a dataset
named the Normal and Abnormal Humans Facial Expression (NAHFE)
dataset. Finally, a combined CNN-RNN approach was applied to the NAHFE
dataset and achieved promising results in identifying human abnormalities.

1.6 Flow of the Research

The research work was developed in several steps. First, we analysed
the research topic and studied the basic theory of facial expression
recognition. Then we investigated the applications of FER. We identified the
shortcomings of existing architectures, which motivated us to build a
new architecture based on state-of-the-art deep learning approaches. Figure
1.1 illustrates the overall steps of the research procedure.

4
Figure 1.1. The figure illustrates the flow of the thesis work.

1.7 Significance of the Research

We observe that most of the Facial Expression Recognition (FER) image
datasets on the web are built for classifying six or seven primary expressions
of the human face. However, to analyse human stability, we needed a dataset
divided into four classes: Drug addiction, Autism, Criminalism, and Normal.
Therefore this study introduces a new dataset to classify human stability into
four categories. We have also proposed a combined CNN-RNN architecture,
which gives the best result. For the human stability classification problem,
this becomes a state-of-the-art architecture for estimating human expression
and can significantly impact society and the country.

1.8 Research Contribution

The overall contributions of the research work are:

• Identifying the present difficulty of analysing human abnormalities in Facial Expression Recognition (FER) problems.

• A new dataset named the Normal and Abnormal Humans Facial Expression (NAHFE) dataset, consisting of 1936 images in 4 different classes.

• A novel CNN-RNN combined approach to classifying human abnormalities, which achieved convincing results. The CNN-RNN combined approach is unique and believed to have enormous potential. Also, the impact of the proposed CNN-RNN combined architecture is compared to a basic CNN architecture.

1.9 Thesis Organization

The thesis work is organised as follows. Chapter 2 highlights the background
and literature review in the field of Facial Expression Recognition (FER)
systems. Chapter 3 contains the proposed architecture of the human
abnormality classification system and a detailed walk-through of the overall
procedures. Chapter 4 includes the details of the tests and evaluations
performed to evaluate our proposed architecture. Chapter 5 explains the
Standards, Impacts, Ethics, Challenges, Constraints, Timeline, and Gantt
Chart. Finally, Chapter 6 contains the overall conclusion of our thesis work.

1.10 Summary

This chapter provided a broad overview of the problem we aimed at, the
objectives of our research work, the background, and the motivation of the
research. It also illustrated the overall steps by which we carried out the
research work.

Background

2.1 Introduction

Facial expression exploration, or facial affect analysis, has attracted
significant attention from computer vision researchers during the past few
years. The majority of existing approaches focus on classifying seven basic
expressions, which are universal across cultures and subgroups: neutral,
happy, surprised, fearful, angry, sad, and disgusted. However, before this
study, no work had been introduced on investigating human abnormalities by
analysing human expressions. In this chapter, we present a selection of the
related work carried out by other researchers.

2.2 Literature Review

Facial expression recognition, or facial affect analysis, has attracted
significant attention from computer vision researchers during the past few
years [14, 15]. Popular approaches classify seven basic expressions, including
happy, sad, surprised, disgusted, fearful, angry, and neutral, but no prior
work on classifying human abnormalities has been done yet.

Takalkar et al. [16] analysed the use of deep learning for micro-expression
classification. Using data augmentation, the authors generated massive sets
of synthetic images from the CASME and CASME II databases. Finally,
a CNN-based micro-expression recognizer was developed by combining and
tuning both of these datasets, which gave a maximum of 78.2% accuracy.

Jung et al. [17] proposed two deep network models using CNN and DNN
for the FER problem. Using the FER 2013 database, they achieved 72.78%
accuracy for the DNN architecture and 86.45% for the CNN. The authors
first detected faces in the input images using Haar-like features and then
applied the deep learning model.

Xiujie Qu et al. [18] proposed a real-time, fast face recognition model
using CNN. The authors divided the process into two parts: first, the
network is trained on a PC, and then it is implemented on a Field
Programmable Gate Array (FPGA), achieving a state-of-the-art accuracy
of 99.25%.

Neha Jain et al. [19] proposed a face emotion recognition model combining
deep CNN and RNN models. The authors used two datasets: the MMI
Facial Expression Database (FED) and the Japanese Female Facial
Expression (JAFFE) database, with 80% of each dataset used for training
and 20% for validation. The model achieves 94.91% accuracy on the JAFFE
dataset and 92.07% on the MMI dataset.

Fathallah et al. [14] showed experimentally that CNNs are very effective
at recognizing facial expressions. The authors fine-tuned a CNN architecture
with the VGG model and trained it on famous datasets like CK+ and MUG,
achieving nearly 99% accuracy, which is state-of-the-art.

Ali et al. [7] proposed a deep neural network architecture for automated
facial expression recognition, consisting of two convolutional layers, each
followed by max pooling, and four Inception layers. The proposed approach
takes a facial image as input and classifies it into one of six expressions or
neutral. The authors used different databases such as Multi-PIE, MMI,
CK+, DISFA, FERA, SFEW, and FER2013 and achieved a best accuracy of
94.7% on the CMU Multi-PIE database.

Osamah M. Al-Omair et al. [20] compared cloud-based recognition
services from Google, Amazon, and Microsoft, using 140 images as their
dataset. In overall recognition, Microsoft Azure's Face API performed best;
however, in recognition accuracy, Google did the best job, while Amazon
consistently performed around the average.

Saypadith et al. [21] recommended a framework for recognizing multiple
faces in real time that runs effectively on the NVIDIA Jetson TX2 board.
The authors obtained suitable, fast processing times for facial expression
recognition. On the CUFace dataset, this CNN-based architecture achieved
nearly 90% accuracy.

Xiaoming Zhao et al. [1] proposed a face emotion recognition model
combining DBNs and an MLP, using the JAFFE and Cohn-Kanade datasets.
The proposed DBN + MLP method achieves its highest accuracy of 90.95%
(using 64×64 images) on the JAFFE database and 98.57% (using 64×64
images) on the Cohn-Kanade database.

Hamid Ouanan et al. [22] proposed a deep-learning-based, large-scale
face recognition approach and also created a real-time application. They
used the PUBALL dataset, their own creation, and achieved 98.12% accuracy
with this method.

Sang et al. [23] proposed a compelling architecture that solves facial
expression recognition using a stack of convolutional blocks. On the FER-2013
dataset, this model achieved its highest accuracy on both the public and
private test sets.

André Teixeira Lopes et al. [24] recommended a neural network
architecture for automated facial expression recognition using images from
seven databases (MMI, CK+, DISFA, FERA, SFEW, FER2013, and
Multi-PIE). They also used pre-processing techniques in this method and
obtained their best accuracy, 98.92%, when using six expressions (anger,
sadness, surprise, happiness, disgust, and fear) on the CK+ database.

Deepak Kumar Jain et al. [25] designed a single Deep Convolutional
Neural Network for classifying one of six emotional states. This model was
trained on two famous datasets, CK+ and JAFFE, and outperformed the
previous state-of-the-art results in facial expression recognition.

Jinwoo Jeon et al. [26] suggested a facial expression recognizer that uses
the HOG feature descriptor to detect human faces and a CNN to classify
expressions. Training the model on the Kaggle Facial Expression Recognition
dataset, the authors obtained high accuracy with low computation time.

Hiranmayi Ranganathan et al. [27] applied convolutional deep belief
network (CDBN) models to multimodal emotion recognition. They also
presented the emoFBVP database, which they created, and achieved an
accuracy of 83.18% on it.

Sun et al. [28] proposed a new face detection scheme using deep learning
and achieved state-of-the-art detection performance on the well-known
FDDB face detection benchmark. They improved the Faster R-CNN
framework by combining several methods.

Hong-Wei Ng et al. [29] showed the results of applying transfer learning
on small datasets for facial expression recognition. Their best submission
achieved an overall accuracy of 45.5% on the validation set and 55.6% on
the test set.

AbdAlmageed et al. [30] presented deep convolutional neural network
models that generate multiple pose-specific features for face recognition and
achieve better results than the state-of-the-art on IARPA's CS2 and NIST's
IJB-A in both verification and identification tasks. They used
CASIA-WebFace for training and both IJB-A and IARPA's Janus CS2 for
evaluation.

Yanan Guo et al. [31] presented a deep neural network with relativity
learning (DNNRL), which directly learns a mapping from original images
to a Euclidean space. Experiments on two representative facial expression
datasets (FER-2013 and SFEW 2.0) demonstrate the robustness and
effectiveness of DNNRL.

Tolga Soyata et al. [32] worked with the basic functionalities of MOCHA
and developed algorithms to minimize the response time for face recognition
on a cloud platform. They use mobile devices to capture images and send
them to the cloudlet. As a result, when the number of cloud servers increases,
performance decreases due to the extreme compute strain on the mobile
device. However, a cloudlet with a 50 W power budget and a cost under $100
can be two to three orders of magnitude more efficient than mobile devices;
the same equipment using a greedy algorithm boosts performance up to 2×
with 13 cloud servers.

K. Shailaja and Dr. B. Anuradha [33] tried to improve Huang's LDRC
algorithm using deep learning (LCDRC), aiming to provide an efficient
system with reduced error. They used 400 face images from 40 individuals
with 10 different facial actions each, tested with the help of BCRE and
WCRE results. Using a deep learning algorithm, it achieves 87% on the
ORL face dataset, higher than standard LDRC.

De Silva et al. [34] proposed a new basis function called cloud basis
functions (CBFs), a novel modified version of RBFs. This architecture is for
recognizing holistic facial expressions from static facial images. Performance
evaluations were conducted using a grayscale static facial image database
that included images taken from the Carnegie Mellon University facial
expression image database and their own database. The CBF
neural-network-based classifier yielded an accuracy of 96.1%.

Hossain et al. [35] proposed an edge-cloud-based, privacy-preserving
automatic emotion recognition system. For the experiments, they used the
RML database and the eNTERFACE'05 database. The highest recognition
accuracies of the proposed model on the two databases were 82.3% and
87.6%, respectively.

Zeng et al. [36] proposed a novel framework for facial expression
recognition that automatically distinguishes expressions with high accuracy.
They established a Deep Sparse Auto-Encoder (DSAE) to recognize facial
expressions by learning robust and discriminative features from the data.
The experiment used the Cohn-Kanade (CK+) database and achieved a high
recognition accuracy of 95.79%.

Laine et al. [37] presented a real-time deep learning framework for
video-based facial performance capture using a Deep Convolutional Neural
Network. They used their own dataset and compared their results with
multiple state-of-the-art solutions.

Md. Zia Uddin et al. [38] extracted valuable features from depth faces
using Modified Local Directional Patterns (MLDP), which are further
combined with deep learning for recognition. They convert a depth image
into a 6-bit binary code whose value is calculated from the relative edge
strengths in eight directions. As a result, the recognition rate rises from
91.26% to 96.25% when using both RGB and depth cameras.

Panagiotis Tzirakis et al. [39] worked on emotion recognition for various
styles of speaking using CNNs. They propose a multimodal system that
operates on the raw signal and performs end-to-end emotion prediction
from speech and visual data. For this, they use 96×96 images, 30 videos,
and 25 audio recordings from the REmote COLlaborative and Affective
interactions (RECOLA) database. They tried 75, 150, and 300 layers for the
speech and visual models; the best value for the speech model was 150 layers,
while for the visual model it was 300 layers. First, they extract facial
landmarks using the face alignment method of Eng et al. and perform a
Procrustes alignment. After concatenating these features and feeding them
to the recurrent model, they train only the recurrent network. Their system
can predict more accurately using those features.

Md. Zia Uddin et al. [38] improved Directional Position Pattern (DPP)
features by utilizing General Discriminant Analysis (GDA) after applying
Principal Component Analysis (PCA). RGB cameras capture intensities that
change very rapidly due to illumination changes in the scene, whereas
distance-based capture is not affected in the same way. They use 120 videos,
of which 40 videos contain ten facial reactions (anger, happiness, sadness,
etc.). The average recognition rate using PCA with HMM on depth faces is
58%, and PCA-LDA achieves an average of 61.50%. After applying ICA and
HMM, the result increases to 80.50%. Adding the LDPP-PCA-GDA feature
raises the result to 92.50%.

Tian et al. [40] proposed a novel deep feature fusion convolutional neural
network (CNN) for 3D Facial Expression Recognition (FER). Each 3D face
scan is represented as 2D facial attribute maps (including depth, normal,
and shape index values). The authors then combine the different facial
attribute maps to learn facial representations by fine-tuning a deep feature
fusion CNN subnet pre-trained on a large-scale image dataset for universal
visual tasks. They reach an accuracy of 79.17% using this deep feature
fusion CNN.

To recognize facial expressions from candid, non-posed images, Wei Li
et al. [41] proposed a deep-learning-based approach using convolutional
neural networks (CNNs). Their experiments show that the CNN-based
approach is very effective at candid image expression recognition,
significantly outperforming the baseline approaches. The authors achieved
81.5% accuracy on the CIFE dataset.

However, all these FER techniques only analysed the six or seven basic
expressions and did not give any solution for human stability identification.
In this thesis, we discuss how the CNN-RNN combined approach solves this
issue.

2.3 Problem Analysis

Facial expression recognition, or facial affect analysis, has attracted
significant attention from computer vision researchers during the past few
years. These FER techniques only analysed the six or seven basic expressions
(happy, sad, surprised, disgusted, fearful, angry, and neutral); no prior work
on classifying human abnormalities had been done. In this thesis work, we
discuss how the CNN-RNN combined approach solves this issue.

2.4 Summary

This chapter investigated and reviewed the latest techniques of facial
expression recognition systems, including their drawbacks. The thesis's target
is to eliminate these imperfections as much as possible and introduce a new
combined approach to investigate human abnormalities.

Proposed Model

3.1 Introduction

In this chapter, we present the feasibility analysis of identifying human
abnormalities by analysing facial expressions and the requirements demanded
by this structure. Finally, this chapter illustrates the model's overall
architecture with a detailed explanation.

3.2 Feasibility Analysis

This research work required five researchers with one supervisor and took
nine months to develop. The thesis work required technical support,
including hardware and software. The research work also required a dataset
generation and evaluation process, which was likewise executed by the
researchers. The extensive data collection was executed considering the legal
feasibility of the dataset. The thesis work did not require any financial
support from the institution or supervisor.

17
3.3 Requirement Analysis

The overall requirements to conduct the proposed architecture include:

• High-performance computing device.

• Image input device.

• Open-source software libraries for scientific computations.

• Open-source software libraries to implement the deep learning model.

3.4 Research Methodology

In this section, the methodology of the proposed architecture is elaborated.
The section is divided into four sub-sections, ordered consecutively from the
input to the output phase of the model, with detailed explanations. Moreover,
Figure 3.1 presents the overall workflow of the architecture.


Figure 3.1. The figure illustrates the workflow of the proposed system (from

left to right).

3.4.1 Data Pre-processing

The data pre-processing occurs in two stages: data normalization and
data augmentation. These two techniques are described below.

Data Normalization: Normalizing the image is a significant pre-processing
methodology. It reduces the inner-class feature discrepancy that is viewed
as intensity offsets, which are fixed within a local region, so standard
deviation and Gaussian normalization are useful while normalizing. The
resulting image after normalization is computed by Equation (3.1) [19].

ψ(π, θ) = [ξ(π, θ) − µ(π, θ)] / [6 σ(π, θ)]   (3.1)

where µ is a local mean and σ is a local standard deviation [42]:

µ(π, θ) = (1 / M²) Σ_{k=−α}^{α} Σ_{n=−α}^{α} ε(k + π, n + θ)   (3.2)

σ(π, θ) = √( (1 / M²) Σ_{k=−α}^{α} Σ_{n=−α}^{α} [ε(k + π, n + θ) − µ(π, θ)]² )   (3.3)
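As a minimal sketch (not the authors' exact implementation), the local normalization of Equations (3.1)-(3.3) can be written in NumPy, assuming a grayscale image array; the window size is a free parameter:

import numpy as np
from scipy.ndimage import uniform_filter

def local_normalize(image, window=15, eps=1e-8):
    # Local mean mu(pi, theta) over a window x window region (Equation 3.2).
    image = image.astype(np.float64)
    mu = uniform_filter(image, size=window)
    # Local standard deviation sigma(pi, theta) via E[x^2] - E[x]^2 (Equation 3.3).
    sq_mean = uniform_filter(image ** 2, size=window)
    sigma = np.sqrt(np.maximum(sq_mean - mu ** 2, 0.0))
    # Normalized output psi(pi, theta) (Equation 3.1); eps guards flat regions.
    return (image - mu) / (6.0 * sigma + eps)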

Data Augmentation: We apply various transformations to the training
and testing samples during evaluation to increase the network's resistance
to variations in the input samples. This image mutation technique is
performed on the CPU simultaneously with network evaluation and training
on the GPU. Deep learning architectures constantly demand a vast number
of input samples to achieve better accuracy. Even though our NAHFE
dataset has 1936 images in 4 classes, this is still inadequate for evaluating a
deep learning architecture. So before evaluating the architecture, we
augmented the dataset with several transformation techniques, propagating
diverse tiny variations in appearance and pose. We employed five image
appearance filters (Gaussian, disk, unsharp, average, and motion) and six
affine transform matrices, obtained by adding small geometric perturbations
to the identity matrix. This augmentation creates (5 × 6) = 30 different
images for every actual image in the dataset, so the number of samples
increased to (1936 × 30) = 58,080. Then we normalized all the images using
the approach mentioned above. Finally, the dataset was divided into two
portions, train and validation, to experiment with and evaluate the model:
80% of the dataset is used to train the model, and 20% is used for validation.
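The following is a minimal sketch of this 5 × 6 augmentation scheme in SciPy, assuming grayscale NumPy images; the concrete filter parameters and the size of the geometric perturbations are illustrative assumptions, and the median filter merely stands in for the disk filter:

import numpy as np
from scipy import ndimage

def appearance_filters(img):
    # Five appearance variants: Gaussian, average, unsharp, a disk-like
    # smoothing stand-in, and a horizontal motion-like blur.
    blurred = ndimage.gaussian_filter(img, sigma=1.0)
    return [
        blurred,                                        # Gaussian
        ndimage.uniform_filter(img, size=3),            # average
        img + (img - blurred),                          # unsharp masking
        ndimage.median_filter(img, size=3),             # disk stand-in
        ndimage.uniform_filter1d(img, size=5, axis=1),  # motion-like blur
    ]

def affine_variants(img, n=6, max_shift=0.05, seed=0):
    # Six affine transforms: small perturbations added to the identity matrix.
    rng = np.random.default_rng(seed)
    return [ndimage.affine_transform(img, np.eye(2) + rng.uniform(-max_shift, max_shift, (2, 2)))
            for _ in range(n)]

def augment(img):
    # 5 filters x 6 affines = 30 variants per original image.
    return [a for f in appearance_filters(img) for a in affine_variants(f)]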

3.4.2 Convolutional Neural Network

Facial expression images come in several shapes and qualities, so we
define a data pre-processing technique that can work with any input shape
and quality. In this architecture, the CNN is constructed with six
convolutional layers and two dense layers, each with a ReLU activation
function, and dropout is used for training. Equation (3.4) and Equation (3.5)
explain the convolution and fully connected layer operations. Moreover, we
applied regularization to every weight matrix, which shrinks the magnitude
of the weights at each layer according to fixed hyperparameters.

Y_i^(l) = B_i^(l) + Σ_{j=1}^{m_i^(l−1)} K_{i,j}^(l) ∗ Y_j^(l−1)   (3.4)

where the output Y_i^(l) of layer l consists of m₃^(l) feature maps of size
m₁^(l) × m₂^(l); the i-th feature map is denoted Y_i^(l), B_i^(l) is a bias
matrix, and K_{i,j}^(l) is the corresponding filter.

d(x) = Activation(Wᵀx + b)   (3.5)

where W = [W₁, W₂, ..., Wₙ]ᵀ represents the weight vector of the dense
layer, and b represents the bias value of the dense layer.

ReLU(x) = max(0, x)   (3.6)

If the ReLU function receives a non-positive value, it returns zero; for any
positive input x, it returns that input value.


Dropout(x, p) = 0 with probability p, and x with probability 1 − p   (3.7)

where x is the output of a particular neuron in the network and p is the
dropout probability.
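A minimal Keras sketch of this feature extractor follows. The six convolutional layers, two dense layers, ReLU, batch normalization, dropout, weight regularization, and the 200-dimensional feature output come from the text and Figure 3.2; the input size, filter counts, and pooling placement are illustrative assumptions:

from tensorflow.keras import layers, models, regularizers

def build_cnn(input_shape=(48, 48, 1), feature_dim=200):
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    # Six convolutional layers, each followed by batch normalization and a
    # ReLU activation (Equation 3.4, Figure 3.2); L2 weight regularization.
    for i, filters in enumerate((32, 32, 64, 64, 128, 128)):
        model.add(layers.Conv2D(filters, 3, padding="same",
                                kernel_regularizer=regularizers.l2(1e-4)))
        model.add(layers.BatchNormalization())
        model.add(layers.Activation("relu"))
        if i % 2 == 1:
            model.add(layers.MaxPooling2D(2))  # assumed pooling placement
    model.add(layers.Flatten())
    # Two dense layers (Equation 3.5) with dropout (Equation 3.7); the
    # final 200-dimensional vector is what feeds the recurrent network.
    model.add(layers.Dense(256, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(feature_dim, activation="relu"))
    return model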

3.4.3 Recurrent Neural Network

Recurrent neural networks (RNNs) are a branch of neural networks typically
used to process time series and other sequential data. In an RNN, tensors
pass through recurrent loops in the network: the network generates an
output at each time step and has recursive connections between hidden
units. The mathematical model of an RNN can be expressed as follows:

h_t = σ_h(W_h x_t + U_h y_{t−1} + b_h)   (3.8)

Figure 3.2. The convolutional neural network architecture is the feature
learning and extraction portion of the overall CNN-RNN combined
architecture. Each cube represents an output of a convolution: the height
and width are the retained spatial information, and each cube's depth equals
the number of kernels. Each convolution is followed by batch normalization
and an activation layer. After the final convolution, the output is converted
into a linear set of nodes.

y_t = σ_y(W_y h_t + b_y)   (3.9)

where x_t is the input vector, h_t is the hidden layer vector, y_t is the
output vector, W_h, U_h, and W_y are weight matrices, b_h and b_y are
bias vectors, and σ_h and σ_y are activation functions.
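As a minimal NumPy sketch of this recurrence, one step of Equations (3.8) and (3.9) can be written as follows; the dimensions and the choice of tanh and softmax for σ_h and σ_y are illustrative assumptions:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_step(x_t, y_prev, W_h, U_h, b_h, W_y, b_y):
    # h_t = sigma_h(W_h x_t + U_h y_{t-1} + b_h)   (Equation 3.8)
    h_t = np.tanh(W_h @ x_t + U_h @ y_prev + b_h)
    # y_t = sigma_y(W_y h_t + b_y)                 (Equation 3.9)
    y_t = softmax(W_y @ h_t + b_y)
    return h_t, y_t

def rnn_forward(xs, params, n_out):
    # Run the recurrence over a frame sequence, from past to present.
    y = np.zeros(n_out)
    for x_t in xs:
        _, y = rnn_step(x_t, y, *params)
    return y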

3.4.4 Combined CNN-RNN Architecture

The proposed architecture combines consecutive inputs using the RNN to
expand and learn the information, while the CNN feature extraction method
is used to regulate all the parameters. The RNN classifies the images by
aggregating the features extracted by the successive CNN passes over each
image, and finally the prediction uses a Softmax. While experimenting, when
an image is fed to the CNN network, a 200-dimensional vector is extracted
from the dense layer. For a given time t, the network takes P frames from
the past ([t − P, t]). Every frame from time t − P to t is then run through
the CNN, which extracts P vectors, one for every input. After that, each
vector passes through a node of the RNN, and each node of the model gives
an output for the valence label. The experiment and evaluation of the
architecture were done with various CNN layers as input features, and the
proposed configuration acquired the maximum score on the test data. The
mean squared error is used as the cost function while optimizing. The overall
architecture is illustrated in Figure 3.3.


Figure 3.3. The combined CNN-RNN architecture to classify human abnor-

malities using the NAHFE dataset. CNN extracts the information and the

RNN classifies the images. The input images first pass to CNN and then the

output from the dense layer of CNN goes through the RNN and classifies

the exact class of the image.
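A minimal Keras sketch of this pipeline follows, reusing the build_cnn extractor sketched in Section 3.4.2. The 200-dimensional CNN features, the softmax prediction, and the mean-squared-error cost come from the text; the sequence length P, the number of recurrent units, and the optimizer are illustrative assumptions:

from tensorflow.keras import layers, models

def build_cnn_rnn(P=8, input_shape=(48, 48, 1), n_classes=4, hidden_units=150):
    # Each of the P frames in [t - P, t] passes through a shared CNN,
    # producing one 200-dimensional feature vector per frame (Figure 3.3).
    frames = layers.Input(shape=(P,) + input_shape)
    features = layers.TimeDistributed(build_cnn(input_shape))(frames)
    # The sequence of feature vectors feeds the recurrent classifier.
    h = layers.SimpleRNN(hidden_units)(features)
    outputs = layers.Dense(n_classes, activation="softmax")(h)
    model = models.Model(frames, outputs)
    # Mean squared error as the cost function, as stated in the text.
    model.compile(optimizer="adam", loss="mse", metrics=["accuracy"])
    return model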

3.5 Design, Implementation, and Simulation

The overall workflow of the proposed architecture is illustrated in Figure
3.1. All the mentioned steps of the prototype are implemented using Python
[43]. The convolutional and recurrent neural network models are
implemented using Keras, and NumPy [44] is used for additional
calculations and supporting implementation. The dataset used to test the
architecture is inserted directly; no variations or selections were made while
testing the architecture.

3.6 Summary

This chapter explained the architecture of the proposed human abnormality
classification method. The overall architecture uses the combined
convolutional neural network and recurrent neural network approach.

Implementation, Testing, and Result Analysis

4.1 Introduction

This chapter presents the implementation, testing, and result analysis of
the proposed human abnormality classification method, which uses the
combined convolutional neural network and recurrent neural network
approach.

4.2 Dataset

We observe that most of the Facial Expression Recognition (FER) image
datasets on the web are built for classifying six or seven basic expressions of
the human face. However, to analyse human stability, we need a dataset
divided into four classes named Drug addiction, Autism, Criminalism, and
Normal. Therefore we used web-gathering approaches to collect normal and
abnormal human images from the web and create our Normal and Abnormal
Humans Facial Expression (NAHFE) dataset with these four classes. Using
keywords related to each of the four classes in addition to the name of the
class (e.g., sinful, convicted, sinner for Criminal), we gathered a massive
number of images belonging to the same class. Finally, we kept 1936 images
across the four classes. In the evaluations, we used 80% (1548) of the images
for training and the remaining 20% (388) for testing. The images are
distributed evenly over the four classes, with 484 samples in each class. A
couple of sample images from the NAHFE dataset are shown in Figure 4.1.

Figure 4.1. Sample images of Normal, Autistic, Drug-addict, and Criminal
human faces from the dataset used in this research. These images were
gathered from the web using the web-gathering technique.
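A minimal sketch of this split follows; the scikit-learn helper and the placeholder arrays are assumptions, and any stratified 80/20 split yields the same 1548/388 counts:

import numpy as np
from sklearn.model_selection import train_test_split

# Placeholders standing in for the loaded NAHFE images and labels:
# 1936 images across 4 evenly distributed classes of 484 samples each.
X = np.zeros((1936, 48, 48, 1), dtype=np.float32)
y = np.repeat(np.arange(4), 484)

# 80% (1548 images) for training, 20% (388) for testing; stratifying keeps
# the four classes evenly represented in both portions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)
print(len(X_train), len(X_test))  # 1548 388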

4.3 System Setup

The Python programming language is used for data pre-processing,
experimenting, and evaluation of the model. The proposed architecture is
implemented using TensorFlow and Keras. Besides, NumPy is used for
mathematical operations in the architecture.

4.4 Evaluation

We conducted experimental research to evaluate the proposed solution to
the human abnormality classification problem on our NAHFE dataset. This
study investigates the impact of the combined CNN-RNN approach for
classifying human abnormalities.

Relative and shareable performance measures are required to estimate
how good an algorithm or approach is. The major problem in evaluating
any method is the choice of training and testing sets, which can introduce
inconsistency in model performance. Most performance metrics are based
on the confusion matrix, which consists of true-positive (TP), true-negative
(TN), false-positive (FP), and false-negative (FN) [45] values. The
significance of these elements can vary with how the performance evaluation
is done.

The accuracy of an identification system is defined by how many correct
guesses the model makes out of its total estimations. The accuracy is
measured as,

Accuracy = (TP + TN) / (TP + TN + FP + FN)   (4.1)

Precision defines, of all the samples the model predicted as positive, how
many are actually positive. To obtain the precision, we divide the total
number of correctly classified positive examples by the total number of
predicted positive examples. The equation can be stated as,

Precision = TP / (TP + FP)   (4.2)

Recall defines how many of all the positive samples the model predicted
correctly. Recall is the ratio of the total number of correctly classified
positive examples to the total number of positive examples. The equation
can be stated as,

Recall = TP / (TP + FN)   (4.3)
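A minimal NumPy sketch computing these metrics from predicted and true labels follows; macro-averaging the per-class precision and recall over the four classes is an assumption about how the multi-class scores are combined:

import numpy as np

def evaluate(y_true, y_pred, n_classes=4):
    accuracy = np.mean(y_true == y_pred)  # Equation (4.1)
    precisions, recalls = [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)  # Equation (4.2)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)     # Equation (4.3)
    return accuracy, float(np.mean(precisions)), float(np.mean(recalls))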

4.5 Results and Discussion

We use the NAHFE dataset to evaluate our combined CNN-RNN
architecture. Table 4.1 shows the prediction accuracy, precision, and recall
of the combined CNN-RNN approach for classifying human abnormalities,
and compares the predicted results of the proposed architecture with a basic
CNN architecture. Our investigation found that using only the CNN
architecture does not classify the human abnormalities properly: the basic
CNN architecture gives only 73.2% accuracy, while the CNN-RNN combined
approach gives 89.5%. It is noted that the precision and recall are also
higher for the CNN-RNN combined approach compared to the basic CNN
architecture. So, combining CNN with RNN was found more effective for
getting better results. Every result in this research is given as a mean of
four runs. The proposed model is trained with a limit of 100 epochs.

Model          Accuracy  Precision  Recall
CNN            73.20     72.97      70.38
CNN and RNN    89.50     87.98      88.13

Table 4.1. The table shows the validation Accuracy, Precision, and Recall of

the proposed CNN-RNN combined approach and basic CNN architecture.

In addition, the proposed CNN-RNN combined architecture was evaluated
under different hyperparameters to tune the model and investigate its
improvement in different circumstances. The final result of the proposed
CNN-RNN combined approach comes from investigating different numbers
of hidden units and hidden layers.

No. of hidden units  Accuracy  Precision  Recall
50                   85.24     84.97      87.08
100                  86.54     86.67      87.20
150                  88.75     87.03      87.98
200                  87.70     86.60      87.10

Table 4.2. The table shows the validation Accuracy, Precision, and Recall

of the proposed CNN-RNN combined approach based on different hidden

units.

Table 4.2 presents the results of the proposed model using different numbers
of hidden units. The investigation shows that the model performs best when
using 150 hidden units. The accuracy, precision, and recall scores increase
with the number of hidden units; however, when the number of hidden units
exceeds 150, the model's performance starts to degrade.

No. of hidden layers  Accuracy  Precision  Recall
3                     87.03     86.89      86.08
4                     87.64     87.06      86.97
5                     88.15     87.21      87.10
6                     89.50     87.98      88.13
7                     89.37     86.14      88.01

Table 4.3. The table shows the validation Accuracy, Precision, and Recall

of the proposed CNN-RNN combined approach based on different hidden

layers.

Similarly, Table 4.3 shows the results of using different numbers of hidden
layers in the model; using 6 hidden layers gives the best classification result,
and increasing the number of hidden layers to 7 decreases the performance
of the architecture. Finally, the experiment finds that the proposed
classification model performs best with 150 hidden units and 6 hidden layers.

This thesis broadly investigated the significance of the CNN-RNN
combined approach to classify human abnormalities and found satisfactory
performance with 150 hidden units and 6 hidden layers.
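A minimal sketch of such a sweep follows, assuming a variant of the build_cnn_rnn constructor from Section 3.4.4 extended with stacked recurrent layers; the 100-epoch limit comes from the text, while the training call is shown only in outline:

from tensorflow.keras import layers, models

def build_rnn_head(P=8, feature_dim=200, hidden_layers=6, hidden_units=150,
                   n_classes=4):
    # Stack of recurrent hidden layers over the CNN feature sequence; all but
    # the last layer return full sequences so the next layer can consume them.
    seq = layers.Input(shape=(P, feature_dim))
    h = seq
    for i in range(hidden_layers):
        h = layers.SimpleRNN(hidden_units,
                             return_sequences=(i < hidden_layers - 1))(h)
    out = layers.Dense(n_classes, activation="softmax")(h)
    model = models.Model(seq, out)
    model.compile(optimizer="adam", loss="mse", metrics=["accuracy"])
    return model

# Sweep mirroring Tables 4.2 and 4.3 (data loading omitted):
# for units in (50, 100, 150, 200):
#     build_rnn_head(hidden_units=units).fit(train_x, train_y, epochs=100,
#                                            validation_data=(val_x, val_y))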

4.6 Summary

The evaluation analysis shows that this architecture performs most
satisfactorily on human abnormality classification.

Standards, Constraints, Milestones

This chapter demonstrates the Standards, Impacts, Ethics, and Challenges
of the thesis work. Then, the Constraints and Alternatives are illustrated.
Finally, the Schedule, Tasks, and Milestones of the proposed work are
presented.

5.1 Standards (Sustainability)

We expect our thesis work to remain sustainable for many years. Facial
expression analysis is a popular recent research topic, and human
abnormality detection is a FER problem that can be helpful for society and
the country. Moreover, the CNN and RNN we used for the implementation
are cutting-edge deep learning approaches. As the resources we used will
remain available for extended periods of time, we can say this thesis work
will be sustainable.

5.2 Impacts (on Society)

In society, there are different types of people around us. Different people
have different behaviours and gestures, and some people show different
gestures due to their health abnormalities. As the facial expression of an
autistic person differs from that of a typical person, a drug-addicted person's
eyes differ from those of a non-addict. We can group people by their facial
gestures. Yet it is sometimes hard to judge a facial expression with human
eyes, as people can manipulate their facial expressions.

In our society, about 2 in 1000 children have autism. For an autistic child,
we have special schools or daycare, but at home not every parent knows
about autism, and for them it is hard to tell whether their child is typical
or not. The police cannot always identify a criminal or drug addict by eye,
as criminals are trained to manipulate their facial expressions. With the
help of our research work, they can catch criminals, drug-addicted people,
or suppliers by setting up a CCTV camera. If a terrorist wants to cause
disorder in an assembly, police or security can catch them through facial
expression analysis. By recognising such people, our society can become
cleaner and more aware.

5.3 Ethics

The human abnormality detection system has a vast application area,
depending on the dataset used to train the model. The installation of such
systems must respect individuals' privacy concerns, and they should not be
applied for any motive that creates a social, national, or global security
threat. The collection of the dataset must fully comply with the code of
moral principles and ethics.

5.4 Challenges

Although facial expression exploration architectures are advancing rapidly,
industries producing such technologies still face information security
challenges. This system is mainly used for grouping people by detecting
users' facial expressions. Sometimes people may behave unusually, and
people may use a face mask or cover their face, which can make it difficult
to detect a person. Besides, misuse is a big problem: malicious people can
abuse this system.

5.5 Constraints

Different constraints, such as design constraints, component constraints,
and budget constraints, are presented in this section.

The overall structure is proposed based on an image dataset. Processing
a large number of images needs a powerful processor to run our model well;
however, no GPU is required.

The following components were used to train our model:

• Minimum processor: Intel Core i3 (8th gen)

• Minimum memory: 4GB (DDR4, 2400 bus)

• Video input: HD video input device

However, the budget can vary with the market environment because the
prices of the components are not consistent.

5.6 Timeline and Gantt Chart

Our thesis work timeline is divided into three parts, as we had three
semesters to complete the work, conducted under the guidance of our
supervisor. In the first semester, we submitted a proposal and reviewed the
work related to the thesis; we also built the prototype of the proposed
system by analysing and planning against the existing systems. In the
second semester, we created a dataset and partially implemented the model.
Finally, in the third semester, we implemented the overall architecture,
tested it with the introduced dataset, and reported the overall workflow. In
the meantime, we also wrote a conference paper, which has been accepted.

The following Gantt chart (Figure 5.1) represents the work execution
process used to complete this thesis. The thesis work was completed within
three semesters, where each semester is four months, i.e., 12 months in total.

1st Semester (weeks 1-12): Planning and Study; Topic Selection; Review of Related Work; Analysing Existing Systems; Prototype Building; Evaluation.

2nd Semester (weeks 13-24): Model Diagram; Model & System Design; Design Submission; Model Analysis; Partial Implementation; Evaluation.

3rd Semester (weeks 25-36): Complete Implementation & Testing; Issue Checking & Resolution; Model Finalization; Result Evaluation; Report Writing; Presentation & Final Evaluation.

Figure 5.1. Gantt chart of the work execution process.


5.7 Summary

This chapter briefly explained the standards, impacts, ethics, and
challenges of the thesis work. The constraints, alternatives, schedule, tasks,
and milestones of the proposed work were also demonstrated.

Conclusion

6.1 Introduction

This thesis experiments with and evaluates a human abnormality
classification method using the NAHFE dataset created by us. We applied a
combined method of CNN and RNN to train and test our method precisely,
and we observe that the combined CNN-RNN approach gives better
performance for human abnormality classification. To the best of our
knowledge, the architecture proposed in this thesis is the first that classifies
human abnormalities using a novel CNN-RNN combined approach. The
performance of the proposed architecture is also compared with a basic
CNN baseline architecture.

6.2 Future Works and Limitations

To the best of our knowledge, our thesis is the first research work on
human abnormality detection from still images. Human abnormality
detection from video and audio using deep learning architectures can be a
significant research field. Besides, we have used a CNN-RNN to develop the
architecture, but newer recurrent architectures like the GRU and LSTM can
be tested for the task. We strongly believe that 'Human Abnormality
Classification Using Combined CNN-RNN Approach' is a research work that
will pave the way for significant research on human abnormality
classification and will enhance the intelligence and practicability of future
work.

References

[1] Yadan Lv, Zhiyong Feng, and Chao Xu. Facial expression recognition

via deep learning. pages 303–308, 2014.

[2] Inchul Song, Hyun-Jun Kim, and Paul Barom Jeon. Deep learning for

real-time robust facial expression recognition on a smartphone. pages

564–567, 2014.

[3] Steve Lawrence, C Lee Giles, Ah Chung Tsoi, and Andrew D Back.

Face recognition: A convolutional neural-network approach. IEEE

transactions on neural networks, 8(1):98–113, 1997.

[4] Tomáš Mikolov, Martin Karafiát, Lukáš Burget, Jan Černockỳ, and

Sanjeev Khudanpur. Recurrent neural network based language model.

In Eleventh annual conference of the international speech communication

association, 2010.

[5] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning.

nature, 521(7553):436–444, 2015.

[6] Heechul Jung, Sihaeng Lee, Junho Yim, Sunjeong Park, and Junmo

Kim. Joint fine-tuning in deep neural networks for facial expression

recognition. pages 2983–2991, 2015.

[7] Ali Mollahosseini, David Chan, and Mohammad H Mahoor. Going

deeper in facial expression recognition using deep neural networks. pages

1–10, 2016.

[8] Masakazu Matsugu, Katsuhiko Mori, Yusuke Mitari, and Yuji Kaneda.

Subject independent facial expression recognition with robust face

detection using a convolutional neural network. Neural Networks, 16(5-

6):555–559, 2003.

[9] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet

classification with deep convolutional neural networks. Communications

of the ACM, 60(6):84–90, 2017.

[10] Karen Simonyan and Andrew Zisserman. Very deep convolutional net-

works for large-scale image recognition. arXiv preprint arXiv:1409.1556,

2014.

[11] Patrice Y Simard, David Steinkraus, John C Platt, et al. Best practices

for convolutional neural networks applied to visual document analysis.

3(2003), 2003.

[12] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever,

and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural

networks from overfitting. The journal of machine learning research,

15(1):1929–1958, 2014.

[13] Hiroshi Kobayashi and Fumio Hara. Dynamic recognition of basic facial

expressions by discrete-time recurrent neural network. 1:155–158, 1993.

[14] Abir Fathallah, Lotfi Abdi, and Ali Douik. Facial expression recognition

via deep learning. pages 745–750, 2017.

[15] Anima Majumder, Laxmidhar Behera, and Venkatesh K Subramanian.

Automatic facial expression recognition system using deep network-

based data fusion. IEEE transactions on cybernetics, 48(1):103–114,

2016.

[16] Madhumita A Takalkar and Min Xu. Image based facial micro-

expression recognition using deep learning on small datasets. pages 1–7,

2017.

[17] Heechul Jung, Sihaeng Lee, Sunjeong Park, Byungju Kim, Junmo Kim,

Injae Lee, and Chunghyun Ahn. Development of deep learning-based

facial expression recognition system. pages 1–4, 2015.

[18] Xiujie Qu, Tianbo Wei, Cheng Peng, and Peng Du. A fast face recogni-

tion system based on deep learning. 1:289–292, 2018.

[19] Neha Jain, Shishir Kumar, Amit Kumar, Pourya Shamsolmoali, and

Masoumeh Zareapoor. Hybrid deep neural networks for face emotion

recognition. Pattern Recognition Letters, 115:101–106, 2018.

[20] Osamah M Al-Omair and Shihong Huang. A comparative study on

detection accuracy of cloud-based emotion recognition services. pages

142–148, 2018.

[21] Savath Saypadith and Supavadee Aramvith. Real-time multiple face

recognition using deep learning on embedded gpu system. pages 1318–

1324, 2018.

[22] Hamid Ouanan, Mohammed Ouanan, and Brahim Aksasse. Face

recognition using deep features. pages 78–85, 2017.

[23] Dinh Viet Sang, Nguyen Van Dat, et al. Facial expression recognition

using deep convolutional neural networks. pages 130–135, 2017.

[24] André Teixeira Lopes, Edilson de Aguiar, Alberto F De Souza, and

Thiago Oliveira-Santos. Facial expression recognition with convolutional

neural networks: coping with few data and the training sample order.

Pattern Recognition, 61:610–628, 2017.

[25] Deepak Kumar Jain, Pourya Shamsolmoali, and Paramjit Sehdev. Ex-

tended deep neural network for facial emotion recognition. Pattern

Recognition Letters, 120:69–74, 2019.

[26] Jinwoo Jeon, Jun-Cheol Park, YoungJoo Jo, ChangMo Nam, Kyung-

Hoon Bae, Youngkyoo Hwang, and Dae-Shik Kim. A real-time facial

expression recognizer using deep neural network. pages 1–4, 2016.

[27] Hiranmayi Ranganathan, Shayok Chakraborty, and Sethuraman Pan-

chanathan. Multimodal emotion recognition using deep learning archi-

tectures. pages 1–9, 2016.

[28] Xudong Sun, Pengcheng Wu, and Steven CH Hoi. Face detection using

deep learning: An improved faster rcnn approach. Neurocomputing,

299:42–50, 2018.

[29] Hong-Wei Ng, Viet Dung Nguyen, Vassilios Vonikakis, and Stefan

Winkler. Deep learning for emotion recognition on small datasets using

transfer learning. pages 443–449, 2015.

[30] Wael AbdAlmageed, Yue Wu, Stephen Rawls, Shai Harel, Tal Hassner,

Iacopo Masi, Jongmoo Choi, Jatuporn Lekust, Jungyeon Kim, Prem

Natarajan, et al. Face recognition using deep multi-pose representations.

pages 1–9, 2016.

[31] Yanan Guo, Dapeng Tao, Jun Yu, Hao Xiong, Yaotang Li, and Dacheng

Tao. Deep neural networks with relativity learning for facial expression

recognition. pages 1–6, 2016.

[32] Tolga Soyata, Rajani Muraleedharan, Colin Funai, Minseok Kwon, and

Wendi Heinzelman. Cloud-vision: Real-time face recognition using a

mobile-cloudlet-cloud acceleration architecture. pages 000059–000066,

2012.

[33] K Shailaja and B Anuradha. Effective face recognition using deep

learning based linear discriminant classification. pages 1–6, 2016.

[34] Chathura R De Silva, Surendra Ranganath, and Liyanage C De Silva.

Cloud basis function neural network: a modified rbf network architecture

for holistic facial expression recognition. Pattern recognition, 41(4):1241–

1253, 2008.

[35] M Shamim Hossain and Ghulam Muhammad. Emotion recognition

using secure edge and cloud computing. Information Sciences, 504:589–

601, 2019.

[36] Nianyin Zeng, Hong Zhang, Baoye Song, Weibo Liu, Yurong Li, and

Abdullah M Dobaie. Facial expression recognition via learning deep

sparse autoencoders. Neurocomputing, 273:643–649, 2018.

[37] Samuli Laine, Tero Karras, Timo Aila, Antti Herva, Shunsuke Saito,

Ronald Yu, Hao Li, and Jaakko Lehtinen. Production-level facial

performance capture using deep convolutional neural networks. arXiv

preprint arXiv:1609.06536, 2016.

[38] Md Zia Uddin, Weria Khaksar, and Jim Torresen. Facial expression

recognition using salient features and convolutional neural network.

IEEE Access, 5:26146–26161, 2017.

[39] Panagiotis Tzirakis, George Trigeorgis, Mihalis A Nicolaou, Björn W

Schuller, and Stefanos Zafeiriou. End-to-end multimodal emotion recog-

nition using deep neural networks. IEEE Journal of Selected Topics in

Signal Processing, 11(8):1301–1309, 2017.

[40] Kun Tian, Liaoyuan Zeng, Sean McGrath, Qian Yin, and Wenyi Wang.

3d facial expression recognition using deep feature fusion cnn. pages

1–6, 2019.

[41] Wei Li, Min Li, Zhong Su, and Zhigang Zhu. A deep-learning approach

to facial expression recognition with candid images. pages 279–282,

2015.

[42] Tong Zhang, Wenming Zheng, Zhen Cui, Yuan Zong, and Yang Li.

Spatial–temporal recurrent neural network for emotion recognition.

IEEE transactions on cybernetics, 49(3):839–847, 2018.

[43] Guido Van Rossum et al. Python, 1991.

[44] Stefan Van Der Walt, S Chris Colbert, and Gael Varoquaux. The numpy

array: a structure for efficient numerical computation. Computing in

science & engineering, 13(2):22–30, 2011.

[45] Jake Lever, Martin Krzywinski, and Naomi Altman. Points of signifi-

cance: model selection and overfitting, 2016.

