Dhaka University of Engineering & Technology (DUET), Gazipur
Department of Computer Science and Engineering
Thesis Proposal Report
Title: Detecting Emotions Using Facial Expression Analysis: A Machine Learning Approach
Prepared By
Palash Chandra Paul (Std. ID: 204011, Semester: 4/2)
Nahidur Rahman (Std. ID: 204010, Semester: 4/2)
Md. Shawon Miah (Std. ID: 204033, Semester: 4/2)
Supervised By
Md. Abu Bakkar Siddique
Assistant Professor
Department of Computer Science and Engineering
1. Introduction
Facial emotion is a fundamental aspect of nonverbal communication. It plays a critical role in expressing
human feelings, perceptions, behavioral reactions, intentions, social signals, criminal tendencies, lies, and
a degree of gratification or displeasure. Facial expressions often provide an immediate and involuntary
reflection of a person’s emotional state. Regardless of gender, nationality, culture, and race, most people
can recognize facial emotions easily. The general approach to automatic facial expression analysis
consists of three steps: face detection and tracking, feature extraction, and expression classification and
recognition [4].
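As a minimal illustration of the first step (face detection), the sketch below uses OpenCV's bundled pretrained Haar cascade; the input path is hypothetical and the detector parameters are common defaults, not finalized choices for this work:

```python
import cv2

# Load OpenCV's pretrained frontal-face Haar cascade (ships with OpenCV)
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

image = cv2.imread("face.jpg")                  # hypothetical input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # the detector works on grayscale

# Returns one (x, y, w, h) bounding box per detected face
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    face_roi = gray[y:y + h, x:x + w]           # cropped face for later steps
```

The cropped face region then feeds the feature extraction and classification stages described above.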
As technology becomes more embedded in our daily lives, the ability of machines to comprehend and react to human emotions is emerging as a significant focus within artificial intelligence (AI) and human-computer interaction (HCI). Driven by recent progress in computer vision and machine learning, it has become possible to analyze facial features automatically and classify emotional states with increasing accuracy, and the field of emotion detection through facial expression analysis has advanced significantly in recent years.
Facial expressions can vary significantly based on factors such as age, gender, ethnicity, facial hair,
makeup, accessories, lighting conditions, and head orientation. When working with a limited dataset, these
variations make it challenging for machine learning models to accurately detect and interpret expressions
under such diverse conditions. Using a large and diverse dataset can significantly improve the accuracy
of facial expression detection. To achieve high performance, various machine learning and deep learning
algorithms can be applied, such as Convolutional Neural Networks (CNN), Recurrent Neural Networks
(RNN), Support Vector Machines (SVM), AdaBoost, and Linear Discriminant Analysis (LDA).
Additionally, feature selection techniques like Principal Component Analysis (PCA) can be used to reduce dimensionality and enhance model efficiency. By combining robust datasets with these advanced algorithms and feature selection methods, facial expressions can be detected more accurately and reliably [2]. However, automating facial emotion detection and classification remains a challenging task [7]. This area of research focuses on training machines on a dataset to recognize human emotions automatically by analyzing facial features captured in images or video streams. Humans can usually recognize emotions with ease, yet even human judgment is fallible, and robots cannot perceive human emotions directly; this motivates automated facial expression analysis, particularly in robotics. Facial emotion analysis can also help identify signs of emotional distress, depression, anxiety, or other psychological conditions. Machine learning models can assist therapists by continuously monitoring patients’ emotional responses in remote or in-person sessions, offering early warnings, or aiding in diagnostics [3].
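To make the PCA step mentioned above concrete, the following minimal sketch (assuming scikit-learn; the random stand-in data and array shapes are purely illustrative) reduces flattened face images to a compact feature vector before an SVM classifier:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Illustrative stand-in data: 1000 flattened 48x48 grayscale faces, 7 labels
X_train = np.random.rand(1000, 48 * 48)
y_train = np.random.randint(0, 7, size=1000)

# Keep enough principal components to explain 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_train)  # 2304 dims -> far fewer components

clf = SVC(kernel="rbf").fit(X_reduced, y_train)
# At inference time: clf.predict(pca.transform(X_new))
```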
This proposal presents a machine learning-based approach for detecting emotions through facial expression analysis. The goal is to develop a hybrid model combining EfficientNetV2 and ConvNeXt that not only achieves high classification accuracy but also generalizes well across different faces and environments.
2. Literature Review
Numerous studies have investigated diverse techniques for emotion detection, offering significant
contributions to the advancement of this field. The following section offers a brief examination of several
influential studies that have contributed to the progress of emotion recognition research.
In [6], the authors proposed a deep learning approach utilizing Convolutional Neural Networks (CNNs) for
emotion recognition from facial images. The model’s effectiveness was assessed using two well-established datasets: the Facial Emotion Recognition Challenge (FER2013) and the Japanese Female
Facial Expression (JAFFE) dataset. It achieved notable accuracy scores of 70.14% and 98.65% on the
FER2013 and JAFFE datasets, respectively. However, their approach was designed to detect only the
seven basic emotions. It does not account for more nuanced or compound emotions (e.g., confusion,
frustration, sarcasm), which limits its applicability in more realistic, emotionally rich environments.
Pranav E. et al. [10] proposed a Deep Convolutional Neural Network (DCNN) model for recognizing five
facial emotions—angry, happy, neutral, sad, and surprised—achieving a test accuracy of 78.04%. The
model, trained on a manually collected dataset, utilizes standard CNN layers with ReLU and Softmax
activations, and is optimized using the Adam algorithm. While the approach shows promise and avoids
overfitting, it primarily relies on a simple two-layer architecture and lacks integration of more advanced
deep learning strategies or temporal dynamics—key elements that are increasingly vital for robust emotion
detection in real-world scenarios.
In [8], the authors developed a facial expression recognition system using traditional machine learning
techniques, specifically Haar-like features for face detection and HOG features with SVM for emotion
classification. While the system achieved a strong F1 score of 0.8759 and demonstrated effective emotion
detection across three expressions, its performance hinged on handcrafted features and personalized
classifiers—limitations that reduce scalability and adaptability to broader, more diverse datasets.
Singh et al. [11] proposed a facial emotion recognition system using CNN integrated with SVM for
classification, achieving up to 93% accuracy, though the model faced challenges with training time and
generalization across diverse datasets. Gaddam et al. [5] introduced a CNN-based model using ResNet50
architecture for classifying facial emotions from static images, achieving a test accuracy of 55.6%, but the
model struggled with data imbalance and limited accuracy compared to more advanced deep learning
frameworks.
Ali et al. [1] developed a CNN-based facial emotion detection system using Keras and real-world image
datasets, achieving up to 93% accuracy, but the study lacked integration of ensemble techniques and did
not address computational constraints for deployment on low-resource systems. Singh and Fang [2]
proposed a deep learning-based approach for emotion recognition using both audio and video modalities.
They evaluated multiple neural architectures, including CNN, CNN+RNN, and a hybrid
CNN+RNN+3DCNN model. Their best performance was achieved with the hybrid model, reaching 71.75%
accuracy on three emotions (sad, angry, and neutral). However, despite its enhanced learning capacity,
the hybrid model was noted for its high computational demand and only marginal improvement over simpler
CNN+RNN models—highlighting the trade-off between model complexity and practical efficiency in
emotion detection tasks.
In [9], Mellouk and Handouzi conducted a comprehensive review of deep learning models applied to
facial emotion recognition (FER), highlighting architectures like CNN, CNN-LSTM, 3DCNN, and BiLSTM.
These models achieved high recognition rates—many exceeding 90%—across benchmark datasets such
as CK+, JAFFE, and FER2013. While the results were promising, challenges persisted regarding
computational costs, data diversity, and the generalizability of models across varied real-world conditions,
limiting their practical scalability and interpretability in sensitive domains like healthcare.
Khan [7] conducted a comprehensive review of both traditional machine learning and deep learning
methods for facial emotion recognition, highlighting that CNN-based models deliver high accuracy on
standard datasets, while traditional ML methods like SVM and KNN remain computationally efficient.
However, the study lacked experimental implementation and did not address fairness or real-time
deployment challenges, limiting its practical applicability.
In [12], the authors present a deep learning-based facial emotion recognition framework using CNN, LSTM, and transfer learning models like MobileNet and DCNN. While effective in feature extraction and
classification across standard datasets, the approaches still faced challenges with intra-class diversity and
struggled with real-world variability, as they primarily relied on pre-defined sentiment categories and lacked
real-time adaptability.
Fig: Accuracy (%) comparison among the reviewed models and their datasets: Jaiswal (CNN; FER2013 and JAFFE datasets), Pranav (DCNN with ReLU and Softmax activations), Kim (Haar and HOG features with SVM), Singh (CNN integrated with SVM), Gaddam (CNN-based ResNet50 architecture), Ali (CNN with Keras on real-world images), Singh (CNN, CNN+RNN, and CNN+RNN+3DCNN), Mellouk (FER datasets; CNN, CNN-LSTM, BiLSTM), Yadav (CNN model on FER-2013).
Current machine learning models like CNN, RNN, SVM, LDA, Random Forest, and architectures such as MobileNet, DeXpression, DenseNet, and ResNet are widely used for facial emotion recognition.
However, they often face limitations in accuracy and generalization. Developing a hybrid model can help
overcome these issues and improve overall performance.
3. Objectives
The primary objectives of our research are as follows:
➢ To analyze existing machine learning models for facial emotion recognition to identify strengths,
limitations, and gaps.
➢ To select and preprocess relevant facial expression datasets to ensure quality, balance, and
suitability for training.
➢ To design and implement an effective hybrid machine learning model to detect emotions from facial
expressions with performance evaluation by combining EfficientNetV2 and ConvNeXt.
4. Methodology
The leading challenge faced by machine learning in this setting, and by the system as a whole, is training. For a facial emotion recognition system to be effective, it must be trained with real-world data showing different human facial reactions. For instance, if the system is expected to detect a happy or angry face, it must first learn how these emotions look. This learning is done through a process called re-training, in which the models are repeatedly fed real examples until they begin to recognize patterns and emotional features with high accuracy.
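A minimal sketch of this training process, assuming PyTorch with an ImageFolder-style directory of labeled face images (the data path, stand-in backbone, and hyperparameters are illustrative, not final choices):

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Hypothetical layout: data/train/<emotion_name>/*.jpg
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
dataset = datasets.ImageFolder("data/train", transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

# Stand-in backbone for illustration; the proposal's hybrid model would go here
model = models.resnet18(num_classes=7)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

for epoch in range(10):              # repeatedly feed real labeled examples
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()              # adjust weights toward emotional patterns
        optimizer.step()
```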
Image Collection → Image Processing → Transformation → Feature Selection and Extraction (by using a hybrid model) → Classification/Recognition
Fig: Block diagram of image processing for emotion detection
Creating a hybrid model combining EfficientNetV2 and ConvNeXt is a powerful approach that can
leverage the efficiency of EfficientNetV2 and the feature richness of ConvNeXt.
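A minimal sketch of one way such a hybrid could be wired, assuming the timm library and simple late fusion of the pooled feature vectors from both backbones (the model variants, fusion strategy, and dropout rate are illustrative assumptions, not the finalized design):

```python
import torch
import torch.nn as nn
import timm  # assumed dependency providing both backbone families

class HybridEmotionNet(nn.Module):
    """Late fusion: concatenate pooled features from both backbones."""
    def __init__(self, num_classes=7):
        super().__init__()
        # num_classes=0 makes timm return pooled feature vectors, not logits;
        # set pretrained=True to start from ImageNet weights
        self.effnet = timm.create_model("tf_efficientnetv2_s",
                                        pretrained=False, num_classes=0)
        self.convnext = timm.create_model("convnext_tiny",
                                          pretrained=False, num_classes=0)
        fused_dim = self.effnet.num_features + self.convnext.num_features
        self.classifier = nn.Sequential(
            nn.Dropout(0.3),                 # illustrative regularization
            nn.Linear(fused_dim, num_classes),
        )

    def forward(self, x):
        f1 = self.effnet(x)                  # efficiency-oriented features
        f2 = self.convnext(x)                # ConvNeXt's richer representations
        return self.classifier(torch.cat([f1, f2], dim=1))

model = HybridEmotionNet()
logits = model(torch.randn(1, 3, 224, 224))  # logits for 7 emotion classes
```

Concatenating pooled features is only one fusion option; attention-based or weighted fusion could also be explored during the thesis work.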
Fig: Network Structure of ConvNeXt
ConvNeXt Stages (Stage 1 to Stage 4)
Input Layer (Conv2D + LayerNorm): Facial input images pass through a 2D convolutional layer to extract low-level features, followed by Layer Normalization to stabilize and accelerate training.
❖ Stage 1: Initial feature extraction using ConvNeXt blocks, followed by downsampling to reduce image size and increase abstraction.
❖ Stage 2: Builds on stage 1’s features, capturing more complex structures (e.g., edge patterns,
basic facial parts). Downsampling continues for deeper representation.
❖ Stage 3: Further extracts semantic features (e.g., eyes, mouth shapes, etc.). Ends with another
downsampling operation.
❖ Stage 4: Deepest features extracted — focuses on subtle emotion cues. Ends with Global
Average Pooling, compressing feature maps to a compact representation.
Output Layer (LayerNorm + Linear Layer): Normalizes and passes final features to a fully connected
layer. Predicts one of the emotional categories (e.g., anger, disgust, fear, happy, neutral, sad, surprise).
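For reference, a single ConvNeXt block (the unit repeated inside each stage) can be sketched in PyTorch as follows; this mirrors the published ConvNeXt design (7×7 depthwise convolution, LayerNorm, 4× pointwise expansion with GELU, then projection and a residual connection), with layer scale and stochastic depth omitted for brevity:

```python
import torch
import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    """One ConvNeXt block: depthwise conv -> LayerNorm -> MLP -> residual."""
    def __init__(self, dim):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)           # normalizes over the channel dim
        self.pwconv1 = nn.Linear(dim, 4 * dim)  # pointwise expansion (4x)
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)  # pointwise projection back to dim

    def forward(self, x):                       # x: (B, C, H, W)
        residual = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)               # (B, H, W, C) for LayerNorm/Linear
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)               # back to (B, C, H, W)
        return x + residual                     # residual connection
```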
Fig: EfficientNetV2 for emotion detection
Fig: Attention mechanism (SENet)
Function:
➢ Input Layer: Input size is typically 224×224×3 (RGB); takes in preprocessed face images.
➢ Stem: Processes the input image before deeper feature extraction begins. In emotion detection, the stem plays a critical role in preparing the raw facial image data for meaningful feature learning.
➢ EfficientNetV2 Backbone: Acts as the feature extractor. It progressively reduces spatial resolution while increasing channel depth, extracting complex hierarchical features.
➢ Global Average Pooling (GAP): Reduces the feature maps to a 1D vector while maintaining the essential features.
➢ Dropout: A regularization technique used in deep learning to prevent overfitting by randomly “dropping out” (i.e., setting to zero) a fraction of the neurons during training.
➢ Fully Connected (Dense) Layer: Outputs a vector of seven softmax probabilities corresponding to [Angry, Disgust, Fear, Happy, Sad, Surprise, Neutral].
➢ Output Layer: Softmax activation; outputs class probabilities for the emotion classes (a sketch of the SE attention block and this head is given below).
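The following minimal PyTorch sketch illustrates the SENet-style attention block and the GAP, dropout, and dense head described above (the channel sizes, reduction ratio, and dropout rate are illustrative assumptions):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: reweights channels by learned importance."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                         # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))                    # squeeze: global average pool
        w = self.fc(w).view(x.size(0), -1, 1, 1)  # excite: per-channel weights
        return x * w                              # rescale feature maps

class EmotionHead(nn.Module):
    """GAP -> dropout -> dense layer producing 7 emotion logits."""
    def __init__(self, in_channels=1280, num_classes=7):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)       # global average pooling
        self.drop = nn.Dropout(0.3)               # randomly zero neurons in training
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, x):                         # x: backbone feature maps
        x = self.pool(x).flatten(1)               # (B, C, H, W) -> (B, C)
        return self.fc(self.drop(x))              # softmax applied at loss/inference
```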
5. Proposed Research Plan
➢ To develop a new hybrid architecture combining EfficientNetV2, ConvNeXt blocks, and SENet.
➢ To develop our own dataset for higher accuracy.
➢ To compare the performance of CNN, MobileNet, ResNet, DenseNet, and DeXpression.
➢ To achieve real-time compatibility with webcam or mobile deployment.
Month | Research Background | Existing Method Analysis | Research Objective | Data Collection | Methodology and Analysis | Result Test
Sep–Nov | ✓ | X | X | X | X | X
Dec–Jan | X | ✓ | X | X | X | X
Feb | X | X | ✓ | X | X | X
Mar–May | X | X | X | ✓ | X | X
Jun–Jul | X | X | X | X | ✓ | X
Aug | X | X | X | X | X | ✓
6. References
1. Md Forhad Ali, Mehenag Khatun, and Nakib Aman Turzo. Facial emotion detection using neural network. International Journal of Scientific and Engineering Research, 2020.
2. Tanveer Aslam, Salman Qadri, Muhammad Shehzad, S Furqan Qadri, Abdul Razzaq, and Syed
Shah. Emotion based facial expression detection using machine learning. Life Science Journal,
17(8):35–43, 2020.
3. Carmen Bisogni, Aniello Castiglione, Sanoar Hossain, Fabio Narducci, and Saiyed Umer. Impact
of deep learning approaches on facial expression recognition in healthcare industries. IEEE
Transactions on Industrial Informatics, 18(8):5619–5627, 2022.
4. Renuka S. Deshmukh, Vandana Jagtap, and Shilpa Paygude. Facial emotion recognition system
through machine learning approach. In 2017 International Conference on Intelligent Computing and
Control Systems (ICICCS), pages 272–277, 2017.
5. Dharma Karan Reddy Gaddam, Mohd Dilshad Ansari, Sandeep Vuppala, Vinit Kumar Gunjan, and
Madan Mohan Sati. Human facial emotion detection using deep learning. In ICDSMLA 2020:
Proceedings of the 2nd International Conference on Data Science, Machine Learning and
Applications, pages 1417–1427. Springer, 2022.
6. Akriti Jaiswal, A Krishnama Raju, and Suman Deb. Facial emotion detection using deep learning.
In 2020 international conference for emerging technology (INCET), pages 1–5. IEEE, 2020.
7. Amjad Rehman Khan. Facial emotion recognition using conventional machine learning and deep
learning methods: current achievements, analysis and remaining challenges. Information,
13(6):268, 2022.
8. Sanghyuk Kim, Gwon Hwan An, and Suk-Ju Kang. Facial expression recognition system using
machine learning. In 2017 international SoC design conference (ISOCC), pages 266–267. IEEE,
2017.
9. Wafa Mellouk and Wahida Handouzi. Facial emotion recognition using deep learning: review and
insights. Procedia Computer Science, 175:689–694, 2020.
10. Pranav, Suraj Kamal, C Satheesh Chandran, and MH Supriya. Facial emotion recognition using deep convolutional neural network. In 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), pages 317–320. IEEE, 2020.
11. Shubham Kumar Singh, Revant Kumar Thakur, Satish Kumar, and Rohit Anand. Deep learning
and machine learning based facial emotion detection using cnn. In 2022 9th International
Conference on Computing for Sustainable Global Development (INDIACom), pages 530–535.
IEEE, 2022.
12. Yatharth Yadav, Vikas Kumar, Vipin Ranga, and Ram Murti Rawat. Analysis of facial sentiments:
a deep-learning way. In 2020 International Conference on Electronics and Sustainable
Communication Systems (ICESC), pages 541–545. IEEE, 2020.
13. Feifei Zhang, Tianzhu Zhang, Qirong Mao, and Changsheng Xu. Geometry guided pose-invariant
facial expression recognition. IEEE Transactions on Image Processing, 29:4445–4460, 2020.