
Visvesvaraya Technological University, Belagavi –590014

PROJECT REPORT ON

“SPEECH EMOTION RECOGNITION”


Submitted in partial fulfilment of the requirements for the 8th semester VTU CBCS Subject namely

MAJOR PROJECT

COMPUTER SCIENCE AND ENGINEERING


For the Academic year
2023 - 2024

Submitted By

ABIN K SHAJI 4SH20CS003


AKHIL ASOKAN 4SH20CS005
NASHAL AHMAD 4SH20CS041
VIGNESH PRABHAKARAN 4SH20CS070

Under the Guidance of

Miss. KAVITHA
Department of CSE

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


SHREE DEVI INSTITUTE OF TECHNOLOGY, MANGALURU- 574 142
SHREE DEVI INSTITUTE OF TECHNOLOGY
(An Institution under VTU, Belagavi)
KENJAR, MANGALURU- 574 142
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CERTIFICATE

Certified that the project work entitled “SPEECH EMOTION RECOGNITION” is a bonafide
work carried out by ABIN K SHAJI, AKHIL ASOKAN, NASHAL AHMAD, VIGNESH
PRABHAKARAN bearing USNs 4SH20CS003, 4SH20CS005, 4SH20CS041, 4SH20CS070
respectively in partial fulfilment for the VTU CBCS subject Major Project, and for the
award of degree of Bachelor of Engineering in Computer Science and Engineering of the
Visvesvaraya Technological University, Belagavi during the year 2023-2024. It is certified
that all corrections / suggestions indicated for Internal Assessment have been incorporated in
the report deposited in the departmental library. The project report has been approved as it
satisfies the academic requirements in respect of project work prescribed for the degree of
Bachelor of Engineering.

Signature of the Guide Signature of the HOD Signature of the Principal


Miss. Kavitha Prof. Anand S Uppar Dr. K E Prakash
Asst. Professor (CSE) HOD Dept. of CSE SDIT, Kenjar

EXTERNAL VIVA

Name of the Examiners Signature with Date

1.
2.
ACKNOWLEDGEMENT

A successful project is the fruitful culmination of the efforts of many people, some directly involved and others who have quietly encouraged and extended their invaluable support throughout its progress.

We would like to convey our heartfelt thanks to our Management for providing us with good infrastructure, laboratory facilities, and qualified and inspiring staff whose guidance was of great help in the successful completion of this project.

We are extremely grateful to our beloved Principal, Dr. K E Prakash, for providing a congenial atmosphere and the necessary facilities for achieving the cherished goal.

We feel delighted to have this page to express our sincere thanks and deep appreciation to Prof. Anand S. Uppar, Head of the Department, Computer Science and Engineering, for his valuable guidance, keen interest, and constant encouragement throughout the entire period of this project work.

We would like to thank our project guide, Miss. Kavitha, of the Computer Science Department for her valuable guidance and constant support throughout the project work. We are also thankful to all the teaching and non-teaching staff for enabling us to successfully carry out the project work.

Finally, we thank our families and friends, who provided a lot of support during this project work.

Abin K Shaji

Akhil Asokan

Nashal Ahmad

Vignesh Prabhakaran
DECLARATION

We, Abin K Shaji, Akhil Asokan, Nashal Ahmad, and Vignesh Prabhakaran, bearing USNs
4SH20CS003, 4SH20CS005, 4SH20CS041, and 4SH20CS070 respectively, students of 8th
semester Bachelor of Engineering, Computer Science and Engineering, Shree Devi Institute of
Technology, Mangalore, declare that the project work entitled "Speech Emotion Recognition"
has been duly executed by us under the guidance of Miss. Kavitha, Asst. Professor,
Department of Computer Science and Engineering, Shree Devi Institute of Technology,
Mangalore, and submitted in partial fulfilment of the requirements for the 8th semester Major
Project of the Bachelor of Engineering in Computer Science and Engineering during the year 2023-2024.

Date:
Place: Mangalore

ABIN K SHAJI 4SH20CS003


AKHIL ASOKAN 4SH20CS005
NASHAL AHMAD 4SH20CS041
VIGNESH PRABHAKARAN 4SH20CS070
ABSTRACT

Speech emotion recognition has emerged as a critical research area with
applications spanning human-computer interaction, affective computing, and mental health
diagnostics. This project investigates the feasibility and efficacy of utilizing machine learning
techniques to discern emotional states from speech signals. A comprehensive dataset
encompassing a diverse range of emotions is employed to train and evaluate the models,
ensuring robustness and generalizability. The performance of each model is assessed based on
metrics such as accuracy, precision and recall. The project explores the impact of various
factors such as language, gender, and cultural background on the accuracy of emotion
recognition systems. Insights gained from this analysis contribute to the development of more
inclusive and adaptable models. This project provides valuable insights into the state-of-the-
art techniques and challenges in emotion recognition through speech processing. The findings
lay the groundwork for future research endeavors aimed at enhancing the accuracy and
applicability of such systems in real-world scenarios.
TABLE OF CONTENTS

CHAPTER NO. DESCRIPTION PAGE NO.
1. INTRODUCTION 1-3

1.1 Introduction 1

1.2 Significance Of Study 2

1.3 Limitation of Study 3

2 OBJECTIVE AND PROBLEM STATEMENT 4

2.1 Objective 4

2.2 Problem Statement 4

3. LITERATURE REVIEW 5-6

4. SYSTEM REQUIREMENTS AND TECHNIQUE USED 7
4.1 System Requirements 7

4.2 Machine Learning 7

5 METHODOLOGY 8

5.1 Audio Gathering 8

5.2 Audio Classification 8

5.3 Emotion Detection 9

5.4 Emotion Detection Experiment 9

5.5 Interpretation Of The Result 9

5.6 Use Case Diagram 10

6 RESULTS 11-15

7 CONCLUSION 16

REFERENCES 16
LIST OF FIGURES

FIGURE NO. FIGURE NAME PAGE NO.

5.1 Flow Chart 8

5.2 Use Case Diagram 10

6.1 Home Page 11

6.2 Login page 11


6.3 Registration page 12

6.4 Admin Login page 12

6.5 User page 13

6.6 Admin page 13

6.7 Admin Register page 14

6.8 Result 1 14

6.9 Result 2 15

6.10 Contact Us page 15



CHAPTER 1
INTRODUCTION

1.1 INTRODUCTION

Speech is the most elementary form of human communication. To enrich interaction, one
needs to understand the emotion of another person and how to react to it. Unlike machines,
we humans can naturally recognize the nature and emotion of speech. Can a machine also
detect the emotion in speech? This can be made possible using machine learning: machines
need a specific model for detecting the emotion of speech, and such a model can be
implemented using machine learning techniques.

Speech emotion recognition is a very useful and important topic in today's world. A
machine that can detect the emotion in human speech can prove useful in various industries. A
very basic application of speech emotion recognition is in the health sector, where it can be used
to detect depression, anxiety, stress, and similar conditions in a patient. It can also be used in
areas such as law enforcement, where emotions recognized from speech can help distinguish
between victims and criminals.

Emotions can be of various types, such as happy, sad, angry, and disgusted, depending on the
feeling and frame of mind of the person. In our study, we have used various datasets with different
emotions. We have also combined four datasets into a single dataset and applied it to the model so
that the efficiency of the model can be improved and there is more variety in the data points. This
has also helped in reducing overfitting in our model.

Despite significant progress in the field, speech emotion recognition still poses several
challenges. The variability and subjectivity inherent in human emotions, as well as the influence of
factors such as language, culture, and individual differences, present obstacles to building robust
and generalizable emotion recognition systems. Additionally, the presence of speech-related
phenomena such as background noise, speaker variability, and emotional masking further
complicates the task of emotion recognition.

In light of these challenges, this project aims to investigate the feasibility and efficacy of
utilizing machine learning techniques for speech emotion recognition.


1.2 SIGNIFICANCE OF STUDY

The study of speech emotion recognition holds significant importance in various
domains due to its potential impact on human-computer interaction, affective computing, mental
health diagnostics, and beyond. The following points highlight the significance of this research
endeavor:

1. Enhanced Human-Computer Interaction (HCI): Emotion recognition in speech can
revolutionize HCI by enabling systems to understand and respond to users' emotions. This
leads to more intuitive and personalized interactions, improving user satisfaction and
engagement with technology.
2. Advancement in Affective Computing: Speech emotion recognition contributes to the
development of emotionally intelligent systems capable of empathizing with users. Such
systems can adapt their responses and behaviors based on users' emotional states, leading to
more natural and empathetic interactions.
3. Improvement in Mental Health Diagnostics: Speech-based emotion recognition systems have
the potential to assist clinicians in diagnosing mental health disorders and monitoring patients'
emotional well-being. Early detection of emotional distress or changes in mood through
speech analysis can facilitate timely interventions and improve patient outcomes.
4. Development of Assistive Technologies: Emotion recognition technology can be integrated
into assistive devices to support individuals with special needs, such as those on the autism
spectrum or with communication disorders. These systems can help individuals express their
emotions more effectively and communicate with others more comfortably.
5. Advancement of Psychological Research: Speech emotion recognition enables researchers to
study human emotions on a large scale, providing insights into emotional expression,
perception, and communication. This can lead to advancements in fields such as psychology,
linguistics, and social sciences.


1.3 LIMITATION OF STUDY

1. Data Availability and Quality: One of the primary limitations of speech emotion recognition
studies is the availability and quality of annotated datasets. Limited access to diverse and
well-labeled datasets may restrict the generalizability of the findings and the robustness of the
developed models.

2. Subjectivity and Variability of Emotions: Emotions are inherently subjective and complex,
making their recognition from speech signals a challenging task. The variability in emotional
expression across individuals, cultures, and contexts introduces ambiguity and difficulty in
accurately categorizing emotions.

3. Speech Variability and Noise: Variability in speech characteristics, such as accent, pitch,
intonation, and speaking rate, can affect the performance of emotion recognition systems.
Additionally, the presence of background noise and environmental factors can further obscure
emotional cues in speech signals, leading to decreased accuracy.

4. Limited Emotional Range in Datasets: Many existing datasets for speech emotion recognition
focus on a limited range of basic emotions (e.g., happiness, sadness, anger), neglecting more
nuanced and complex emotional states. This limitation may restrict the applicability of the
developed models to real-world scenarios where emotions are multifaceted.

5. Speaker Dependency: Emotion recognition systems may exhibit bias or reduced accuracy when
confronted with speech from speakers who were not represented adequately in the training data.
Speaker-dependent models may struggle to generalize to new speakers or demographic groups,
limiting their practical utility.

CHAPTER 2


OBJECTIVE AND PROBLEM STATEMENT

2.1 OBJECTIVE

The objective of this project is to develop a robust and accurate system for automatically
recognizing and classifying emotions from speech signals. The system will utilize machine learning
algorithms and signal processing techniques to analyze acoustic features of speech, such as pitch,
intensity, and spectral characteristics, in order to classify emotions into predefined categories such
as happiness, sadness, anger, and neutrality. The project aims to contribute to advancements in
human-computer interaction, affective computing, and the development of emotionally intelligent
systems.

The purpose of this study is to detect emotions from speech using machine learning algorithms. In
detail, this document provides a general description of our project, including user requirements,
product perspective, an overview of requirements, and general constraints. In addition, it also
provides the specific requirements and functionality needed for this project, such as the interface,
functional requirements, and performance requirements.

Automatic speech emotion recognition is an active research area in the field of human-computer
interaction (HCI) with a wide range of applications. The features extracted in our project work are
mainly statistics of pitch and energy, as well as spectral features.
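As an illustration of this kind of acoustic analysis, the following is a minimal sketch (not the project's actual code) that computes simple statistics of pitch, energy, and one spectral feature for a single utterance using the librosa library; the file path, sample rate, and pitch range are assumptions made only for the example.

```python
import numpy as np
import librosa

def utterance_features(path):
    """Return mean/std of pitch, energy and spectral centroid for one clip."""
    y, sr = librosa.load(path, sr=16000)             # mono waveform at 16 kHz
    f0 = librosa.yin(y, fmin=65, fmax=400, sr=sr)    # frame-wise pitch estimate (Hz)
    rms = librosa.feature.rms(y=y)[0]                # frame-wise energy
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]  # spectral feature
    stats = []
    for track in (f0, rms, centroid):                # summarise each track
        stats.extend([float(np.mean(track)), float(np.std(track))])
    return np.array(stats)                           # 6-dimensional feature vector

# Hypothetical usage: features = utterance_features("audio/happy_01.wav")
```

Such per-utterance statistics are one common way of turning variable-length speech into fixed-size inputs for a classifier.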

2.2 PROBLEM STATEMENT

The ability to accurately recognize and classify emotions conveyed through speech is crucial for
various applications, including human-computer interaction, customer service, mental health
monitoring, and sentiment analysis in social media. However, existing speech emotion recognition
systems often face challenges such as limited accuracy, robustness to noise, and generalization
across different speakers and languages. Furthermore, there is a need for real-time processing
capabilities to enable seamless integration into interactive systems. Therefore, this project seeks to
address these challenges by developing a novel speech emotion recognition system that achieves
high accuracy, robustness, and efficiency across diverse speech samples and environmental
conditions.


CHAPTER 3

LITERATURE REVIEW
Md. Rayhan Ahmed et al. [1] used four deep neural network-based models built using LFABs.
Model-A uses seven LFABs followed by FCN layers and a softmax layer for classification,
Model-B uses LSTMs and FCNs, Model-C uses GRUs and FCNs, and Model-D combines the three
individual models by adjusting their weights. From each audio file, they hand-craft several
categories of features, including MFCC, LMS, ZCR, and RMSE. These features are used as inputs
to a one-dimensional (1D) CNN architecture to further extract hidden local features from the
speech files. To obtain additional long-term contextual representations of the local features
learned by the 1D CNN block, they extended the experiment by incorporating LSTM and GRU
layers after the CNN block, yielding further improved accuracy. After applying data augmentation
(DA), they observe that all four models perform very well on the SER task of detecting emotions
from raw speech audio. Among the four models, the ensemble Model-D achieves a state-of-the-art
weighted average accuracy of 99.46% on the TESS dataset.

A novel paradigm for emotion identification in the presence of noise and interference was put
forward by Shibani Hamsa et al. [3]. Their method takes the speaker's energy, time, and spectral
factors into account in order to examine the speaker's emotions. Instead of the gammatone filter
bank and short-time Fourier transform (STFT) frequently employed in the literature, they adopt a
novel wavelet packet transform (WPT)-based cochlear filter bank. When tested on three speech
corpora in two different languages, their system, which combines this representation with a
random forest classifier, performs better than other existing algorithms and is less affected by
stressful noise. All metrics (accuracy, precision, recall, and F1 score) on the RAVDESS and
SUSAS datasets score above 80%.

A data imbalance processing approach based on the selective interpolation synthetic minority
oversampling technique (SISMOTE) is suggested by Zhen-Tao Liu et al. [4] to reduce the
influence of sample imbalance on emotion identification outcomes. A feature selection approach
based on analysis of variance and gradient-boosted decision trees (GBDT) is also provided in
order to remove redundant features with weak emotional representation. The results of speech
emotion detection tests on the CASIA, Emo-DB, and SAVEE databases demonstrate that their
technique achieves average recognition accuracies of 90.28% (CASIA), 75.00% (SAVEE), and
85.82% (Emo-DB), outperforming some state-of-the-art methods.

Dr. Nilesh Shelke et al. [2] used the RAVDESS, TESS, and SAVEE datasets for classification.
Their purpose is to modernize current approaches and technologies enabling emotion detection
systems (EDS) and to make such assistance available across areas of computing. Their analysis
builds on emotions extracted from the databases, layers, and model libraries created for emotion
recognition from speech, and it mainly focuses on data collection, feature extraction, and
automatic emotion detection results. The intermodal recognition system offers higher
classification accuracy than a unimodal solution. Accuracy depends on the number of emotions
detected, the features extracted, the classification method, and the stability of the database.

To accomplish efficient speech emotion identification, Apeksha Aggarwal et al. [5] have presented
two alternative feature extraction strategies. First, bidirectional feature extraction utilising super
convergence is presented to extract two sets of latent features from the voice data; Principal
Component Analysis (PCA) is used to produce the first set of features. The second method
involves extracting the Mel spectrogram image from the audio file and feeding the 2D image into
a pre-trained VGG-16 model. In this study, several algorithms are used in comprehensive
experimentation and a rigorous comparative analysis of the feature extraction approaches across
two datasets (RAVDESS and TESS). The RAVDESS dataset yielded significantly higher accuracy.

A voice analysis-based emotion recognition system was proposed by Noushin Hajarolasvadi and
Hasan Demirel [7]. They first partition each audio signal into overlapping frames of identical
duration in order to extract an 88-dimensional vector of audio characteristics, including
Mel-frequency cepstrum coefficients (MFCC), pitch, and intensity, for each frame. A spectrogram
is created concurrently for every frame. As the last preprocessing step, k-means clustering is
applied to the extracted characteristics of all frames of each audio signal to select the k most
representative keyframes. The corresponding series of spectrograms is then represented as a 3D
tensor of keyframes; instead of using the entire set of spectrograms corresponding to the speech
frames, they select the k best frames to represent the whole speech signal. They then compared the
results of the proposed 3D-CNN with the 2D-CNN results and demonstrated that the proposed
method outperforms pre-trained 2D networks.

K. A. Darshan and Dr. B. N. Veerappa [11] document the development of speech emotion
recognition systems using CNNs. They design a model that can recognize the emotion of an audio
sample, and various parameters are changed to improve the accuracy of the model. The paper also
aims to find the factors that affect model accuracy and the key factors needed to improve model
efficiency. It concludes with a discussion of the various CNN architectures and parameter choices
needed to improve accuracy, as well as potential areas for improvement.


CHAPTER 4

SYSTEM REQUIREMENTS

4.1 System Requirements


4.1.1 Software Requirements

• Programming Languages: Python, JavaScript, PHP, HTML


• IDE: Visual Studio Code
• Operating System: Windows 7 or above
• Dataset: Kaggle

4.1.2 Hardware Requirements

• RAM: 8GB
• CPU: Intel Core i3
• Disk: Minimum 512GB

TECHNIQUE USED

4.2 MACHINE LEARNING

The basic premise of machine learning is to provide training data to a learning algorithm. The
learning algorithm then generates a new set of rules based on inferences from the data. This is, in
essence, generating a new algorithm, formally referred to as the machine learning model. Instead
of programming the computer at each step of the way, this approach gives the machine
instructions that allow it to learn from data without new step-by-step instructions from the
programmer. Several issues need to be considered when addressing AI, including socioeconomic
effects; issues of transparency, bias, and accountability; new uses for data; considerations of
security and safety; ethical issues; how AI enables the creation of new ecosystems; concerns
regarding responsibility; and its potentially disruptive effects on social and economic structures.
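To make the idea above concrete, here is a deliberately tiny, generic sketch (not part of the project's code) in which labelled examples are handed to a learning algorithm that produces a model; the feature vectors, labels, and classifier choice are invented purely for illustration.

```python
from sklearn.linear_model import LogisticRegression

# Invented 2-dimensional "feature vectors" and emotion labels (illustration only).
X_train = [[0.2, 0.7], [0.9, 0.1], [0.3, 0.8], [0.8, 0.2]]
y_train = ["sad", "happy", "sad", "happy"]

# The learning algorithm infers a set of rules (the model) from the training data.
model = LogisticRegression().fit(X_train, y_train)

# The learned model can now label inputs it was never explicitly programmed for.
print(model.predict([[0.85, 0.15]]))   # expected output: ['happy']
```

The same pattern, with richer acoustic features and deeper models, underlies the speech emotion recognition pipeline described in the next chapter.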


CHAPTER 5

METHODOLOGY
In Speech Emotion Recognition (SER) methodology, data collection involves gathering diverse
speech recordings with labeled emotions, followed by preprocessing steps like feature extraction,
normalization, and segmentation. Model selection encompasses traditional machine learning or deep
learning models, or hybrid approaches. Training involves splitting data, training the chosen model(s),
and fine-tuning hyperparameters to prevent overfitting, while evaluation employs metrics like
accuracy and F1-score. Post-processing and performance tuning refine results, with deployment in
real-world applications following. Continual improvement considers ethical implications and
incorporates new research findings to enhance SER performance iteratively.


Fig 5.1 Flow chart

5.1 Audio Gathering

Audio files were collected from a number of different sources, including publicly available datasets on Kaggle.
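A minimal sketch of how this gathering step might be scripted, assuming the recordings are WAV files arranged in per-emotion folders (e.g. data/happy/clip01.wav); the directory layout and paths are assumptions, not the project's actual structure.

```python
from pathlib import Path
import librosa

def gather_audio(root="data"):
    """Load every WAV file under `root`, keeping the folder name as its label."""
    samples = []
    for wav_path in Path(root).rglob("*.wav"):
        y, sr = librosa.load(wav_path, sr=16000)        # decode to a mono waveform
        samples.append((y, sr, wav_path.parent.name))   # folder name = emotion label
    return samples

# Hypothetical usage: recordings = gather_audio("data")
```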


5.2 Audio classification

The recordings that we gathered were split into two sets, one for training data and another for
test data to be used in the classification experiment. The models are trained with the Python
TensorFlow library. Accuracy is then measured, and appropriate models are chosen for use in
emotion recognition.
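The split described above could look roughly like the sketch below; the feature matrix and labels are random placeholders standing in for the real extracted features, and the 80/20 ratio is an assumption. The training portion is what the TensorFlow models in the following sections are fitted on, while the test portion is held back for measuring accuracy.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))                                   # placeholder feature vectors
y = rng.choice(["happy", "sad", "angry", "neutral"], size=200)   # placeholder labels

# Hold out 20% of the clips as test data, keeping class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
print(X_train.shape, X_test.shape)   # (160, 40) (40, 40)
```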

5.3 Emotion detection

In the implemented code, features are extracted from the audio data using Mel-frequency cepstral
coefficients (MFCCs), which are commonly used for speech and audio processing tasks. A
classification model is then built using a recurrent neural network (RNN) with Long Short-Term
Memory (LSTM) units.
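A minimal sketch of this design is given below, assuming 40 MFCC coefficients per frame, a single LSTM layer, and a four-emotion label set; the exact feature dimensions, layer sizes, and emotion classes used in the project may differ.

```python
import numpy as np
import librosa
import tensorflow as tf

def mfcc_sequence(path, n_mfcc=40):
    """Extract an MFCC sequence of shape (frames, n_mfcc) from one audio file."""
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

num_emotions = 4   # e.g. happy, sad, angry, neutral (assumed label set)
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, 40)),    # variable-length MFCC sequence
    tf.keras.layers.LSTM(64),                   # recurrent layer summarises the clip
    tf.keras.layers.Dense(num_emotions, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```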

5.4 Emotion detection experiment

This step demonstrates a pipeline for processing audio data, extracting features, and training a
model for emotion detection. It is a common approach in the field of affective computing, which
focuses on developing systems that can recognize, interpret, process, and simulate human
emotions.
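A rough end-to-end sketch of such a pipeline, under the same assumptions as the previous section, is shown below; the training data here is random placeholder data padded to a fixed length, so the numbers it prints only demonstrate the flow, not real performance.

```python
import numpy as np
import tensorflow as tf

num_emotions, max_frames, n_mfcc = 4, 300, 40
X = np.random.randn(200, max_frames, n_mfcc).astype("float32")   # padded MFCCs (placeholder)
y = np.random.randint(0, num_emotions, size=200)                 # integer labels (placeholder)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(max_frames, n_mfcc)),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(num_emotions, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, validation_split=0.2, epochs=3, batch_size=32)   # training step

probs = model.predict(X[:1])                     # class probabilities for one clip
print("predicted class index:", int(np.argmax(probs, axis=1)[0]))
```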

5.5 Interpretation of the result


Interpreting the results of emotion detection from audio data involves analysing the performance of
the trained model in classifying different emotions. The following steps can be taken to interpret the
results:


Model Evaluation: After training the model, evaluate its performance on a separate validation set
or through cross-validation. This typically involves calculating metrics such as accuracy, precision,
recall, and F1-score for each emotion class.
Confusion Matrix: Examine the confusion matrix to see how well the model is performing for
each emotion category. The confusion matrix shows the true positives, false positives, true
negatives, and false negatives for each class, providing insights into which emotions are being
correctly classified and which are being confused with others.
Class-wise Metrics: Calculate metrics like precision, recall, and F1-score for each emotion class.
Precision measures the proportion of true positive predictions among all positive predictions for a
given emotion, recall measures the proportion of true positive predictions among all actual
instances of that emotion, and F1-score is the harmonic mean of precision and recall.
Visualization: Visualize the model's performance using plots such as ROC curves (Receiver
Operating Characteristic) or precision-recall curves. These can provide a visual understanding of
the trade-off between true positive rate and false positive rate, or between precision and recall,
respectively.
Error Analysis: Examine instances where the model misclassifies emotions and try to understand
why. This could involve listening to audio samples, analyzing the features extracted from those
samples, and identifying potential reasons for misclassification.
Comparison with Baselines: Compare the performance of the model with baseline methods or
previous studies in the field. This helps contextualize the results and indicates whether the model
is performing competitively. A short code sketch illustrating these evaluation steps is given below.
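The sketch below illustrates the first three steps (overall metrics, confusion matrix, and class-wise precision, recall, and F1) using scikit-learn; the true and predicted label lists are made up solely for demonstration.

```python
from sklearn.metrics import classification_report, confusion_matrix

labels = ["happy", "sad", "angry", "neutral"]
y_true = ["happy", "sad", "angry", "happy", "neutral", "sad"]    # illustrative only
y_pred = ["happy", "sad", "happy", "happy", "neutral", "angry"]  # illustrative only

# Per-class precision, recall and F1 (F1 is the harmonic mean: 2*P*R / (P + R)).
print(classification_report(y_true, y_pred, labels=labels, zero_division=0))

# Rows are true emotions, columns are predicted emotions.
print(confusion_matrix(y_true, y_pred, labels=labels))
```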

5.6 Use Case Diagram

The use case diagram depicts the interaction between the user, the emotion detection system, and
external sources of audio data. The user selects an audio file, which is then analysed for emotions by
the Emotion Detector system using the Audio System (Librosa). Finally, the results of emotion
classification are provided to the user for viewing. External audio sources, such as repositories or
databases, can provide additional audio files for analysis.


Fig 5.2 Use case Diagram

CHAPTER 6

RESULTS


Fig 6.1 Home Page

Fig 6.2 Login page

Fig 6.3 Registration page


Fig 6.4 Admin Login page

Fig 6.5 User page


Fig 6.6 Admin page

Fig 6.7 Admin Register page


Fig 6.8 Result 1

Fig 6.9 Result 2


Fig 6.10 Contact Us page

CHAPTER 7


CONCLUSION

In conclusion, this project has demonstrated the feasibility and effectiveness of
employing machine learning algorithms and signal processing techniques for the
recognition and classification of emotions from speech signals. Through extensive
experimentation and evaluation, it was observed that the developed system achieved
significant accuracy in identifying various emotional states, including happiness,
sadness, anger, and neutrality. Additionally, efforts were made to enhance the robustness
of the system by incorporating techniques to mitigate the effects of noise and variability
in speech characteristics across different speakers and languages. Moreover, the
integration of real-time processing capabilities and the development of user-friendly
interfaces facilitate the practical deployment of the system in real-world applications,
such as human-computer interaction and mental health monitoring.

Overall, this project contributes to the ongoing research efforts in affective computing
and lays the groundwork for the development of more sophisticated and emotionally
intelligent systems that can better understand and respond to human emotions conveyed
through speech.

REFERENCES

[1] Shelke, N., Wadyalkar, V., Kotangale, D., Kuyate, N., Nerkar, A., & Gour, N. (n.d.). A Novel Approach to Emotion Detection from Speech.

[2] Kumar, A., & Iqbal, J. L. M. (2019). Machine Learning Based Emotion Recognition using Speech Signal. International Journal of Engineering and Advanced Technology (IJEAT), 9, ISSN 2249-8958.

[3] Mittal, R., Vart, S., Shokeen, P., & Kumar, M. (2022). Speech Emotion Recognition.

[4] Tzirakis, P., Zhang, J., & Schuller, B. W. (2018). "End-to-end speech emotion recognition using deep neural networks", 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5089-5093.

[5] Khalil, R. A., Jones, E., Babar, M. I., Jan, T., Zafar, M. H., & Alhussain, T. (2019). "Speech emotion recognition using deep learning techniques: A review", IEEE Access, vol. 7, pp. 117327-117345.

