DSP RP 5
Abstract
Speech emotion recognition (SER) is an active topic in speech signal processing. With the growing availability of cheap computing power and the proliferation of data-driven methods, deep learning approaches have become the prominent solutions to SER. Nevertheless, SER remains challenging because datasets are scarce and emotion perception is difficult to model. Moreover, most existing SER networks are adapted from computer vision and natural language processing, so they are not well suited to extracting emotional information. Drawing on findings from brain science on emotion computing, and inspired by the emotion-perception process of the human brain, we propose a perception-based approach that designs a human-like implicit emotional attribute classification and introduces implicit emotional information through multi-task learning. Preliminary experiments show that the proposed method improves unweighted accuracy (UA) by 2.44% and weighted accuracy (WA) by 3.18% (both absolute values) on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset, which verifies the effectiveness of our method.
Keywords Speech emotion recognition, Emotion perception, Implicit emotional attribute, Multi-task learning
solve problems in other fields. How to reasonably use the network in other fields to improve the ability to model emotional information is a major problem in speech emotion recognition. Moreover, the scarcity of datasets and the difficulty of perceiving emotion make the recognition task more challenging. Therefore, the performance of speech emotion recognition is still not ideal. In recent years, brain science has been strengthening its exploration of the structure and function of the various brain areas that produce emotion, thought, and consciousness. For example, emotion perception mainly depends on the limbic system of the human brain [17, 18], and different parts of the limbic system perceive different emotions differently [19, 20]. In this paper, an approach inspired by emotion perception is proposed based on the human brain's perceptive process of emotion, and a human brain-like implicit emotion attribute classification is designed. The implicit emotion attribute information is introduced through multi-task learning to increase the extraction of emotion information. Preliminary experiments show that the unweighted accuracy (UA) on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset is improved by 2.44% and the weighted accuracy (WA) by 3.18% (both absolute values), which verifies that the proposed human brain-like implicit emotion attribute classification is beneficial for extracting emotion information.

The paper is organized as follows: Section 2 introduces the characteristics of the human brain's emotion perception. Section 3 introduces the network designed according to these characteristics. Section 4 elaborates the experimental results and conclusions. Section 5 summarizes the paper and looks forward to the future development of speech emotion recognition.

2 Characteristics of the human brain's emotion perception
Brain science has been studying the human brain's emotion perception (the perceptual network, the perceptual process, perceptual characteristics, etc.). Due to the complexity of the human brain's structure and the insufficiency of existing research techniques, it is difficult for brain science to see the full picture of the emotional cognitive mechanism of the human brain. How the human brain processes the information in speech to recognize emotion is still a mystery. However, various imaging technologies and electrophysiological signals are used to establish topological structures of high-order connections of brain networks at various levels, and many experiments have been carried out in related fields such as brain network modeling and emotional computing [21-25]. Existing research results also reveal some potential mechanisms of human emotional cognition. For example, different parts of the brain perceive different emotions differently [21, 25].

Research has shown that emotional perception is linked to a set of structures in the brain called the limbic system, which includes the hypothalamus, the cingulate cortex, the hippocampus, and others. Different parts play different roles in the perception of different emotions. For example, removing the amygdala leads to a reduction in fear, and the posterior hypothalamus may be particularly important for anger and aggression. The frontal cortex is more sensitive to intense emotions such as happiness and anger. The hypothalamus is more active during the experience of sadness. Meanwhile, the hippocampus plays a significant role in the perception of sadness.

Table 1 lists emotions and the parts of the brain associated with them. According to Table 1, the following characteristics of the human brain's emotion perception can be concluded.
structure causes the relevant parts of the human brain to be more sensitive to certain emotions. Could this structure be introduced into speech emotion recognition? Many parts of the human brain are sensitive to the same emotion, but they differ in sensitivity. So, are there similar but not identical internal structures in the various parts of the human brain?
2) Different emotions involve different parts. For example, both anger and sadness are linked to the amygdala, but sadness is linked to the left thalamus, whereas anger is not. This means that the human brain perceives different emotions differently, and it also shows that the differences in perception across parts of the human brain are related to the internal structure of those parts. So what exactly do these parts perceive?
3) One part of the limbic system can be related to the perception of multiple emotions. For example, the amygdala is associated with the perception of happiness, sadness, anger, and other emotions. What common information does the amygdala perceive in these emotions?

According to the above analysis, we propose a conjecture: some parts of the human brain's limbic system can perceive certain attribute information in emotions, and this attribute information is the information shared by the emotions that the part can perceive. The specific attribute is unknown, so this paper calls it implicit attribute information. The human brain's perceptual network for emotion has a certain structure: implicit attribute information is extracted by some parts of the limbic system and then sent to the brain center, together with the underlying information, for emotion recognition.

Based on these assumptions, this paper adopts artificial neural networks to simulate the parts of the human limbic system, drawing on their mechanism of extracting and perceiving emotional information, and proposes a method based on emotional perception. According to the limbic system's perception of different emotions, an implicit attribute classification is defined, and the implicit attribute information is extracted through multi-task learning and then added into the emotion recognition system as auxiliary information.

3 Emotion recognition based on emotion perception
3.1 Implicit emotion attribute classification design
A part of the human limbic system can sense some implicit attribute information of emotions. If a part can sense some emotions, it means that these emotions contain the same implicit attribute information. At the same time, the fact that the same emotion can be sensed by different parts suggests that these parts have some similar structural features. Based on this, this paper designs an implicit emotion attribute classifier to simulate the emotion perception of the human brain. An implicit attribute binary classifier is designed according to whether the perception of different emotions by a certain part is related, as shown in Table 2. For example, because the frontal cortex of the human brain has a strong perception of happiness and anger, it is assumed that happiness and anger share the same implicit attribute (denoted as attribute A), while the other emotions (sadness, neutral, etc.) do not have attribute A. So in the binary classifier for attribute A, the classification label of happiness and anger is set to 1, while the classification label of the other emotions is set to 0. In this paper, four parts with a high degree of distinction are introduced, and four implicit attributes A-D and corresponding classifiers are defined, as shown in Table 2.

Table 2 Implicit attribute classification

Attribute   Part               Label 1         Label 0
A           Frontal cortex     Happy, angry    Sad, neutral
B           Thalamus           Sad             Happy, angry, neutral
C           Hippocampus        Sad, angry      Happy, neutral
D           Anterior neurite   Happy           Angry, sad, neutral
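To make the label construction concrete, the following is a minimal sketch (not the authors' code) of how the four implicit attribute labels of Table 2 can be derived from the emotion label of each utterance; the function name and dictionary layout are illustrative, since the paper only specifies the mapping itself.

```python
# Minimal sketch: derive the binary labels of implicit attributes A-D
# (Table 2) from an utterance's emotion label.
ATTRIBUTE_MAP = {
    "A": {"happy", "angry"},   # frontal cortex
    "B": {"sad"},              # thalamus
    "C": {"sad", "angry"},     # hippocampus
    "D": {"happy"},            # "anterior neurite" in the paper
}
EMOTIONS = {"happy", "angry", "sad", "neutral"}

def implicit_attribute_labels(emotion: str) -> dict:
    """Return the 0/1 label of each implicit attribute for one emotion."""
    if emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion: {emotion}")
    return {attr: int(emotion in positives)
            for attr, positives in ATTRIBUTE_MAP.items()}

# Example: an "angry" utterance gets A=1, B=0, C=1, D=0, so each training
# sample carries one 4-class emotion target plus four binary attribute targets.
print(implicit_attribute_labels("angry"))
```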
3.2 Multi-task learning based on implicit attribute classification
In order to train the four implicit attribute classifiers and the speech emotion classifier at the same time, this paper adopts multi-task learning: the loss of each implicit emotion attribute binary classification task is added to the total loss of the model with a certain weight. At the same time, referring to the structured network characteristics of the human brain for emotion recognition, the network in this paper also introduces the implicit emotion attribute information, which increases the difference between different emotions and helps the network distinguish them.
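The paper does not report the loss weights, so the following is only a hedged sketch of the weighted multi-task objective described above, using cross-entropy for the four-class emotion task and binary cross-entropy for each attribute task; the weight values in `lambda_attr` are placeholders, not the authors' settings.

```python
import torch.nn.functional as F

def multitask_loss(emotion_logits, attr_logits, emotion_target, attr_targets,
                   lambda_attr=(0.1, 0.1, 0.1, 0.1)):
    """Total loss = emotion cross-entropy + weighted sum of the four
    implicit-attribute binary cross-entropies (weights are illustrative)."""
    loss = F.cross_entropy(emotion_logits, emotion_target)
    for logits, target, weight in zip(attr_logits, attr_targets, lambda_attr):
        loss = loss + weight * F.binary_cross_entropy_with_logits(
            logits.squeeze(-1), target.float())
    return loss
```

Here `attr_logits` and `attr_targets` are lists holding one logit tensor and one 0/1 target tensor per attribute A-D.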
The specific structure of the network is shown in Fig. 1. The network consists of four CNN layers, four implicit emotional attribute classifiers, a gated recurrent unit (GRU), and an attention layer. Firstly, the logMel spectrogram extracted with Librosa [26] is used as the input of the network; the extracted features are then fed into the four consecutive CNN layers.
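The architecture description breaks off at the page boundary here, so the following is only a rough sketch, under stated assumptions, of the pipeline as described up to this point: a logMel front end computed with Librosa and four CNN layers followed by a GRU, a simple attention layer, an emotion head, and four implicit attribute heads. All concrete settings (sampling rate, n_mels, frame sizes, channel counts, hidden size) are assumptions, not values reported in the paper.

```python
import librosa
import torch
import torch.nn as nn

def logmel(path, sr=16000, n_mels=40):
    """logMel spectrogram via Librosa; frame settings are assumed, not the paper's."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=400,
                                         hop_length=160, n_mels=n_mels)
    return librosa.power_to_db(mel).T                      # (frames, n_mels)

class PerceptionSERNet(nn.Module):
    """Sketch of the described shape: 4 CNN layers -> GRU -> attention ->
    emotion head + four implicit attribute heads (all sizes illustrative)."""
    def __init__(self, n_mels=40, hidden=128, n_emotions=4, n_attrs=4):
        super().__init__()
        chans = [1, 16, 32, 64, 64]
        self.cnn = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(chans[i], chans[i + 1], 3, padding=1),
                          nn.BatchNorm2d(chans[i + 1]), nn.ReLU())
            for i in range(4)])
        self.gru = nn.GRU(chans[-1] * n_mels, hidden, batch_first=True)
        self.att = nn.Linear(hidden, 1)                    # frame-level attention scores
        self.emotion_head = nn.Linear(hidden, n_emotions)
        self.attr_heads = nn.ModuleList(nn.Linear(hidden, 1) for _ in range(n_attrs))

    def forward(self, x):                                  # x: (batch, frames, n_mels)
        h = self.cnn(x.unsqueeze(1))                       # (batch, C, frames, n_mels)
        b, c, t, f = h.shape
        h, _ = self.gru(h.permute(0, 2, 1, 3).reshape(b, t, c * f))
        w = torch.softmax(self.att(h), dim=1)              # attention over frames
        utt = (w * h).sum(dim=1)                           # utterance-level embedding
        return self.emotion_head(utt), [head(utt) for head in self.attr_heads]
```

The emotion logits and the list of attribute logits can be fed directly to the weighted multi-task loss sketched in Section 3.2.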
Table 3 Binary classification results of the implicit attributes

Attribute   Accuracy
A           84.59%
B           95.06%
C           77.54%
D           66.39%
4.3 Emotion classification based on multi-task learning
In order to verify the effect of different implicit attributes on speech emotion recognition, different multi-task experiments are designed in this paper; that is, multi-task experiments based on one to four implicit attributes are carried out. The experimental results are shown in Table 4, from which the following points can be inferred.

Table 4 Emotion recognition results based on multi-task learning

Multi-task learning      UA       WA
A                        68.57%   67.79%
B                        68.91%   67.42%
C                        68.31%   67.23%
D                        67.44%   66.67%
A+B                      68.61%   65.36%
A+C                      67.46%   67.23%
A+D                      67.91%   64.23%
B+C                      67.25%   67.42%
B+D                      68.75%   68.16%
C+D                      68.23%   68.16%
A+B+C                    67.84%   65.92%
A+B+D                    68.30%   66.29%
A+C+D                    68.14%   66.29%
B+C+D                    69.38%   67.04%
A+B+C+D                  70.42%   67.79%
Baseline (single task)   67.98%   64.61%

1) The experimental performance when adopting all four attributes is the best: the UA is 2.44% higher and the WA 3.18% higher than the baseline system (single task, absolute values), indicating the effectiveness of our method based on emotion perception. Drawing lessons from brain science's exploration of the structure and function of the emotion-producing brain areas, combined with deep learning to simulate the neural network of the human brain, is conducive to extracting emotional information.
2) In the multi-task experiments with a single attribute, A, B, and C all improve performance, but introducing attribute D reduces it, which is consistent with the binary classification results. It is possible that the emotional information in implicit attribute D is unstable. However, in the experiments that mix attribute D with other attributes, the system performance is generally better than the baseline. This indicates that although attribute D alone is not stable, it can play a positive role in emotion classification with the assistance of other attribute information, which further supports the credibility of the implicit attribute hypothesis in this paper. Meanwhile, the experimental results indirectly suggest that the brain regions involved in recognizing emotions may share some kind of information: when recognizing the same emotion, the parts of the limbic system differ in their level of sensitivity and combine a variety of information to jointly judge the emotional changes of the people around. This also further supports the credibility of the implicit attribute hypothesis.
3) Mixing two or three attributes gives both higher and lower performance, indicating that the effects of different implicit attributes on emotion recognition may be complementary or may cancel each other. The effect of using all four attributes is the best, demonstrating that the positive effect of these implicit attributes requires more attributes to participate. Different parts of the human limbic system carry certain implicit emotional attributes; the experiments show that multiple parts are involved when recognizing a certain emotion, but different parts are in different states of inhibition or activation when recognizing emotions.
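The paper does not define UA and WA explicitly; the sketch below assumes the usual SER convention, where WA is the overall accuracy over all test utterances and UA is the mean of the per-class recalls.

```python
import numpy as np

def wa_ua(y_true, y_pred, n_classes=4):
    """Weighted accuracy (overall accuracy) and unweighted accuracy
    (mean per-class recall), assuming the common SER definitions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    wa = float((y_true == y_pred).mean())
    recalls = [float((y_pred[y_true == c] == c).mean())
               for c in range(n_classes) if (y_true == c).any()]
    ua = float(np.mean(recalls))
    return wa, ua

# Toy example over the four emotion classes 0-3
wa, ua = wa_ua([0, 0, 1, 2, 3, 3], [0, 1, 1, 2, 3, 0])
print(f"WA={wa:.2%}, UA={ua:.2%}")
```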
5 Conclusion
Brain science is constantly studying the brain structures and underlying mechanisms of emotion. Combining this with artificial intelligence's continuing efforts to simulate the human brain, this paper draws on the mechanism of human emotional perception and designs an implicit emotional attribute classification to imitate the brain structures related to emotion. Implicit emotion information is introduced through multi-task learning as auxiliary information for recognizing emotion, improving speech emotion recognition performance and demonstrating the effectiveness of the proposed network. In the future, we can learn more from the human brain's mechanism of cognizing emotions and add further attribute information. Meanwhile, we can also adopt approaches other than multi-task learning to mine emotional information.

Abbreviations
SER       Speech emotion recognition
UA        Unweighted accuracy
WA        Weighted accuracy
IEMOCAP   The Interactive Emotional Dyadic Motion Capture
RNN       Recurrent neural networks
DNN       Deep neural networks
CNN       Convolutional neural networks
LSTM      Long short-term memory
GRU       Gated recurrent unit

Acknowledgements
The authors thank the editors and the anonymous reviewers for their constructive comments and useful suggestions.

Authors' contributions
Liu proposed and designed the complete experiments for the paper and was a major contributor in writing the manuscript. Cai and Wang worked together to filter and sort the emotional data and wrote the programs for the entire set of experiments. After the experiments, all authors analyzed the results to further corroborate the conjecture. All authors read and approved the final manuscript.

Funding
Not applicable.

Availability of data and materials
The Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset is the most widely used dataset in SER. The dataset is accessible at https://sail.usc.edu/iemocap/. It consists of 12 h of emotional speech performed by 10 actors from the Drama Department of the University of Southern California. The performances are divided into two parts, improvised and scripted, depending on whether the actors perform according to a fixed script. The dataset is labeled with nine types of emotion: anger, excitement, happiness, sadness, frustration, fear, neutral, surprise, and other. Our experiments use four main emotions: anger, excitement, happiness, and sadness. Furthermore, the experimental code implementation is available at https://github.com/FlowerCai/speech-emotion-recognition. Further SER research can build on these experiments in the future.

Declarations

Ethics approval and consent to participate
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Received: 5 April 2022  Accepted: 29 April 2023
References
1. L.S.A. Low, N.C. Maddage, M. Lech, L.B. Sheeber, N.B. Allen, Detection of clinical depression in adolescents' speech during family interactions. IEEE Trans. Biomed. Eng. 58(3), 574-586 (2010)
2. X. Huahu, G. Jue, Y. Jian, in Proceedings of the 2010 International Conference on Artificial Intelligence and Computational Intelligence, vol. 1. Application of speech emotion recognition in intelligent household robot (IEEE, Sanya, 2010), pp. 537-541
3. W.J. Yoon, Y.H. Cho, K.S. Park, in International Conference on Ubiquitous Intelligence and Computing. A study of speech emotion recognition and its application to mobile services (Springer, Hong Kong China, 2007), pp. 758-766
4. K. Han, D. Yu, I. Tashev, in Proceedings of Interspeech 2014. Speech emotion recognition using deep neural network and extreme learning machine (ISCA, Singapore, 2014)
5. M. Chen, X. He, J. Yang, H. Zhang, 3-D convolutional recurrent neural networks with attention model for speech emotion recognition. IEEE Signal Process. Lett. 25(10), 1440-1444 (2018)
6. X. Wu, S. Liu, Y. Cao, X. Li, J. Yu, D. Dai, X. Ma, S. Hu, Z. Wu, X. Liu, et al., in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Speech emotion recognition using capsule networks (IEEE, Brighton UK, 2019), pp. 6695-6699
7. Y. Xu, H. Xu, J. Zou, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). HGFM: a hierarchical grained and feature model for acoustic emotion recognition (IEEE, Barcelona, 2020), pp. 6499-6503
8. D. Priyasad, T. Fernando, S. Denman, S. Sridharan, C. Fookes, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Attention driven fusion for multi-modal emotion recognition (IEEE, Barcelona, 2020), pp. 3227-3231
9. A. Nediyanchath, P. Paramasivam, P. Yenigalla, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Multi-head attention for speech emotion recognition with auxiliary learning of gender recognition (IEEE, Barcelona, 2020), pp. 7179-7183
10. C.H. Park, D.W. Lee, K.B. Sim, Emotion recognition of speech based on RNN. Nurse Lead. 4, 2210-2213 (2002). https://doi.org/10.1109/ICMLC.2002.1175432
11. J. Niu, Y. Qian, K. Yu, in The 9th International Symposium on Chinese Spoken Language Processing. Acoustic emotion recognition using deep neural network (IEEE, Singapore, 2014), pp. 128-132
12. Q. Mao, M. Dong, Z. Huang, Y. Zhan, Learning salient features for speech emotion recognition using convolutional neural networks. IEEE Trans. Multimedia 16(8), 2203-2213 (2014)
13. J. Lee, I. Tashev, in Proceedings of Interspeech 2015. High-level feature representation using recurrent neural network for speech emotion recognition (ISCA, Dresden Germany, 2015)
14. M.A. Jalal, E. Loweimi, R.K. Moore, T. Hain, in Proceedings of Interspeech 2019. Learning temporal clusters using capsule routing for speech emotion recognition (ISCA, Graz, 2019), pp. 1701-1705
15. R. Shankar, H.W. Hsieh, N. Charon, A. Venkataraman, in Proceedings of Interspeech 2019. Automated emotion morphing in speech based on diffeomorphic curve registration and highway networks (ISCA, Graz, 2019), pp. 4499-4503
16. S. Siriwardhana, T. Kaluarachchi, M. Billinghurst, S. Nanayakkara, Multimodal emotion recognition with transformer-based self supervised feature fusion. IEEE Access 8, 176274-176285 (2020)
17. S. Costantini, G. De Gasperis, P. Migliarini, in 2019 IEEE Second International Conference on Artificial Intelligence and Knowledge Engineering (AIKE). Multi-agent system engineering for emphatic human-robot interaction (IEEE, Sardinia Italy, 2019), pp. 36-42
18. H. Okon-Singer, T. Hendler, L. Pessoa, A.J. Shackman, The neurobiology of emotion-cognition interactions: fundamental questions and strategies for future research. Front. Hum. Neurosci. 9, 58 (2015)
19. Q. Ma, D. Guo, Research on brain mechanisms of emotion. Adv. Psychol. Sci. 11(03), 328 (2003)
20. S. Lee, S. Yildirim, A. Kazemzadeh, S. Narayanan, in Ninth European Conference on Speech Communication and Technology. An articulatory study of emotional speech production (ISCA, Lisbon Portugal, 2005)
21. J. LeDoux, Rethinking the emotional brain. Neuron 73(4), 653-676 (2012)
22. V.R. Rao, K.K. Sellers, D.L. Wallace, M.B. Lee, M. Bijanzadeh, O.G. Sani, Y. Yang, M.M. Shanechi, H.E. Dawes, E.F. Chang, Direct electrical stimulation of lateral orbitofrontal cortex acutely improves mood in individuals with symptoms of depression. Curr. Biol. 28(24), 3893-3902 (2018)
23. P. Fusar-Poli, A. Placentino, F. Carletti, P. Landi, P. Allen, S. Surguladze, F. Benedetti, M. Abbamonte, R. Gasparotti, F. Barale et al., Functional atlas of emotional faces processing: a voxel-based meta-analysis of 105 functional magnetic resonance imaging studies. J. Psychiatry Neurosci. 34(6), 418-432 (2009)
24. F. Ahs, C.F. Davis, A.X. Gorka, A.R. Hariri, Feature-based representations of emotional facial expressions in the human amygdala. Soc. Cogn. Affect. Neurosci. 9(9), 1372-1378 (2014)
25. M.D. Pell, Recognition of prosody following unilateral brain lesion: influence of functional and structural attributes of prosodic contours. Neuropsychologia 36(8), 701-715 (1998)
26. B. McFee, C. Raffel, D. Liang, D.P. Ellis, M. McVicar, E. Battenberg, O. Nieto, in Proceedings of the 14th Python in Science Conference, vol. 8. librosa: audio and music signal analysis in Python (SciPy, Texas US, 2015), pp. 18-25
27. C. Busso, M. Bulut, C.C. Lee, A. Kazemzadeh, E. Mower, S. Kim, J.N. Chang, S. Lee, S.S. Narayanan, IEMOCAP: interactive emotional dyadic motion capture database. Lang. Resour. Eval. 42(4), 335-359 (2008)