Recognizing a speaker's emotion from their speech can be a key element in emergency call centers. End-to-end deep learning systems for speech emotion recognition now achieve results equivalent to, or even better than, conventional machine learning approaches. In this paper, to validate the performance of our neural network architecture for emotion recognition from speech, we first trained and tested it on IEMOCAP, a widely used corpus that is accessible to the community. We then applied the same architecture to the real-life corpus CEMO, composed of 440 dialogs (2h16m) from 485 speakers. The most frequent emotions expressed by callers in these real-life emergency dialogs are fear, anger and positive emotions such as relief. In the general-topic conversations of IEMOCAP, the most frequent emotions are sadness, anger and happiness. Using the same end-to-end deep learning architecture, an Unweighted Accuracy Recall (UA) of 63% is obtained on IEMOCAP and a UA of 45.6% on CEMO, each with 4 classes. Using only 2 classes (Anger, Neutral), the results are 76.9% UA for CEMO compared to 81.1% UA for IEMOCAP. We expect that these encouraging results on CEMO can be improved by combining the audio channel with the linguistic channel. Real-life emotions are clearly more complex than acted ones, mainly due to the large diversity of emotional expressions across speakers.

Index Terms: emotion detection, end-to-end deep learning architecture, call center, real-life database, complex emotions.
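
The UA figures above are easier to interpret with the metric written out. The following is a minimal sketch, assuming that UA denotes the mean of per-class recalls (the usual convention in speech emotion recognition; the abstract itself does not spell out the formula). The function name and the toy labels are hypothetical and are not taken from CEMO or IEMOCAP.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls: every class counts equally, whatever its size."""
    total = defaultdict(int)    # number of reference samples per class
    correct = defaultdict(int)  # number of correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    if not total:
        raise ValueError("y_true is empty")
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy labels invented for illustration only (assumption, not corpus data).
y_true = ["neutral", "neutral", "neutral", "neutral", "anger", "anger"]
y_pred = ["neutral", "neutral", "neutral", "anger", "anger", "neutral"]

ua = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UA = {ua:.3f}, plain accuracy = {accuracy:.3f}")
```

Under this assumption, UA differs from plain accuracy whenever classes are imbalanced, because errors on a small class weigh as much as errors on a large one. This matters when comparing corpora whose emotion distributions differ.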