Using mel-spectrograms rather than conventional MFCC features, we assess the ability of convolutional neural networks to accurately recognize and classify emotions from speech data. We introduce FSER, a speech emotion recognition model trained on four valid speech databases, which achieves a classification accuracy of 95.05\% across eight emotion classes: anger, anxiety, calm, disgust, happiness, neutral, sadness, and surprise. On each benchmark dataset, FSER outperforms the best models introduced so far, achieving state-of-the-art performance. We show that FSER remains reliable independently of language, speaker sex, and other external factors. Additionally, we describe how FSER could be used to improve mental and emotional health care, and how our analysis and findings can serve as guidelines and benchmarks for further work in this direction.
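To make the distinction between the two input representations concrete: MFCCs are obtained by applying a discrete cosine transform (DCT) to a log-mel spectrogram and keeping only the first few coefficients. A mel-spectrogram-based model therefore keeps the full time-frequency grid, which a CNN can process as a single-channel image. The following is a minimal NumPy sketch of both representations. It is not the FSER preprocessing pipeline. The sample rate (16 kHz), FFT size (512), hop length (160), number of mel bands (64), number of MFCCs (13), and the HTK-style mel formula are all illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (assumed; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_bins)."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=64):
    """Return a (n_mels, n_frames) log-mel 'image' suitable as CNN input."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping Hann-windowed frames
    frames = np.stack([signal[t * hop: t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2   # (n_frames, n_bins)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T           # (n_frames, n_mels)
    return np.log(mel + 1e-10).T                                 # (n_mels, n_frames)

def mfcc_from_log_mel(log_mel, n_mfcc=13):
    """MFCCs: DCT-II over the mel axis, keeping only the first n_mfcc coefficients."""
    n_mels = log_mel.shape[0]
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    dct = np.cos(np.pi / n_mels * (n + 0.5) * k)                 # (n_mfcc, n_mels)
    return dct @ log_mel                                         # (n_mfcc, n_frames)

# Usage: one second of a noisy 440 Hz tone
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440.0 * t) + 0.01 * rng.standard_normal(sr)
log_mel = log_mel_spectrogram(signal, sr=sr)
mfcc = mfcc_from_log_mel(log_mel)
print(log_mel.shape, mfcc.shape)  # (64, 97) (13, 97)
```

The comparison shows what the MFCC step discards. For this one-second clip, it compresses each 64-band frame to 13 coefficients. The mel-spectrogram keeps the full 64 x 97 grid, so a convolutional network can learn local time-frequency patterns directly.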