A language-agnostic approach to recognizing emotions from speech remains an unsolved and challenging problem. In this paper, we use Bangla and English to assess whether recognizing emotions from speech is independent of language. We classified six emotions: happiness, anger, neutral, sadness, disgust, and fear. We employed three emotional speech sets. The first two were recorded by native Bengali speakers, one in Bangla and the other in English. The third was the Toronto Emotional Speech Set (TESS), developed by native English speakers from Canada. We carefully selected language-independent prosodic features, adopted a Support Vector Machine (SVM) model, and conducted three experiments to test our hypothesis. In the first experiment, we evaluated the model on each of the three speech sets individually. In the second, we recorded the classification rate after combining the speech sets. Finally, in the third, we measured the recognition rate by training and testing the model on different speech sets. Although this study shows that Speech Emotion Recognition (SER) is largely language-independent, the recognition of emotional states such as disgust and fear differs somewhat between the two languages. Moreover, our results suggest that non-native speakers convey emotions through speech much as they do in their native tongue.
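The three evaluation protocols described above (per-set, pooled, and cross-set) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the prosodic feature set, the SVM kernel, the multi-class strategy, or the train/test split, so the sketch assumes precomputed fixed-length prosodic feature vectors, z-score normalization with training-set statistics, an 80/20 random split, and a linear one-vs-rest SVM fitted with Pegasos-style stochastic subgradient descent (swapped in only so the example needs no third-party library). All function names (`within_corpus`, `pooled`, `cross_corpus`, and the helpers) are hypothetical.

```python
import random
from statistics import mean, pstdev
from typing import Dict, List, Tuple

# Hypothetical sketch, not the authors' code. A sample is a precomputed
# prosodic feature vector plus its emotion label; the feature set itself
# (pitch, energy, duration statistics, ...) is an assumption.
Sample = Tuple[List[float], str]
Model = Dict[str, List[float]]
EMOTIONS = ["happiness", "anger", "neutral", "sadness", "disgust", "fear"]


def fit_scaler(data: List[Sample]) -> Tuple[List[float], List[float]]:
    """Per-feature mean and standard deviation from the training set only."""
    cols = list(zip(*(x for x, _ in data)))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]


def scale(x: List[float], mus: List[float], sds: List[float]) -> List[float]:
    return [(xi - m) / s for xi, m, s in zip(x, mus, sds)]


def train_linear_svm(data: List[Sample], lam: float = 0.1,
                     epochs: int = 50, seed: int = 0) -> Model:
    """One-vs-rest linear SVM fitted with Pegasos (stochastic subgradient
    descent on the L2-regularized hinge loss). A bias feature of 1.0 is
    appended, so the bias is regularized too (a simplification)."""
    rng = random.Random(seed)
    model: Model = {}
    for cls in sorted({y for _, y in data}):
        w = [0.0] * (len(data[0][0]) + 1)
        t = 0
        for _ in range(epochs):
            order = list(range(len(data)))
            rng.shuffle(order)
            for i in order:
                t += 1
                x, y = data[i]
                xb = x + [1.0]
                target = 1.0 if y == cls else -1.0
                eta = 1.0 / (lam * t)
                margin = target * sum(wi * xi for wi, xi in zip(w, xb))
                w = [(1.0 - eta * lam) * wi for wi in w]  # L2 shrinkage
                if margin < 1.0:  # hinge loss active: move toward the sample
                    w = [wi + eta * target * xi for wi, xi in zip(w, xb)]
        model[cls] = w
    return model


def predict(model: Model, x: List[float]) -> str:
    """Return the class whose one-vs-rest score is highest."""
    xb = x + [1.0]
    return max(model, key=lambda c: sum(wi * xi for wi, xi in zip(model[c], xb)))


def run_experiment(train: List[Sample], test: List[Sample]) -> float:
    """Standardize with training statistics, fit the SVM, return test accuracy."""
    mus, sds = fit_scaler(train)
    model = train_linear_svm([(scale(x, mus, sds), y) for x, y in train])
    hits = sum(predict(model, scale(x, mus, sds)) == y for x, y in test)
    return hits / len(test)


def split(data: List[Sample], frac: float = 0.8,
          seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Random train/test split (the 80/20 ratio is an assumption)."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    k = int(len(shuffled) * frac)
    return shuffled[:k], shuffled[k:]


# Experiment 1: train and test within a single speech set.
def within_corpus(corpus: List[Sample]) -> float:
    return run_experiment(*split(corpus))


# Experiment 2: pool the speech sets, then split.
def pooled(*corpora: List[Sample]) -> float:
    return run_experiment(*split([s for c in corpora for s in c]))


# Experiment 3: train on one speech set, test on another.
def cross_corpus(train_corpus: List[Sample], test_corpus: List[Sample]) -> float:
    return run_experiment(train_corpus, test_corpus)
```

For instance, with hypothetical lists `bangla_set`, `english_set`, and `tess_set` of feature/label pairs, `cross_corpus(tess_set, bangla_set)` would correspond to the third experiment's train-on-one, test-on-another setting. Replacing the Pegasos routine with a kernel SVM from a library would change only the fitting step, not the three evaluation protocols.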