Designing reliable Speech Emotion Recognition systems is a complex task that requires sufficient training data. Such extensive datasets are currently available in only a few languages, including English, German, and Italian. In this paper, we present SEMOUR, the first scripted database of emotion-tagged speech in the Urdu language, to support the design of an Urdu Speech Recognition System. Our gender-balanced dataset contains 15,040 unique instances recorded by eight professional actors performing a syntactically complex script. The dataset is phonetically balanced and reliably exhibits a varied set of emotions, as indicated by the high agreement scores among human raters in our experiments. We also provide baseline speech emotion prediction scores on the database, which could support applications such as personalized robot assistants, diagnosis of psychological disorders, and gathering feedback from a low-tech-enabled population. On a random test sample, our model correctly predicts the emotion with a state-of-the-art accuracy of 92%.