Speech Emotion Recognition (SER) is of great importance in Human-Computer Interaction (HCI), as it provides a deeper understanding of the situation and leads to better interaction. In recent years, various machine learning and deep learning algorithms have been developed to improve SER. Emotion recognition depends on how emotions are expressed, and this varies across languages. In this article, to study this factor further for Farsi, we examine various deep learning techniques on the ShEMO dataset. Using low- and high-level descriptors of the speech signal together with different deep networks and machine learning techniques, we achieve an Unweighted Average Recall (UAR) of 65.20 and an accuracy of 78.29.
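UAR and accuracy measure different things. UAR is the mean of the per-class recalls, so every emotion class counts equally. Accuracy is the overall fraction of correctly classified utterances, so large classes dominate it. On an imbalanced dataset the two can therefore diverge. The minimal Python sketch below illustrates both metrics on hypothetical toy labels. These labels are not data from the dataset used here, and the function names are illustrative only.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of utterances whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class has equal weight regardless of its size."""
    support = Counter(y_true)  # number of true samples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    recalls = [hits[c] / n for c, n in support.items()]
    return sum(recalls) / len(recalls)

# Hypothetical, imbalanced toy labels: "neutral" is the majority class.
y_true = ["neutral"] * 8 + ["anger"] * 2
y_pred = ["neutral"] * 8 + ["neutral", "anger"]

print(accuracy(y_true, y_pred))                   # 0.9
print(unweighted_average_recall(y_true, y_pred))  # 0.75
```

Here a single misclassified "anger" utterance costs only 0.1 in accuracy. It halves that class's recall, however, which pulls UAR down to 0.75. This is why UAR is often reported alongside accuracy for SER.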