Speech emotion recognition (SER) on a single language has achieved remarkable results through deep learning approaches over the last decade. However, cross-lingual SER remains a challenge in real-world applications due to (i) a large difference between the source and target domain distributions and (ii) the availability of only a few labeled, but many unlabeled, utterances for the new language. Taking these aspects into account, we propose a Semi-Supervised Learning (SSL) method for cross-lingual emotion recognition when only a few labels from the new language are available. Based on a Convolutional Neural Network (CNN), our method adapts to a new language by exploiting a pseudo-labeling strategy for the unlabeled utterances. In particular, we investigate the use of both hard and soft pseudo-labels. We thoroughly evaluate the performance of the method in a speaker-independent setup on both the source and the new language and show its robustness across five languages belonging to different language families.
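The difference between hard and soft pseudo-labels mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the four-class logit vector, and the optional confidence `threshold` are assumptions introduced only for illustration, and the CNN itself is replaced by a hand-written logit vector.

```python
import math


def softmax(logits):
    """Convert raw classifier scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hard_pseudo_label(probs, threshold=0.0):
    """One-hot target for the most likely class.

    The `threshold` filter is an assumption (a common SSL heuristic),
    not something stated in the abstract. Returns None when the model
    is not confident enough, so the utterance can be skipped.
    """
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] < threshold:
        return None
    return [1.0 if i == best else 0.0 for i in range(len(probs))]


def soft_pseudo_label(probs):
    """Soft target: keep the full predicted distribution as the label."""
    return list(probs)


def cross_entropy(target, probs, eps=1e-12):
    """Cross-entropy between a (hard or soft) target and a prediction."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))


# Hypothetical logits for one unlabeled utterance over four emotion classes.
logits = [2.0, 0.5, 0.1, -1.0]
probs = softmax(logits)

hard = hard_pseudo_label(probs)   # one-hot vector for the top class
soft = soft_pseudo_label(probs)   # full probability vector
```

A hard label commits to a single emotion class, while a soft label preserves the model's uncertainty over all classes; either target can then be plugged into `cross_entropy` when retraining on the unlabeled utterances.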