In this paper, we analyze the feasibility of applying few-shot learning to the speech emotion recognition (SER) task. Current speech emotion recognition models work exceptionally well but fail when the input is multilingual. Moreover, such models perform well only when the training corpus is vast. Obtaining a large training corpus is a significant problem for less popular or obscure languages. We attempt to address both multilingualism and the lack of available data by recasting the task as a few-shot learning problem. We suggest relaxing the assumption that all N classes in an N-way K-shot problem are new, and we define an N+F-way problem, where N and F are the number of emotion classes and predefined fixed classes, respectively. We apply this modification to the Model-Agnostic Meta-Learning (MAML) algorithm and call the resulting model F-MAML. F-MAML performs better than the original MAML on the EmoFilm dataset.
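The N+F-way episode construction can be illustrated with a minimal Python sketch. It is built on two assumptions. First, the F fixed classes appear in every episode with the same label indices (0 to F-1). Second, the N emotion classes are sampled fresh for each episode and receive labels F to F+N-1. The function name `sample_n_plus_f_episode`, the toy class names, and the choice of `neutral` as a fixed class are hypothetical and not taken from the paper. The sketch covers only episode sampling, not the F-MAML inner and outer updates.

```python
import random
from typing import Dict, List, Sequence, Tuple

Example = Tuple[str, int]  # (utterance id, episode label)


def sample_n_plus_f_episode(
    pool: Dict[str, List[str]],
    fixed_classes: Sequence[str],
    n_way: int,
    k_shot: int,
    q_queries: int,
    rng: random.Random,
) -> Tuple[List[Example], List[Example], List[str]]:
    """Sample one hypothetical (N+F)-way K-shot episode.

    Assumption: fixed classes come first and keep labels 0..F-1 in every
    episode; the N sampled (non-fixed) classes get labels F..F+N-1.
    """
    fixed = list(fixed_classes)
    candidates = [c for c in pool if c not in fixed]
    if n_way > len(candidates):
        raise ValueError("not enough non-fixed classes for an N-way episode")

    # Draw the N "new" classes for this episode; fixed classes are always included.
    novel = rng.sample(sorted(candidates), n_way)
    classes = fixed + novel

    support: List[Example] = []
    query: List[Example] = []
    for label, cls in enumerate(classes):
        # Sample without replacement so support and query never share an item.
        items = rng.sample(pool[cls], k_shot + q_queries)
        support += [(x, label) for x in items[:k_shot]]
        query += [(x, label) for x in items[k_shot:]]
    return support, query, classes


if __name__ == "__main__":
    # Toy pool of utterance ids per (hypothetical) emotion class.
    names = ["neutral", "anger", "joy", "sadness", "fear"]
    pool = {name: [f"{name}_{i}" for i in range(10)] for name in names}
    support, query, classes = sample_n_plus_f_episode(
        pool, ["neutral"], n_way=2, k_shot=3, q_queries=2, rng=random.Random(0)
    )
    print(classes, len(support), len(query))
```

Keeping the fixed classes at stable label positions is one plausible way to let a meta-learned model retain knowledge of them across episodes. The paper's actual mechanism may differ.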