There are two paradigms of emotion representation: categorical labeling and dimensional description in a continuous space. Accordingly, the emotion recognition task can be treated as either classification or regression. The main aim of this study is to investigate the relation between these two representations and to propose a classification pipeline that uses only dimensional annotations. The proposed approach contains a regression model that is trained to predict a vector of continuous values in the dimensional representation for a given speech audio input. The output of this model can then be interpreted as an emotional category using a mapping algorithm. We evaluated combinations of three feature extractors, three neural network architectures, and three mapping algorithms on two different corpora. Our study shows the advantages and limitations of the classification-via-regression approach.
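To make the mapping step concrete, the following is a minimal sketch of one plausible mapping algorithm: assigning the regressor's output to the nearest category centroid. The abstract does not name the three mapping algorithms studied, so the nearest-centroid rule, the two-dimensional (valence, arousal) space, the centroid coordinates, and the names `CENTROIDS` and `map_to_category` are all assumptions made for illustration.

```python
import math

# Hypothetical category centroids in (valence, arousal) space, scaled to [-1, 1].
# These coordinates are illustrative assumptions, not values from the study.
CENTROIDS = {
    "happy": (0.8, 0.5),
    "angry": (-0.6, 0.7),
    "sad": (-0.7, -0.5),
    "neutral": (0.0, 0.0),
}


def map_to_category(prediction, centroids=CENTROIDS):
    """Return the label whose centroid is closest (Euclidean) to the prediction."""
    return min(centroids, key=lambda label: math.dist(prediction, centroids[label]))


# Stand-in for the continuous vector a trained regressor would output.
regressor_output = (0.6, 0.4)
label = map_to_category(regressor_output)
```

In practice the centroids could be estimated from a corpus that carries both kinds of annotation, or taken from published affective norms. Either way, the regressor itself is trained on dimensional labels only.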