Because different people perceive others' emotional expressions differently, annotations of arousal and valence are inherently subjective. To address this, emotion annotations are typically collected from multiple annotators and averaged to obtain labels for arousal and valence. Beyond the average, however, the uncertainty of a label is also of interest and should be modeled and predicted in automatic emotion recognition. For simplicity, the literature commonly models label uncertainty by assuming that the collected annotations follow a Gaussian distribution. However, since the number of annotators is typically small due to resource constraints, we argue that the Gaussian assumption is rather crude. In this work, we instead propose to model the label distribution with a Student's t-distribution, which allows us to account for the number of annotations available. With this model, we derive the corresponding Kullback-Leibler divergence based loss function and use it to train an estimator for the distribution of emotion labels, from which the mean and uncertainty can be inferred. Through qualitative and quantitative analysis, we show the benefits of the t-distribution over a Gaussian distribution. We validate the proposed method on the AVEC'16 dataset. The results show that our t-distribution based approach improves over the Gaussian approach, achieving state-of-the-art uncertainty modeling results in speech-based emotion recognition along with optimal and even faster convergence.
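To make the idea of a t-distributed label and a Kullback-Leibler divergence between two such distributions concrete, the following is a minimal sketch and not the loss derived in the paper. It assumes that the degrees of freedom are set to the number of annotations minus one and that the sample standard deviation is used as the scale parameter; it evaluates the divergence by numerical integration rather than by any closed-form expression. The function names (`fit_label_t`, `kl_t`) and the six ratings are hypothetical and chosen only for illustration.

```python
import math


def gauss_logpdf(x, loc, scale):
    """Log-density of a Gaussian with mean `loc` and standard deviation `scale`."""
    z = (x - loc) / scale
    return -0.5 * math.log(2.0 * math.pi) - math.log(scale) - 0.5 * z * z


def t_logpdf(x, loc, scale, dof):
    """Log-density of a location-scale Student's t-distribution."""
    z = (x - loc) / scale
    return (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
            - 0.5 * math.log(dof * math.pi) - math.log(scale)
            - (dof + 1) / 2 * math.log1p(z * z / dof))


def fit_label_t(annotations):
    """Return (loc, scale, dof) from a handful of ratings.

    Assumption for this sketch: dof = n - 1, scale = sample standard deviation.
    """
    n = len(annotations)
    mean = sum(annotations) / n
    var = sum((a - mean) ** 2 for a in annotations) / (n - 1)
    return mean, math.sqrt(var), n - 1


def kl_t(p, q, width=200.0, steps=40001):
    """Approximate KL(p || q) for two (loc, scale, dof) tuples.

    Uses the trapezoidal rule on an interval of +/- `width` scales around p's
    location; this is a numerical stand-in, not a closed-form loss.
    """
    lo = p[0] - width * p[1]
    hi = p[0] + width * p[1]
    h = (hi - lo) / (steps - 1)
    total = 0.0
    for i in range(steps):
        x = lo + i * h
        lp = t_logpdf(x, *p)
        lq = t_logpdf(x, *q)
        weight = 0.5 if i in (0, steps - 1) else 1.0
        total += weight * math.exp(lp) * (lp - lq)
    return total * h


# Hypothetical arousal ratings from six annotators for a single frame.
ratings = [0.10, 0.25, 0.05, 0.30, 0.15, 0.20]
target = fit_label_t(ratings)          # label distribution built from annotations
prediction = (0.05, 0.15, target[2])   # hypothetical model output, same dof

print("target (loc, scale, dof):", target)
print("KL(target || prediction):", kl_t(target, prediction))
```

With few annotators the degrees of freedom are small, so the t-distribution places noticeably more mass in the tails than a Gaussian with the same location and scale. In this sketch that is how the limited number of annotations enters the label distribution.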