Training machine learning algorithms to predict emotional expressions in terms of arousal and valence requires annotated datasets. Because different people perceive others' emotional expressions differently, these annotations are inherently subjective. For this reason, annotations are typically collected from multiple annotators and averaged to obtain ground-truth labels. However, a network trained exclusively on this averaged ground truth is agnostic to the inherent subjectivity in emotional expressions. In this work, we therefore propose an end-to-end Bayesian neural network that can be trained on a distribution of labels and thereby also captures the subjectivity-based label uncertainty. Instead of a Gaussian, we model the label distribution using a Student's t-distribution, which also accounts for the number of annotations. We derive the corresponding Kullback-Leibler divergence loss and use it to train an estimator for the distribution of labels, from which the mean and uncertainty can be inferred. We validate the proposed method on two in-the-wild datasets. We show that the proposed t-distribution-based approach achieves state-of-the-art uncertainty modeling results in speech emotion recognition, as well as consistent results in cross-corpus evaluations. Furthermore, analyses reveal that the advantage of a t-distribution over a Gaussian grows with increasing inter-annotator correlation and a decreasing number of annotators.
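To make the idea of a t-distributed label concrete, the following is a minimal Python sketch, not the method's actual implementation. It assumes that one frame's annotations are summarised by their sample mean (location), sample standard deviation (scale), and n - 1 degrees of freedom for n annotators; this parameterisation, the function names `label_distribution`, `t_logpdf`, and `kl_t`, and the small floor on the scale are all illustrative assumptions. The derived Kullback-Leibler loss is not reproduced here; the divergence is instead approximated by numerical integration.

```python
import math
from statistics import mean, stdev


def t_logpdf(x, df, loc, scale):
    """Log-density of a location-scale Student's t-distribution."""
    z = (x - loc) / scale
    return (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
        - math.log(scale)
        - (df + 1) / 2 * math.log1p(z * z / df)
    )


def label_distribution(annotations, min_scale=1e-6):
    """Summarise one frame's annotations as (df, loc, scale).

    Assumption (not taken from the abstract): df = n - 1, loc = sample mean,
    scale = sample standard deviation, floored to avoid a zero scale when
    all annotators agree.
    """
    n = len(annotations)
    if n < 2:
        raise ValueError("need at least two annotations")
    return n - 1, mean(annotations), max(stdev(annotations), min_scale)


def kl_t(p, q, grid=20001, width=60.0):
    """Approximate KL(p || q) between two t-distributions.

    Uses the trapezoidal rule on a grid centred on p; the heavy tails are
    truncated at `width` scale units, so the result is only approximate.
    """
    _, loc_p, scale_p = p
    lo = loc_p - width * scale_p
    step = 2 * width * scale_p / (grid - 1)
    total = 0.0
    for i in range(grid):
        x = lo + i * step
        log_p = t_logpdf(x, *p)
        log_q = t_logpdf(x, *q)
        weight = 0.5 if i in (0, grid - 1) else 1.0
        total += weight * math.exp(log_p) * (log_p - log_q)
    return total * step


# Hypothetical valence ratings from three annotators for a single frame.
target = label_distribution([0.2, 0.4, 0.6])
# Hypothetical network output with the same df and scale but a shifted mean.
prediction = (target[0], 0.0, target[2])
loss = kl_t(target, prediction)
```

With few annotators the degrees of freedom are small and the tails are heavy, which is how this parameterisation reflects the number of annotations; as n grows, the t-distribution approaches a Gaussian.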