Current emotion detection classifiers predict discrete emotions. However, the psychology literature documents that humans often produce compound and ambiguous facial expressions. As a step toward machine learning models that more accurately reflect compound and ambiguous emotions, we replace traditional one-hot encoded label representations with a crowd's distribution of labels. We center our study on the Child Affective Facial Expression (CAFE) dataset, a gold-standard dataset of pediatric facial expressions that includes 100 human labels per image. We first acquire crowdsourced labels for 207 emotions from CAFE and demonstrate that the crowd's consensus labels tend to match the consensus of the original CAFE raters, validating the utility of crowdsourcing. We then train two versions of a ResNet-152 classifier on CAFE images, one for each of two label types: (1) traditional one-hot encoding and (2) vector labels representing the crowd distribution of responses. We compare the resulting output distributions of the two classifiers. While the traditional F1-score of the one-hot classifier is much higher (94.33% vs. 78.68%), the output probability vector of the crowd-trained classifier resembles the distribution of human labels significantly more closely (t=3.2827, p=0.0014). For many applications of affective computing, reporting an emotion probability distribution that more closely resembles human interpretation can be more important than traditional machine learning metrics. This work is a first step for engineers of interactive systems toward handling machine learning tasks with ambiguous classes, and we hope it will generate discussion about machine learning with ambiguous labels and about crowdsourcing as a potential solution.
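The difference between the two label types can be illustrated with a minimal sketch. It is not the authors' implementation: the class list `EMOTIONS`, the helper names, and the example votes are all hypothetical, and plain NumPy stands in for the ResNet-152 training pipeline. The sketch only shows how a set of crowd votes becomes either a one-hot consensus label or a distribution label, and how a cross-entropy loss can be evaluated against either target.

```python
import numpy as np

# Hypothetical class list, for illustration only (an assumption, not taken from the paper).
EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]


def crowd_distribution(votes, classes=EMOTIONS):
    """Turn a list of crowd votes into a probability vector over the classes."""
    counts = np.array([votes.count(c) for c in classes], dtype=float)
    return counts / counts.sum()


def one_hot_consensus(votes, classes=EMOTIONS):
    """Traditional label: 1 for the most-voted class, 0 everywhere else."""
    dist = crowd_distribution(votes, classes)
    label = np.zeros(len(classes))
    label[np.argmax(dist)] = 1.0
    return label


def cross_entropy(target, logits):
    """Cross-entropy between a target distribution and softmax(logits)."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-(target * log_probs).sum())


# Made-up example: 10 raters disagree about one ambiguous image.
votes = ["fear"] * 6 + ["surprise"] * 4
soft = crowd_distribution(votes)   # keeps the 60/40 split between fear and surprise
hard = one_hot_consensus(votes)    # collapses the votes to "fear" only
logits = np.zeros(len(EMOTIONS))   # an untrained model with uniform predictions
print(soft, hard, cross_entropy(soft, logits), cross_entropy(hard, logits))
```

With the one-hot target, the loss rewards the model only for probability placed on the consensus class. With the distribution target, the loss is minimized when the predicted probabilities match the crowd's split, which is the behavior the abstract contrasts with the F1-score.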