AffectNet is one of the most popular resources for facial expression recognition (FER) on relatively unconstrained, in-the-wild images. However, because each image was annotated by only one annotator and the data underwent only limited consistency checks, label quality and consistency may be limited. Here, following an approach similar to that of a study that re-labeled another, smaller dataset (FER2013) with crowd-based annotations, we report results from a re-annotation of a subset of difficult AffectNet faces, in which 13 people provided both expression labels and valence and arousal ratings. Our results show that human labels overall have medium to good consistency, whereas human ratings, especially for valence, are in excellent agreement. Importantly, however, crowd-based labels shift significantly towards the neutral and happy categories, and crowd-based affective ratings form a consistent pattern that differs from the original ratings. ResNets fully trained on the original AffectNet dataset do not predict human voting patterns, whereas weakly trained ResNets do so much better, particularly for valence. Our results have important ramifications for label quality in affective computing.
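To make the notion of label consistency among several annotators concrete, the following is a minimal sketch of how crowd votes on expression labels could be aggregated and scored for agreement. The abstract does not state which agreement statistic was used; Fleiss' kappa is an assumption chosen here because it handles more than two raters. The function names and the example votes are hypothetical and are not taken from the study.

```python
from collections import Counter


def majority_label(votes):
    """Return the most frequent label among one image's votes.

    Ties are broken by the order in which labels first appear.
    """
    return Counter(votes).most_common(1)[0][0]


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of per-image vote lists.

    Every image must have the same number of votes (raters).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number of ratings")

    categories = sorted({label for r in ratings for label in r})
    counts = [Counter(r) for r in ratings]

    # Observed agreement: fraction of agreeing rater pairs per image, averaged.
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall share of votes in each category.
    p_cats = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_exp = sum(p ** 2 for p in p_cats)

    if p_exp == 1.0:
        # All votes fall into one category, so kappa is undefined.
        return float("nan")
    return (p_bar - p_exp) / (1 - p_exp)


# Hypothetical votes from three raters on four images (illustration only).
example_votes = [
    ["happy", "happy", "neutral"],
    ["neutral", "neutral", "neutral"],
    ["sad", "neutral", "sad"],
    ["happy", "happy", "happy"],
]
crowd_labels = [majority_label(v) for v in example_votes]
kappa = fleiss_kappa(example_votes)
```

Comparing `crowd_labels` with the original single-annotator labels would show any shift between categories, while `kappa` summarizes how consistently the raters agree beyond chance. For continuous valence and arousal ratings a different statistic, such as an intraclass correlation, would be more appropriate than kappa.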