Real-world facial expression recognition (FER) datasets suffer from noisy annotations due to crowd-sourcing, ambiguity of expressions, the subjectivity of annotators, and inter-class similarity. Moreover, recent deep networks have a strong capacity to memorize noisy annotations, leading to corrupted feature embeddings and poor generalization. To handle noisy annotations, we propose a dynamic FER learning framework (DNFER) in which clean samples are selected based on dynamic class-specific thresholds during training. Specifically, DNFER combines supervised training on the selected clean samples with unsupervised consistency training on all samples. During training, the mean posterior class probabilities of each mini-batch are used as dynamic class-specific thresholds to select clean samples for supervised training. Unlike other methods, this threshold is independent of the noise rate and does not require any clean data. In addition, to learn from all samples, the posterior distributions of each weakly-augmented image and its strongly-augmented counterpart are aligned using an unsupervised consistency loss. We demonstrate the robustness of DNFER on both synthetic and real noisy-annotated FER datasets such as RAFDB, FERPlus, SFEW and AffectNet.
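The two core operations described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names (`select_clean`, `consistency_loss`) are hypothetical, and the abstract does not specify the exact form of the consistency loss, so a KL divergence between the weak- and strong-augmentation posteriors is assumed here.

```python
import numpy as np

def select_clean(probs, labels):
    """Dynamic class-specific thresholding: a sample is kept as "clean"
    when its posterior for the annotated class is at least the mini-batch
    mean posterior of that class.

    probs:  (B, C) softmax posteriors for the mini-batch
    labels: (B,) integer annotations (possibly noisy)
    Returns a boolean mask over the batch.
    """
    thresholds = probs.mean(axis=0)  # one dynamic threshold per class
    return probs[np.arange(len(labels)), labels] >= thresholds[labels]

def consistency_loss(p_weak, p_strong, eps=1e-8):
    """Unsupervised consistency term (assumed KL form): align the posterior
    of the strongly-augmented view with that of the weakly-augmented view,
    averaged over the batch."""
    kl = np.sum(p_weak * (np.log(p_weak + eps) - np.log(p_strong + eps)), axis=1)
    return float(np.mean(kl))
```

Because the thresholds are recomputed from each mini-batch's own posteriors, no noise-rate estimate or held-out clean subset is needed, matching the claim in the abstract.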