Because of the ambiguous and subjective property of the facial expression recognition (FER) task, the label noise is widely existing in the FER dataset. For this problem, in the training phase, current FER methods often directly predict whether the label of the input image is noised or not, aiming to reduce the contribution of the noised data in training. However, we argue that this kind of method suffers from the low reliability of such noise data decision operation. It makes that some mistakenly abounded clean data are not utilized sufficiently and some mistakenly kept noised data disturbing the model learning process. In this paper, we propose a more reliable noise-label suppression method called ReSup (Reliable label noise Suppression for FER). First, instead of directly predicting noised or not, ReSup makes the noise data decision by modeling the distribution of noise and clean labels simultaneously according to the disagreement between the prediction and the target. Specifically, to achieve optimal distribution modeling, ReSup models the similarity distribution of all samples. To further enhance the reliability of our noise decision results, ReSup uses two networks to jointly achieve noise suppression. Specifically, ReSup utilize the property that two networks are less likely to make the same mistakes, making two networks swap decisions and tending to trust decisions with high agreement. Extensive experiments on three popular benchmarks show that the proposed method significantly outperforms state-of-the-art noisy label FER methods by 3.01% on FERPlus becnmarks. Code: https://github.com/purpleleaves007/FERDenoise
翻译:暂无翻译