Neural networks trained on class-imbalanced data are known to perform poorly on minor classes with scarce training data. Several recent works attribute this issue to over-fitting to the minor classes. In this paper, we provide a novel explanation. We find that a neural network tends to first under-fit the minor classes, classifying most of their data into the major classes during early training epochs. To correct these wrong predictions, the neural network must then focus on pushing features of minor class data across the decision boundaries between major and minor classes, which leads to much larger gradients for the features of minor classes. We argue that such an under-fitting phase over-emphasizes the competition between major and minor classes, prevents the neural network from learning discriminative knowledge that generalizes to test data, and eventually results in over-fitting. To address this issue, we propose a novel learning strategy that equalizes the training progress across classes. We mix the features of major class data with those of other data in a mini-batch, intentionally weakening them to prevent the neural network from fitting the major classes first. We show that this strategy can largely balance the training accuracy and feature gradients across classes, effectively mitigating the under-fitting-then-over-fitting problem of minor class data. On several benchmark datasets, our approach achieves state-of-the-art accuracy, especially in the challenging step-imbalanced cases.
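A minimal sketch of the feature-mixing idea described above, assuming a PyTorch setup. The function name `mix_major_features`, the mask-based selection of major class samples, and the fixed mixing coefficient `lam` are illustrative assumptions; the paper's exact mixing rule, coefficient schedule, and partner-sampling strategy may differ.

```python
import torch

def mix_major_features(features, labels, major_classes, lam=0.6):
    """Illustrative sketch: weaken features of major class samples by
    blending them with features of other samples in the same mini-batch,
    so the network does not fit the major classes first.

    features:      (B, D) tensor of penultimate-layer features
    labels:        (B,) tensor of class indices
    major_classes: 1-D tensor of class indices treated as 'major'
    lam:           weight kept for the original feature (assumed fixed here)
    """
    major_mask = torch.isin(labels, major_classes)         # which samples belong to major classes
    perm = torch.randperm(features.size(0))                # random mixing partners from the batch
    mixed = lam * features + (1.0 - lam) * features[perm]  # convex combination of features
    # Only major class samples receive the weakened (mixed) feature;
    # minor class features pass through unchanged.
    return torch.where(major_mask.unsqueeze(1), mixed, features)
```

In training, such a step would sit between the backbone and the classifier: the classifier then sees weakened features for major class samples, slowing their fitting and equalizing training progress across classes, in the spirit of the strategy outlined above.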