Unlike the case of a balanced training dataset, the per-class recall (i.e., accuracy) of neural networks trained with an imbalanced dataset is known to vary significantly from category to category. The convention in long-tailed recognition is to manually split all categories into three subsets and report the average accuracy within each subset. We argue that under such an evaluation setting, some categories are inevitably sacrificed. On one hand, focusing on the average accuracy over a balanced test set incurs little penalty even if some worst-performing categories have zero accuracy. On the other hand, classes in the "Few" subset do not necessarily perform worse than those in the "Many" or "Medium" subsets. We therefore advocate focusing more on improving the lowest recall among all categories and the harmonic mean of all recall values. Specifically, we propose a simple plug-in method that is applicable to a wide range of existing methods. By simply re-training the classifier of an existing pre-trained model with our proposed loss function and using an optional ensemble trick that combines the predictions of the two classifiers, we achieve a more uniform distribution of recall values across categories, which leads to a higher harmonic mean accuracy while the (arithmetic) average accuracy remains high. The effectiveness of our method is demonstrated on widely used benchmark datasets.
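As a minimal sketch (not part of the paper's method), the snippet below illustrates the evaluation quantities the abstract advocates: per-class recall on a balanced test set, the lowest recall among all categories, and the harmonic mean of recall values, contrasted with the usual arithmetic mean. The labels and predictions are hypothetical.

```python
# Sketch of the advocated metrics: per-class recall, lowest recall,
# and harmonic mean vs. arithmetic mean of recall values.
import numpy as np
from sklearn.metrics import recall_score

# Hypothetical predictions on a balanced test set with 4 categories.
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3])
y_pred = np.array([0, 0, 0, 1, 1, 0, 2, 0, 0, 3, 3, 0])

# Per-class recall (equals per-class accuracy on a balanced test set).
per_class_recall = recall_score(y_true, y_pred, average=None)

arithmetic_mean = per_class_recall.mean()
lowest_recall = per_class_recall.min()
# The harmonic mean is dominated by the worst-performing categories,
# so it heavily penalizes classifiers that sacrifice a few classes.
# A small epsilon guards against division by zero when a class has zero recall.
harmonic_mean = len(per_class_recall) / np.sum(1.0 / np.maximum(per_class_recall, 1e-12))

print("per-class recall:", per_class_recall)   # [1.0, 0.667, 0.333, 0.667]
print("arithmetic mean :", arithmetic_mean)    # ~0.667
print("lowest recall   :", lowest_recall)      # ~0.333
print("harmonic mean   :", harmonic_mean)      # ~0.571, pulled down by the weak class
```

A more uniform distribution of recall values narrows the gap between the harmonic and arithmetic means, which is exactly the behavior the proposed method aims for.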