Model bias triggered by long-tailed data has been widely studied. However, measures based on the number of samples cannot explain three phenomena simultaneously: (1) Given enough data, the classification performance gain from additional samples is marginal. (2) Classification performance decays precipitously as the number of training samples decreases when data are insufficient. (3) Models trained on sample-balanced datasets still exhibit different biases toward different classes. In this work, we define and quantify the semantic scale of classes, which is used to measure the feature diversity of each class. It is exciting to find experimentally that there is a marginal effect of semantic scale, which perfectly describes the first two phenomena. Further, we propose a quantitative measure of semantic scale imbalance, which accurately reflects model bias on multiple datasets, even on sample-balanced data, revealing a novel perspective for the study of class imbalance. Because semantic scale imbalance is prevalent, we propose semantic-scale-balanced learning, comprising a general loss-improvement scheme and a dynamic re-weighting training framework that overcomes the challenge of computing semantic scales in real time during iterations. Comprehensive experiments show that dynamic semantic-scale-balanced learning consistently enables models to perform superiorly on large-scale long-tailed and non-long-tailed natural and medical datasets, which is a good starting point for mitigating this prevalent but previously unnoticed model bias.
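To make the idea concrete, the sketch below illustrates one plausible reading of the abstract: a class's semantic scale is approximated by the log-volume spanned by its (centered) feature vectors, and classes with smaller scales receive larger loss weights. This is a minimal illustration under stated assumptions, not the paper's exact formulation; the log-determinant volume measure, the inverse-scale weighting rule, and the function names `semantic_scale` and `class_weights` are all assumptions chosen to mirror the abstract.

```python
# Minimal sketch of a volume-based "semantic scale" (feature diversity) and
# the resulting class re-weighting idea. Assumptions, not the paper's method.
import numpy as np

def semantic_scale(features: np.ndarray) -> float:
    """Approximate one class's feature diversity by the log-volume of the
    subspace its centered features span.

    features: (m, d) array of m feature vectors of dimension d.
    """
    m, d = features.shape
    z = features - features.mean(axis=0, keepdims=True)  # center the features
    # log det(I + Z^T Z / m) grows with the spread of the features and is
    # well defined even when m < d, since I + Z^T Z / m is positive definite.
    sign, logdet = np.linalg.slogdet(np.eye(d) + z.T @ z / m)
    return 0.5 * logdet

def class_weights(per_class_features: list) -> np.ndarray:
    """Weight each class inversely to its semantic scale, so classes with
    low feature diversity are up-weighted in the loss (assumed rule)."""
    scales = np.array([semantic_scale(f) for f in per_class_features])
    w = 1.0 / np.clip(scales, 1e-8, None)
    return w * len(w) / w.sum()  # normalize so weights average to 1

# Example: two classes with equal sample counts but different diversity.
rng = np.random.default_rng(0)
diverse = rng.normal(size=(100, 16))          # widely spread features
collapsed = rng.normal(size=(100, 16)) * 0.1  # nearly collapsed features
print(class_weights([diverse, collapsed]))    # the collapsed class is up-weighted
```

Note that the two classes above have identical sample counts, yet receive very different weights; this is the sense in which a diversity-based measure can expose bias that a sample-count measure cannot.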