In the real world, object frequencies are naturally skewed, forming long-tail class distributions, and models trained on such data perform poorly on the statistically rare classes. A promising solution is to mine tail-class examples to balance the training dataset. However, mining tail-class examples is challenging: most otherwise successful uncertainty-based mining approaches struggle because skewness in the data distorts the predicted class probabilities. In this work, we propose a simple yet effective approach to overcome these challenges. Our framework first enhances the subdued tail-class activations and then uses a one-class, data-centric approach to identify tail-class examples. We evaluate the framework extensively on three datasets spanning two computer vision tasks. Substantial improvements in minority-class mining and in the performance of the fine-tuned model corroborate the value of the proposed solution.
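The difficulty that uncertainty-based mining faces under class skew can be illustrated with a small sketch. It is not the framework proposed here. It is a toy two-class setting in which the effect of imbalance is assumed to act as a log-prior offset added to the head-class logit (a common simplification), and the function names, logits, and the 99:1 ratio are hypothetical values chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def tail_example_scores(imbalance_ratio, tail_margin=1.0):
    """Entropy of one tail-class example with and without head-class bias.

    Classes are ordered [head, tail]. ASSUMPTION: training on skewed data
    shifts the head logit by log(imbalance_ratio); real models need not
    behave exactly this way.
    """
    # Logits an unbiased model might give a true tail-class example.
    balanced_logits = [0.0, tail_margin]
    # The same example after the assumed head-class bias is applied.
    skewed_logits = [balanced_logits[0] + math.log(imbalance_ratio),
                     balanced_logits[1]]
    balanced_probs = softmax(balanced_logits)
    skewed_probs = softmax(skewed_logits)
    return {
        "balanced_probs": balanced_probs,
        "skewed_probs": skewed_probs,
        "balanced_entropy": entropy(balanced_probs),
        "skewed_entropy": entropy(skewed_probs),
    }


scores = tail_example_scores(imbalance_ratio=99.0)
print(scores)
```

Under these assumptions the tail-class example is predicted as the head class with high confidence, so its entropy drops and an entropy-ranked miner would place it near the bottom of the candidate list. This is the kind of probability distortion that motivates enhancing the tail-class activations before mining.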