Long-tailed recognition (LTR) is the task of learning high-performance classifiers from training samples that are extremely imbalanced across categories. Most existing works address the problem by either enhancing the features of tail classes or re-balancing the classifiers to reduce the inductive bias. In this paper, we look into the root cause of the LTR task, i.e., that the number of training samples per class is greatly imbalanced, and propose a straightforward solution. We split the categories into three groups, i.e., many, medium and few, according to the number of training images, and predict the three groups of categories separately to reduce the difficulty of classification. This idea naturally raises a new problem: how do we assign a given sample to the right class group? We introduce a mutually exclusive modulator that estimates the probability of an image belonging to each group. Specifically, the modulator consists of a light-weight module learned with a mutually exclusive objective, so its output probabilities encode the data-volume clues of the training dataset. These probabilities are further utilized as prior information to guide the prediction of the classifier. We conduct extensive experiments on multiple datasets, e.g., ImageNet-LT, Places-LT and iNaturalist 2018, to evaluate the proposed approach. Our method achieves competitive performance compared to state-of-the-art methods.
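The core idea in the abstract can be sketched in a few lines of code. The sketch below is a hypothetical illustration, not the paper's implementation: classes are split into "many"/"medium"/"few" groups by training-sample count, a softmax over group logits plays the role of the mutually exclusive modulator, and the resulting group probabilities are used as a per-class prior that re-weights the classifier's scores. All thresholds, names, and toy numbers are assumptions for illustration.

```python
import numpy as np

def split_groups(class_counts, many_thresh=100, few_thresh=20):
    """Assign each class a group index: 0 = many, 1 = medium, 2 = few.
    Thresholds are illustrative, not taken from the paper."""
    counts = np.asarray(class_counts)
    groups = np.full(counts.shape, 1)      # default: medium
    groups[counts > many_thresh] = 0       # many-shot classes
    groups[counts < few_thresh] = 2        # few-shot classes
    return groups

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def modulated_prediction(class_logits, group_logits, groups):
    """Re-weight per-class probabilities by the modulator's group prior.
    The softmax over group logits makes the group probabilities sum to
    one, mimicking the mutually exclusive objective."""
    class_probs = softmax(class_logits)
    group_probs = softmax(group_logits)    # mutually exclusive groups
    prior = group_probs[groups]            # look up each class's group prior
    scores = class_probs * prior
    return scores / scores.sum()           # renormalize to a distribution

# Toy example: 6 classes with a long-tailed count distribution.
counts = [500, 300, 50, 40, 5, 3]
groups = split_groups(counts)              # [0, 0, 1, 1, 2, 2]
logits = np.array([2.0, 1.0, 1.5, 0.5, 1.8, 0.2])
g_logits = np.array([0.2, 0.1, 1.5])       # modulator favors the "few" group
pred = modulated_prediction(logits, g_logits, groups)
```

In this toy run the plain classifier would pick the head class (index 0), but the modulator's prior shifts the prediction toward the few-shot class (index 4), which is the kind of correction the abstract describes.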