Real-world imagery is often characterized by a significant imbalance in the number of images per class, leading to long-tailed distributions. An effective and simple approach to long-tailed visual recognition is to learn feature representations and a classifier separately, with instance-balanced and class-balanced sampling, respectively. In this work, we introduce a new framework, motivated by the key observation that a feature representation learned with instance sampling is far from optimal in a long-tailed setting. Our main contribution is a new training method, referred to as Class-Balanced Distillation (CBD), that leverages knowledge distillation to enhance feature representations. CBD allows the feature representation to evolve in the second training stage, guided by the teacher learned in the first stage. The second stage uses class-balanced sampling in order to focus on under-represented classes. This framework can naturally accommodate multiple teachers, unlocking the information from an ensemble of models to enhance recognition capabilities. Our experiments show that the proposed technique consistently outperforms the state of the art on long-tailed recognition benchmarks such as ImageNet-LT, iNaturalist17, and iNaturalist18.
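To make the two-stage recipe concrete, the sketch below shows one way the second training stage could look in PyTorch. It is a minimal illustration, not the paper's implementation: the helper names (`class_balanced_sampler`, `cbd_step`), the MSE feature-distillation loss, and the loss weight `w` are all assumptions made for the example.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import WeightedRandomSampler

def class_balanced_sampler(labels):
    """Class-balanced sampling: every class is drawn with equal probability."""
    counts = torch.bincount(labels)
    weights = 1.0 / counts[labels].float()   # inverse class frequency per example
    return WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)

# Stage 1 (not shown): train one or more teacher backbones with the usual
# instance-balanced sampling, then freeze them.

def cbd_step(student, classifier, teachers, images, labels, optimizer, w=1.0):
    """One second-stage step: class-balanced cross-entropy plus feature
    distillation from the frozen teacher ensemble. The MSE loss and the
    weight w stand in for the paper's feature-distillation objective."""
    feats = student(images)                       # student features (still trainable)
    logits = classifier(feats)
    loss = F.cross_entropy(logits, labels)        # classification loss
    with torch.no_grad():                         # teachers stay fixed
        teacher_feats = torch.stack([t(images) for t in teachers]).mean(dim=0)
    loss = loss + w * F.mse_loss(feats, teacher_feats)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In use, a `DataLoader` built with `class_balanced_sampler(labels)` would feed `cbd_step`, so tail classes appear as often as head classes, while distilling from the instance-sampled teachers lets the feature representation keep evolving rather than being frozen after the first stage.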