Deep models trained on long-tailed datasets exhibit unsatisfactory performance on tail classes. Existing methods usually modify the classification loss to increase the learning focus on tail classes, which unexpectedly sacrifices performance on head classes. In fact, this scheme leads to a contradiction between the two goals of long-tailed learning, i.e., learning generalizable representations and facilitating learning for tail classes. In this work, we explore knowledge distillation in long-tailed scenarios and propose a novel distillation framework, named Balanced Knowledge Distillation (BKD), to disentangle the contradiction between the two goals and achieve both simultaneously. Specifically, given a vanilla teacher model, we train the student model by minimizing the combination of an instance-balanced classification loss and a class-balanced distillation loss. The former benefits from the sample diversity and learns generalizable representations, while the latter considers the class priors and facilitates learning mainly for tail classes. The student model trained with BKD obtains a significant performance gain even compared with its teacher model. We conduct extensive experiments on several long-tailed benchmark datasets and demonstrate that the proposed BKD is an effective knowledge distillation framework in long-tailed scenarios, as well as a new state-of-the-art method for long-tailed learning. Code is available at https://github.com/EricZsy/BalancedKnowledgeDistillation .
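To make the described combination concrete, the following is a minimal PyTorch-style sketch of such an objective, not the authors' implementation. The inverse-frequency class weighting, the KL-based distillation term, the temperature and `alpha` values, and the helper name `balanced_kd_loss` are illustrative assumptions; the exact class-balanced distillation loss used in BKD may differ (see the linked repository).

```python
import torch
import torch.nn.functional as F

def balanced_kd_loss(student_logits, teacher_logits, targets, class_counts,
                     temperature=2.0, alpha=1.0):
    """Hypothetical sketch of a BKD-style objective: an instance-balanced
    classification term plus a class-balanced distillation term.

    Assumption (not taken from the paper): the distillation term is a KL
    divergence between temperature-softened teacher and student outputs,
    re-weighted per class by inverse class frequency.
    """
    # Instance-balanced classification loss: plain cross-entropy over the
    # original (imbalanced) sampling, which preserves sample diversity.
    ce_loss = F.cross_entropy(student_logits, targets)

    # Class-balanced weights, e.g. inverse frequency normalized to mean 1.
    counts = torch.as_tensor(class_counts, dtype=torch.float,
                             device=student_logits.device)
    weights = 1.0 / counts
    weights = weights / weights.mean()

    # Temperature-softened distributions for distillation.
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=1)
    p_teacher = F.softmax(teacher_logits / t, dim=1)

    # Per-class KL terms, re-weighted so that tail classes contribute more.
    kd_per_class = p_teacher * (p_teacher.clamp_min(1e-8).log() - log_p_student)
    kd_loss = (kd_per_class * weights.unsqueeze(0)).sum(dim=1).mean() * (t * t)

    return ce_loss + alpha * kd_loss
```

In this sketch the student is trained on the natural, instance-balanced data stream while only the distillation term carries the class priors, which mirrors the paper's motivation of keeping representation learning and tail-class compensation in separate loss components.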