In recent years, knowledge distillation (KD) has been widely used to derive efficient models. By imitating a large teacher model, a lightweight student model can achieve comparable performance with greater efficiency. However, most existing knowledge distillation methods focus on classification tasks. Only a limited number of studies have applied knowledge distillation to object detection, especially in time-sensitive autonomous driving scenarios. In this paper, we propose Adaptive Instance Distillation (AID), which selectively imparts the teacher's knowledge to the student to improve the performance of knowledge distillation. Unlike previous KD methods that treat all instances equally, our AID adaptively adjusts the distillation weight of each instance based on the teacher model's prediction loss. We verified the effectiveness of AID through experiments on the KITTI and COCO traffic datasets. The results show that our method improves the performance of state-of-the-art attention-guided and non-local distillation methods and achieves better distillation results on both single-stage and two-stage detectors. Compared to the baseline, AID yields average mAP increases of 2.7% and 2.1% for single-stage and two-stage detectors, respectively. Furthermore, AID is also shown to be useful for self-distillation, improving the teacher model's performance.