In recent years, knowledge distillation (KD) has been widely used as an effective way to derive efficient models. By imitating a large teacher model, a lightweight student model can achieve comparable performance with greater efficiency. However, most existing knowledge distillation methods focus on classification tasks, and only a limited number of studies have applied knowledge distillation to object detection, especially in time-sensitive autonomous driving scenarios. We propose the Adaptive Instance Distillation (AID) method, which selectively imparts knowledge from the teacher to the student to improve the effectiveness of knowledge distillation. Unlike previous KD methods that treat all instances equally, AID adaptively adjusts the distillation weight of each instance based on the teacher model's prediction loss. We verified the effectiveness of AID through experiments on the KITTI and COCO traffic datasets. The results show that our method improves existing state-of-the-art attention-guided and non-local distillation methods and achieves better distillation results on both single-stage and two-stage detectors. Compared to the baseline, AID yields average mAP gains of 2.7% and 2.05% for single-stage and two-stage detectors, respectively. Furthermore, AID is also shown to be useful for self-distillation, improving the teacher model's own performance.
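As a rough illustration of the idea stated above, and not the paper's exact formulation, the sketch below shows one way per-instance distillation weights could be derived from the teacher's prediction loss: instances where the teacher itself performs poorly are down-weighted when distilling into the student. The function names, the softmax-based weighting, and the use of PyTorch are assumptions made for illustration only.

```python
import torch
import torch.nn.functional as F


def adaptive_instance_weights(teacher_loss_per_instance, temperature=1.0):
    # Hypothetical weighting scheme: instances with a high teacher prediction
    # loss receive lower distillation weight, since the teacher's knowledge
    # there is less reliable. The actual AID weighting may differ.
    return torch.softmax(-teacher_loss_per_instance / temperature, dim=0)


def instance_weighted_distillation_loss(student_logits, teacher_logits,
                                        teacher_loss_per_instance):
    # Per-instance KL divergence between student and teacher class distributions.
    per_instance_kl = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="none",
    ).sum(dim=-1)
    weights = adaptive_instance_weights(teacher_loss_per_instance)
    # Weighted sum: instances the teacher handles well dominate the loss.
    return (weights * per_instance_kl).sum()
```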