Knowledge distillation has been applied successfully to image classification. However, object detection is much more sophisticated, and most knowledge distillation methods have failed on it. In this paper, we point out that in object detection the features of the teacher and student vary greatly in different areas, especially in the foreground and background. If we distill them equally, the uneven differences between feature maps will negatively affect the distillation. Thus, we propose Focal and Global Distillation (FGD). Focal distillation separates the foreground and background, forcing the student to focus on the teacher's critical pixels and channels. Global distillation rebuilds the relation between different pixels and transfers it from the teacher to the student, compensating for the missing global information in focal distillation. As our method only needs to calculate the loss on the feature map, FGD can be applied to various detectors. We experiment on various detectors with different backbones, and the results show that the student detector achieves an excellent mAP improvement. For example, ResNet-50 based RetinaNet, Faster RCNN, RepPoints and Mask RCNN with our distillation method achieve 40.7%, 42.0%, 42.0% and 42.1% mAP on COCO2017, which are 3.3, 3.6, 3.4 and 2.9 points higher than their baselines, respectively. Our code is available at https://github.com/yzd-v/FGD.
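To make the two terms concrete, below is a minimal PyTorch sketch of a feature-map distillation loss in the spirit of FGD, not the paper's exact formulation (the official implementation is at https://github.com/yzd-v/FGD). The attention maps (softmax over absolute feature means), the Gram-matrix stand-in for the paper's global relation module, and the hyperparameters alpha, beta, gamma are assumptions made for illustration only.

```python
# Illustrative sketch only: focal (foreground/background) + global (pixel-relation)
# feature-map distillation. Simplified; the actual FGD method uses a GcBlock for the
# global term and its own attention masks. All names/weights here are assumptions.
import torch
import torch.nn.functional as F


def teacher_attention(feat_t, temperature=0.5):
    """Spatial and channel attention from the teacher feature map (N, C, H, W)."""
    n, c, h, w = feat_t.shape
    abs_feat = feat_t.abs()
    spatial = F.softmax(abs_feat.mean(dim=1).view(n, -1) / temperature, dim=1).view(n, 1, h, w)
    channel = F.softmax(abs_feat.mean(dim=(2, 3)) / temperature, dim=1).view(n, c, 1, 1)
    return spatial, channel


def focal_global_loss(feat_s, feat_t, fg_mask, alpha=1.0, beta=0.5, gamma=1.0):
    """feat_s, feat_t: student/teacher feature maps (N, C, H, W), same shape
    (in practice a 1x1 conv adapter would align channel dimensions).
    fg_mask: (N, 1, H, W) binary mask, 1 inside ground-truth boxes, 0 elsewhere."""
    spatial_att, channel_att = teacher_attention(feat_t)
    weight = spatial_att * channel_att                      # teacher-guided weighting

    # Focal term: foreground and background distilled separately with different weights.
    diff = (feat_s - feat_t) ** 2 * weight
    fg_loss = (diff * fg_mask).sum()
    bg_loss = (diff * (1.0 - fg_mask)).sum()
    focal = alpha * fg_loss + beta * bg_loss

    # Global term: match pairwise pixel affinities (Gram matrices) between student
    # and teacher, a simple stand-in for the paper's global relation module.
    n, c, h, w = feat_t.shape
    fs = feat_s.view(n, c, h * w)
    ft = feat_t.view(n, c, h * w)
    gram_s = torch.bmm(fs.transpose(1, 2), fs) / c          # (N, HW, HW)
    gram_t = torch.bmm(ft.transpose(1, 2), ft) / c
    global_loss = F.mse_loss(gram_s, gram_t, reduction="sum")

    return focal + gamma * global_loss
```

Because the loss is computed only on feature maps, a sketch like this can be attached to the FPN outputs of RetinaNet, Faster RCNN, RepPoints or Mask RCNN without modifying the detection heads.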