Real-world object detection models should be cheap and accurate. Knowledge distillation (KD) can boost the accuracy of a small, cheap detection model by leveraging useful information from a larger teacher model. However, a key challenge is identifying the most informative features produced by the teacher for distillation. In this work, we show that only a very small fraction of features within a ground-truth bounding box are responsible for a teacher's high detection performance. Based on this, we propose Prediction-Guided Distillation (PGD), which focuses distillation on these key predictive regions of the teacher and yields considerable gains in performance over many existing KD baselines. In addition, we propose an adaptive weighting scheme over the key regions to smooth out their influence and achieve even better performance. Our proposed approach outperforms current state-of-the-art KD baselines on a variety of advanced one-stage detection architectures. Specifically, on the COCO dataset, our method achieves between +3.1% and +4.6% AP improvement using ResNet-101 and ResNet-50 as the teacher and student backbones, respectively. On the CrowdHuman dataset, we achieve +3.2% and +2.0% improvements in MR and AP, also using these backbones. Our code is available at https://github.com/ChenhongyiYang/PGD.
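The core idea — restricting feature distillation to a handful of highly predictive teacher locations inside each ground-truth box, with adaptive weights over those locations — can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration under assumed tensor shapes and names (`teacher_feat`, `student_feat`, `teacher_score`, `box_mask`, `top_k`, `temperature`); the softmax-based weighting stands in for the paper's adaptive scheme and is not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def prediction_guided_distillation_loss(
    teacher_feat,    # (C, H, W) teacher FPN feature map
    student_feat,    # (C, H, W) student FPN feature map, same shape assumed
    teacher_score,   # (H, W) teacher per-location prediction quality
    box_mask,        # (H, W) bool mask of locations inside the GT box
    top_k=16,        # number of key predictive locations to distil (assumed)
    temperature=0.5, # softness of the adaptive weights (assumed)
):
    """Hypothetical sketch: distil only the teacher's most predictive
    locations inside a GT box, weighted by how predictive they are."""
    # Ignore every location outside the ground-truth box.
    scores = teacher_score.masked_fill(~box_mask, float("-inf")).flatten()

    # Keep only the small fraction of locations driving the teacher's prediction.
    k = min(top_k, int(box_mask.sum()))
    top_scores, top_idx = scores.topk(k)

    # Adaptive weights over the key locations (softmax stand-in for the
    # paper's weighting scheme).
    weights = F.softmax(top_scores / temperature, dim=0)            # (k,)

    # Feature imitation restricted to the selected locations.
    t = teacher_feat.flatten(1)[:, top_idx]                         # (C, k)
    s = student_feat.flatten(1)[:, top_idx]                         # (C, k)
    per_loc_mse = ((t - s) ** 2).mean(dim=0)                        # (k,)

    return (weights * per_loc_mse).sum()
```

In a full detector this loss would be summed over ground-truth boxes and FPN levels and added to the student's ordinary detection loss; the exact selection rule and weighting used by PGD are described in the paper.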