Knowledge distillation constitutes a simple yet effective way to improve the performance of a compact student network by exploiting the knowledge of a more powerful teacher. Nevertheless, the knowledge distillation literature remains limited to the scenario where the student and the teacher tackle the same task. Here, we investigate the problem of transferring knowledge not only across architectures but also across tasks. To this end, we study the case of object detection and, instead of following the standard detector-to-detector distillation approach, introduce a classifier-to-detector knowledge transfer framework. In particular, we propose strategies to exploit the classification teacher to improve both the detector's recognition accuracy and localization performance. Our experiments on several detectors with different backbones demonstrate the effectiveness of our approach, allowing us to outperform the state-of-the-art detector-to-detector distillation methods.
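To make the classifier-to-detector idea concrete, below is a minimal sketch of how a classification teacher could supervise a detector's classification head on region crops. This is an illustration under assumed interfaces, not the paper's exact formulation: the names `detector`, `teacher_cls`, and `region_crops` are hypothetical, and the loss shown is the standard Hinton-style temperature-softened KD objective.

```python
# Hypothetical sketch: distilling an image classifier into a detector's
# classification head. The teacher is a plain classifier and never sees
# detection annotations; it provides soft labels for region crops.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between temperature-softened class distributions."""
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    # The T**2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

def training_step(detector, teacher_cls, images, targets, region_crops, lam=1.0):
    # Assumed interface: the detector returns its standard detection loss
    # (classification + box regression) plus per-region class logits.
    det_loss, student_region_logits = detector(images, targets, region_crops)
    with torch.no_grad():
        teacher_logits = teacher_cls(region_crops)
    # Total objective: detection loss plus the classifier-to-detector KD term.
    return det_loss + lam * kd_loss(student_region_logits, teacher_logits)
```

The point of the sketch is only the supervision signal: the teacher contributes image-level classification knowledge, while how that knowledge is routed to the detector's recognition and localization heads is precisely where the proposed strategies differ from standard detector-to-detector distillation.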