In recent years, knowledge distillation has proven to be an effective solution for model compression. This approach enables lightweight student models to acquire knowledge extracted from cumbersome teacher models. However, previous distillation methods for detection generalize poorly across different detection frameworks and rely heavily on ground truth (GT), ignoring the valuable relational information between instances. We therefore propose a novel distillation method for detection tasks based on discriminative instances, without distinguishing positives and negatives according to GT, which we call general instance distillation (GID). Our approach contains a general instance selection module (GISM) to make full use of feature-based, relation-based, and response-based knowledge for distillation. Extensive results demonstrate that the student model achieves significant AP improvement and even outperforms the teacher in various detection frameworks. Specifically, RetinaNet with ResNet-50 achieves 39.1% mAP with GID on the COCO dataset, surpassing the 36.2% baseline by 2.9%, and even outperforming the ResNet-101 based teacher model, which achieves 38.1% AP.
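To make the three kinds of knowledge concrete, the sketch below shows one way a GID-style distillation loss could combine feature-based, relation-based, and response-based terms over a set of selected instances. This is a minimal illustration, not the authors' released implementation: the instance-selection heuristic, tensor shapes, loss weights, and function names are all assumptions for the sake of the example.

```python
# Illustrative sketch only: shapes, names, and the selection heuristic are
# assumptions, not the paper's actual GISM/GID implementation.
import torch
import torch.nn.functional as F


def select_general_instances(student_scores, teacher_scores, top_k=10):
    """Hypothetical proxy for GISM: pick instances where student and
    teacher confidences disagree the most (i.e. 'discriminative' ones)."""
    discrepancy = (student_scores - teacher_scores).abs()   # (num_instances,)
    return discrepancy.topk(min(top_k, discrepancy.numel())).indices


def gid_style_loss(stu_feats, tea_feats, stu_logits, tea_logits, indices,
                   w_feat=1.0, w_rel=1.0, w_resp=1.0, temperature=2.0):
    """Combine feature-, relation- and response-based distillation terms."""
    sf, tf = stu_feats[indices], tea_feats[indices]          # (k, d) features
    sl, tl = stu_logits[indices], tea_logits[indices]        # (k, c) logits

    # Feature-based: match the selected instance features directly.
    loss_feat = F.mse_loss(sf, tf)

    # Relation-based: match the pairwise similarity structure among instances.
    rel_s = F.normalize(sf, dim=1) @ F.normalize(sf, dim=1).t()
    rel_t = F.normalize(tf, dim=1) @ F.normalize(tf, dim=1).t()
    loss_rel = F.l1_loss(rel_s, rel_t)

    # Response-based: soften logits and match class distributions (classic KD).
    loss_resp = F.kl_div(F.log_softmax(sl / temperature, dim=1),
                         F.softmax(tl / temperature, dim=1),
                         reduction="batchmean") * temperature ** 2

    return w_feat * loss_feat + w_rel * loss_rel + w_resp * loss_resp
```

In practice, the selected indices would come from the detector's candidate instances, and the weighted sum would be added to the ordinary detection loss during student training.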