Knowledge distillation trains a lightweight student model to mimic a cumbersome teacher. Existing methods treat knowledge as the features of individual instances or the relations among them, which is instance-level knowledge drawn only from the teacher model, i.e., local knowledge. However, empirical studies show that local knowledge is very noisy in object detection tasks, especially for blurred, occluded, or small instances. A more intrinsic approach is therefore to measure the representations of instances w.r.t. a group of common basis vectors in the two feature spaces of the teacher and student detectors, i.e., global knowledge. Distillation can then be cast as space alignment. To this end, a novel prototype generation module (PGM) is proposed to find the common basis vectors, dubbed prototypes, in the two feature spaces. A robust distilling module (RDM) then constructs the global knowledge from the prototypes and filters out noisy global and local knowledge by measuring the discrepancy between the representations in the two feature spaces. Experiments with Faster R-CNN and RetinaNet on the PASCAL and COCO datasets show that our method achieves the best performance for distilling object detectors with various backbones, even surpassing the performance of the teacher model. We also show that existing methods can be easily combined with global knowledge to obtain further improvement. Code is available at https://github.com/hikvision-research/DAVAR-Lab-ML.
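To make the idea of global knowledge concrete, the following is a minimal NumPy sketch, not the released implementation. It assumes the prototypes are already chosen as a shared set of instance indices (the abstract does not say how the PGM selects them), expresses each instance by least-squares coefficients w.r.t. the prototypes in each feature space, and down-weights instances whose coefficients disagree. The function names, the exponential weighting, and the `temperature` parameter are all hypothetical choices made for illustration.

```python
import numpy as np


def prototype_coefficients(features, prototypes):
    """Least-squares coefficients of each instance w.r.t. the prototypes.

    features:   (n, d) instance features in one feature space
    prototypes: (k, d) prototype features in the same space
    returns:    (n, k) coefficients c such that c @ prototypes ~ features
    """
    # Solve prototypes.T @ c_i ~ feature_i for all instances at once.
    coeffs, *_ = np.linalg.lstsq(prototypes.T, features.T, rcond=None)
    return coeffs.T


def global_distillation_loss(teacher_feats, student_feats, proto_idx,
                             temperature=1.0):
    """Hypothetical discrepancy-weighted alignment loss (an assumption).

    The same instance indices serve as prototypes in both spaces, so the
    coefficient vectors are comparable even if the feature widths differ.
    """
    t_coeffs = prototype_coefficients(teacher_feats, teacher_feats[proto_idx])
    s_coeffs = prototype_coefficients(student_feats, student_feats[proto_idx])

    # Per-instance squared distance between the two representations.
    discrepancy = np.sum((t_coeffs - s_coeffs) ** 2, axis=1)

    # Assumed filtering rule: large discrepancy -> small weight.
    # In a training loop these weights would be treated as constants.
    weights = np.exp(-discrepancy / temperature)

    loss = float(np.mean(weights * discrepancy))
    return loss, weights, discrepancy
```

Under these assumptions, a student space that is merely a rotation of the teacher space yields identical coefficients and a loss near zero, which is the sense in which distillation becomes space alignment rather than feature matching.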