Knowledge distillation is a widely used paradigm for transferring knowledge from a cumbersome teacher network to a compact student network while maintaining strong performance. Unlike image classification, object detection is considerably more sophisticated: detectors are trained with multiple loss functions, and the features that semantic information relies on are entangled. In this paper, we point out that the information carried by features from regions excluding objects, i.e., the background, is also essential for distilling the student detector, yet it is usually ignored by existing approaches. In addition, we show that features from different regions should be assigned different importance during distillation. To this end, we present a novel distillation algorithm via decoupled features (DeFeat) for learning a better student detector. Specifically, two levels of decoupled features are processed to embed useful information into the student: decoupled features from the neck and decoupled proposals from the classification head. Extensive experiments on various detectors with different backbones show that the proposed DeFeat surpasses state-of-the-art distillation methods for object detection. For example, on the COCO benchmark, DeFeat improves ResNet50-based Faster R-CNN from 37.4% to 40.9% mAP, and ResNet50-based RetinaNet from 36.5% to 39.7% mAP. Our implementation is available at https://github.com/ggjy/DeFeat.pytorch.
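To make the decoupling idea concrete, below is a minimal PyTorch-style sketch of distilling one neck feature level with the object and background regions handled separately; this is an illustrative assumption of the approach, not the paper's implementation (see the repository above for that), and the names `feat_s`, `feat_t`, `fg_mask`, `w_fg`, and `w_bg` are hypothetical.

```python
import torch

def decoupled_feature_distill(feat_s, feat_t, fg_mask, w_fg=0.5, w_bg=1.0):
    """Sketch of decoupled feature distillation on a single neck level.

    feat_s, feat_t: student / teacher feature maps, shape (N, C, H, W).
    fg_mask: binary mask of shape (N, 1, H, W); 1 inside ground-truth
        boxes (objects), 0 elsewhere (background).
    w_fg, w_bg: importance weights for the object and background regions
        (illustrative values, not the paper's tuned hyper-parameters).
    """
    bg_mask = 1.0 - fg_mask
    # Normalize each term by the number of positions it covers so that
    # large background areas do not dominate the loss by size alone.
    n_fg = fg_mask.sum().clamp(min=1.0)
    n_bg = bg_mask.sum().clamp(min=1.0)
    diff = (feat_s - feat_t) ** 2
    loss_fg = (diff * fg_mask).sum() / n_fg
    loss_bg = (diff * bg_mask).sum() / n_bg
    return w_fg * loss_fg + w_bg * loss_bg
```

The separate `w_fg` and `w_bg` weights reflect the abstract's claim that features from different regions deserve different importance, rather than applying a single uniform distillation loss over the whole feature map.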