As a general model compression paradigm, feature-based knowledge distillation allows a student model to learn expressive features from its teacher counterpart. In this paper, we focus on designing an effective feature-distillation framework and propose a spatial-channel adaptive masked distillation (AMD) network for object detection. Specifically, to accurately reconstruct important feature regions, we first perform attention-guided feature masking on the feature map of the student network, so that important features are identified via spatially adaptive masking rather than the random masking used in previous methods. In addition, we employ a simple and efficient module that makes the student network channel-adaptive, improving its capability in object perception and detection. In contrast to previous methods, the proposed network can reconstruct and learn more crucial object-aware features, which is conducive to accurate object detection. Experiments demonstrate the superiority of our method: with the proposed distillation method, the student networks achieve 41.3%, 42.4%, and 42.7% mAP when RetinaNet, Cascade Mask-RCNN, and RepPoints are respectively used as the teacher framework for object detection, outperforming previous state-of-the-art distillation methods including FGD and MGD.
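The two ideas named above, attention-guided spatial masking and a channel-adaptive module, can be illustrated with a minimal NumPy sketch. It is not the AMD implementation, and everything in it is an assumption made for illustration: the function names, the `temperature` and `mask_ratio` parameters, computing attention from the teacher features, masking the most-attended positions, and the squeeze-and-excitation style gate standing in for the "simple and efficient module". The learned generation block that masked-distillation methods typically use for reconstruction is omitted.

```python
import numpy as np

def spatial_attention(feat, temperature=0.5):
    """Spatial attention for a (C, H, W) feature map (assumed formulation).

    Averages absolute activations over channels, then applies a softmax
    over all H*W positions, rescaled so the map sums to H*W.
    """
    _, h, w = feat.shape
    energy = np.abs(feat).mean(axis=0).reshape(-1)
    logits = energy / temperature
    logits = logits - logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return (h * w * probs).reshape(h, w)

def attention_guided_mask(student_feat, attn, mask_ratio=0.5):
    """Zero out the most-attended spatial positions of the student features.

    Replaces random masking: positions are chosen by attention rank
    (masking the top-ranked positions is an assumption of this sketch).
    """
    h, w = attn.shape
    k = int(round(mask_ratio * h * w))
    mask = np.ones(h * w, dtype=student_feat.dtype)
    if k > 0:
        top = np.argsort(attn.reshape(-1))[-k:]
        mask[top] = 0.0
    mask = mask.reshape(h, w)
    return student_feat * mask[None, :, :], mask

def channel_gate(feat, w1, w2):
    """SE-style channel gate: global average pool, FC, ReLU, FC, sigmoid."""
    squeezed = feat.mean(axis=(1, 2))               # (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)         # (C_reduced,)
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # (C,), values in (0, 1)
    return feat * scale[:, None, None]

def reconstruction_loss(pred, target):
    """Mean squared error between processed student and teacher features."""
    return float(np.mean((pred - target) ** 2))

# Toy example with random features (hypothetical shapes: 8 channels, 4x4 map).
rng = np.random.default_rng(0)
teacher = rng.standard_normal((8, 4, 4))
student = rng.standard_normal((8, 4, 4))

attn = spatial_attention(teacher)
masked, mask = attention_guided_mask(student, attn, mask_ratio=0.5)

# Randomly initialised gate weights; in training these would be learned.
w1 = rng.standard_normal((2, 8)) * 0.1
w2 = rng.standard_normal((8, 2)) * 0.1
gated = channel_gate(masked, w1, w2)

loss = reconstruction_loss(gated, teacher)
```

In a real detector this would be applied per feature-pyramid level inside an autograd framework, with the gate weights and a reconstruction head trained jointly with the detection loss.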