Despite the success of Knowledge Distillation (KD) on image classification, applying KD to object detection remains challenging due to the difficulty of locating knowledge. In this paper, we propose an instance-conditional distillation framework to find the desired knowledge. To locate the knowledge of each instance, we use observed instances as condition information and formulate the retrieval process as instance-conditional decoding. Specifically, the information of each instance that specifies a condition is encoded as a query, the teacher's information is presented as keys, and we use the attention between queries and keys, computed by a transformer decoder, to measure their correlation. To guide this module, we further introduce an auxiliary task directed at instance localization and identification, which are fundamental for detection. Extensive experiments demonstrate the efficacy of our method: we observe impressive improvements under various settings. Notably, we boost RetinaNet with a ResNet-50 backbone from 37.4 to 40.7 mAP (+3.3) under the 1x schedule, which even surpasses the teacher (40.4 mAP) with a ResNet-101 backbone under the 3x schedule. Code will be released soon.
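The following is a minimal sketch of the instance-conditional decoding idea sketched above: instance embeddings act as queries, flattened teacher features act as keys and values, and a multi-head attention layer produces per-instance correlations. The class and variable names (`InstanceConditionalDecoder`, `instance_embed`, `teacher_feats`) and all dimensions are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class InstanceConditionalDecoder(nn.Module):
    """Sketch of instance-conditional decoding via cross-attention.

    Instance embeddings (the conditions) serve as queries; flattened
    teacher feature maps serve as keys/values. The attention weights
    indicate where each instance's knowledge lies in the teacher.
    """

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, instance_embed: torch.Tensor, teacher_feats: torch.Tensor):
        # instance_embed: (batch, num_instances, dim) -- encoded conditions (queries)
        # teacher_feats:  (batch, H*W, dim)           -- teacher's information (keys/values)
        decoded, attn_weights = self.attn(
            query=instance_embed, key=teacher_feats, value=teacher_feats
        )
        # decoded: per-instance knowledge retrieved from the teacher
        # attn_weights: correlation between each instance and teacher locations
        return decoded, attn_weights

# Usage: 4 annotated instances querying a flattened 32x32 teacher feature map.
decoder = InstanceConditionalDecoder(dim=256)
queries = torch.randn(2, 4, 256)
feats = torch.randn(2, 32 * 32, 256)
knowledge, weights = decoder(queries, feats)
print(knowledge.shape, weights.shape)  # torch.Size([2, 4, 256]) torch.Size([2, 4, 1024])
```

In a full distillation pipeline, the retrieved per-instance knowledge would define the regions of teacher features the student is trained to match, with the auxiliary localization and identification task supervising the attention module; those training details are beyond this sketch.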