In this paper, we propose the first self-distillation framework for general object detection, termed LGD (Label-Guided self-Distillation). Previous studies rely on a strong pretrained teacher to provide instructive knowledge for distillation. However, such a teacher may be unavailable in real-world scenarios. Instead, we generate instructive knowledge by inter- and intra-object relation modeling, requiring only student representations and regular labels. In detail, our framework involves sparse label-appearance encoding, inter-object relation adaptation, and intra-object knowledge mapping to obtain the instructive knowledge. Modules in LGD are trained end-to-end with the student detector and are discarded at inference. Empirically, LGD obtains decent results on various detectors and datasets, as well as on the extended task of instance segmentation. For example, on the MS-COCO dataset, LGD improves RetinaNet with ResNet-50 under 2x single-scale training from 36.2% to 39.0% mAP (+2.8%). For much stronger detectors such as FCOS with ResNeXt-101 DCN v2 under 2x multi-scale training (46.1%), LGD achieves 47.9% (+1.8%). For pedestrian detection on the CrowdHuman dataset, LGD improves mMR by 2.3% for Faster R-CNN with ResNet-50. Compared with the classical teacher-based method FGFI, LGD not only performs better without requiring a pretrained teacher but also reduces the training cost beyond inherent student learning by 51%.
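To make the three-stage pipeline concrete, the following is a minimal PyTorch-style sketch of how label-appearance encoding, inter-object relation adaptation, and intra-object knowledge mapping could be wired together. All names (`LGDDistiller`, `soft_pool_student`) and interfaces are illustrative assumptions made for this sketch, not the authors' actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LGDDistiller(nn.Module):
    """Generates 'instructive knowledge' from labels plus student features.

    Trained end-to-end with the student and discarded at inference,
    mirroring the abstract's description. Module design is a sketch.
    """

    def __init__(self, feat_dim: int = 256, embed_dim: int = 256):
        super().__init__()
        # Sparse label-appearance encoding: embed each ground-truth object
        # (class id + normalized box coordinates) into a label embedding.
        self.label_encoder = nn.Linear(5, embed_dim)  # (cls, x1, y1, x2, y2)
        # Inter-object relation adaptation: object embeddings interact with
        # each other and with student appearance tokens via attention.
        self.relation = nn.MultiheadAttention(embed_dim, num_heads=8,
                                              batch_first=True)
        # Intra-object knowledge mapping: project relation-adapted embeddings
        # into the student feature space, yielding per-object knowledge.
        self.mapper = nn.Linear(embed_dim, feat_dim)

    def forward(self, student_tokens: torch.Tensor,
                labels: torch.Tensor) -> torch.Tensor:
        # student_tokens: (B, HW, C) flattened student feature map
        # labels:         (B, N, 5) ground-truth (class, box) per object
        obj = self.label_encoder(labels)  # (B, N, D)
        # Cross-attend label embeddings against student appearance, so the
        # knowledge depends on both regular labels and student representations.
        obj, _ = self.relation(obj, student_tokens, student_tokens)
        return self.mapper(obj)  # (B, N, C) instructive knowledge


def soft_pool_student(student_tokens: torch.Tensor,
                      knowledge: torch.Tensor) -> torch.Tensor:
    """Hypothetical per-object soft pooling of student features for the loss."""
    attn = torch.softmax(knowledge @ student_tokens.transpose(1, 2), dim=-1)
    return attn @ student_tokens  # (B, N, C)


# Usage sketch: 2 images, 4 labeled objects, 100 student tokens of dim 256.
student_tokens = torch.randn(2, 100, 256)
labels = torch.rand(2, 4, 5)
distiller = LGDDistiller()
knowledge = distiller(student_tokens, labels)          # (2, 4, 256)
pooled = soft_pool_student(student_tokens, knowledge)  # (2, 4, 256)
distill_loss = F.mse_loss(pooled, knowledge)           # added to detection loss
```

In such a setup, the distillation loss would simply be added to the student's regular detection losses during training, and the distiller dropped afterwards, which is consistent with the abstract's claim that no pretrained teacher and no extra inference cost are involved.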