In this paper, we propose the first self-distillation framework for general object detection, termed LGD (Label-Guided self-Distillation). Previous studies rely on a strong pretrained teacher to provide instructive knowledge, and such a teacher may be unavailable in real-world scenarios. Instead, we generate instructive knowledge based only on student representations and regular labels. Our framework comprises a sparse label-appearance encoder, an inter-object relation adapter, and an intra-object knowledge mapper, which jointly form an implicit teacher during training that depends dynamically on the labels and the evolving student representations. These modules are trained end-to-end with the detector and discarded at inference. Experimentally, LGD obtains decent results on various detectors, datasets, and extended tasks such as instance segmentation. For example, on the MS-COCO dataset, LGD improves RetinaNet with ResNet-50 under 2x single-scale training from 36.2% to 39.0% mAP (+2.8%). It also boosts much stronger detectors, such as FCOS with ResNeXt-101 DCN v2 under 2x multi-scale training, from 46.1% to 47.9% mAP (+1.8%). Compared with FGFI, a classical teacher-based method, LGD not only performs better without requiring a pretrained teacher but also reduces the training cost beyond inherent student learning by 51%. Code is available at https://github.com/megvii-research/LGD.
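The idea of a training-only teacher built from the student's own output and the labels can be illustrated with a toy sketch. This is not the LGD implementation: every name here is hypothetical, a one-parameter linear model stands in for the detector, a fixed blend stands in for the learned encoder, adapter, and mapper modules, and holding the teacher target constant in the distillation term is an assumption made for this example.

```python
# Toy illustration only; not the LGD implementation. All names are hypothetical.

def student_forward(w, x):
    """Student 'representation': a one-parameter linear map."""
    return w * x


def implicit_teacher(student_feat, label, alpha=0.5):
    """Training-only target built from the student's output and the label.

    A fixed blend stands in for the learned modules described in the abstract.
    """
    return (1.0 - alpha) * student_feat + alpha * label


def train(data, epochs=200, lr=0.01, distill_weight=1.0):
    """Fit the student with a task loss plus a distillation loss."""
    w = 0.0
    for _ in range(epochs):
        for x, y in data:
            s = student_forward(w, x)
            # Teacher target is held constant here (assumed stop-gradient).
            t = implicit_teacher(s, y)
            # Gradient of (s - y)^2 + distill_weight * (s - t)^2 w.r.t. w.
            grad = 2.0 * (s - y) * x + distill_weight * 2.0 * (s - t) * x
            w -= lr * grad
    return w


# Labels follow y = 2x, so the student should recover a slope near 2.
data = [(x, 2.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
w = train(data)

# Inference calls the student alone; the teacher function is never used.
prediction = student_forward(w, 3.0)
```

In this sketch the teacher adds cost only inside `train`, which mirrors the claim that the auxiliary modules are discarded at inference.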