In this paper, we present a novel training scheme, namely Teach-DETR, to learn better DETR-based detectors from versatile teacher detectors. We show that the predicted boxes from teacher detectors are an effective medium for transferring the knowledge of teacher detectors, which can be either RCNN-based or DETR-based, to train a more accurate and robust DETR model. This new training scheme can easily incorporate the predicted boxes from multiple teacher detectors, each of which provides parallel supervision to the student DETR. Our strategy introduces no additional parameters and adds negligible computational cost to the original detector during training. During inference, Teach-DETR brings zero additional overhead and maintains the merit of requiring no non-maximum suppression. Extensive experiments show that our method leads to consistent improvements for various DETR-based detectors. Specifically, we improve the state-of-the-art detector DINO with a Swin-Large backbone, 4 scales of feature maps, and a 36-epoch training schedule from 57.8% to 58.9% in terms of mean average precision on the MSCOCO 2017 validation set. Code will be available at https://github.com/LeonHLJ/Teach-DETR.
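The idea of parallel supervision from several teachers can be illustrated with a toy sketch. The abstract does not specify the matching rule or the loss, so everything below is an assumption made for illustration: boxes are `(cx, cy, w, h)` tuples, the cost is a plain L1 box distance, matching is a brute-force one-to-one assignment, and the names `l1_cost`, `match_cost`, `parallel_supervision_loss`, and `teacher_weight` are hypothetical. It is not the authors' implementation.

```python
from itertools import permutations


def l1_cost(a, b):
    """L1 distance between two boxes given as (cx, cy, w, h)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def match_cost(preds, targets):
    """Minimum total L1 cost of assigning each target to a distinct prediction.

    Brute force over permutations: adequate for a toy example only.
    Assumes len(targets) <= len(preds).
    """
    best = float("inf")
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(l1_cost(preds[p], t) for p, t in zip(perm, targets))
        best = min(best, cost)
    return best


def parallel_supervision_loss(preds, gt_boxes, teacher_boxes_list,
                              teacher_weight=1.0):
    """Sum one matching loss per supervision source (assumed formulation).

    The ground truth and each teacher's predicted boxes are matched to the
    student's predictions independently, and the resulting losses are added.
    """
    loss = match_cost(preds, gt_boxes)
    for teacher_boxes in teacher_boxes_list:
        loss += teacher_weight * match_cost(preds, teacher_boxes)
    return loss


# Usage: one student prediction, one ground-truth box, one teacher box.
student = [(0, 0, 1, 1)]
ground_truth = [(0, 0, 1, 1)]
teachers = [[(1, 0, 1, 1)]]
print(parallel_supervision_loss(student, ground_truth, teachers))
```

Because each teacher contributes only an extra loss term over the same student predictions, a formulation of this kind adds no parameters to the student, which is consistent with the claim in the abstract. A practical version would replace the brute-force search with a polynomial-time assignment solver.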