Detection Transformer (DETR) directly maps queries to unique objects by using one-to-one bipartite matching during training, which enables end-to-end object detection. Recently, these models have surpassed traditional detectors on COCO with undeniable elegance. However, they differ from traditional detectors in multiple design choices, including model architecture and training schedules, so the effectiveness of one-to-one matching is not fully understood. In this work, we conduct a strict comparison between the one-to-one Hungarian matching in DETRs and the one-to-many label assignments used in traditional detectors with non-maximum suppression (NMS). Surprisingly, we observe that one-to-many assignments with NMS consistently outperform standard one-to-one matching under the same setting, with a significant gain of up to 2.5 mAP. Our detector, which trains Deformable-DETR with traditional IoU-based label assignment, achieves 50.2 COCO mAP within 12 epochs (1x schedule) with a ResNet50 backbone, outperforming all existing traditional or transformer-based detectors in this setting. Across multiple datasets, schedules, and architectures, we consistently show that bipartite matching is unnecessary for performant detection transformers. Furthermore, we attribute the success of detection transformers to their expressive transformer architecture. Code is available at https://github.com/jozhang97/DETA.
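To make the contrast between one-to-one matching and one-to-many assignment with NMS concrete, here is a minimal pure-Python sketch. It is not the DETA implementation from the linked repository, and several simplifications are assumptions: the matching cost is reduced to 1 - IoU (DETR's actual matching cost also includes classification and box-regression terms), the optimal one-to-one assignment is found by exhaustive search rather than the Hungarian algorithm (same optimum, but only practical for a handful of boxes), and the function names and thresholds (`pos_thresh=0.5`, `iou_thresh=0.5`) are hypothetical, illustrative choices.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

# Axis-aligned box as (x1, y1, x2, y2); a format assumed for this sketch.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def one_to_one_match(preds: Sequence[Box], gts: Sequence[Box]) -> Dict[int, int]:
    """One-to-one bipartite matching: each ground truth gets exactly one prediction.

    Minimizes the total cost sum(1 - IoU) by exhaustive search over ordered
    selections of predictions. This gives the same optimum as the Hungarian
    algorithm but only scales to tiny inputs. The 1 - IoU cost is a
    hypothetical simplification, not DETR's full matching cost.
    """
    if len(preds) < len(gts):
        raise ValueError("need at least as many predictions as ground truths")
    best_cost, best_perm = float("inf"), ()
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(1.0 - iou(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    # Map prediction index -> matched ground-truth index.
    return {p: g for g, p in enumerate(best_perm)}


def one_to_many_assign(
    preds: Sequence[Box], gts: Sequence[Box], pos_thresh: float = 0.5
) -> Dict[int, int]:
    """One-to-many IoU-based assignment: every prediction whose best IoU with a
    ground truth reaches pos_thresh becomes a positive for that ground truth,
    so one object can supervise several predictions (threshold is illustrative)."""
    assign: Dict[int, int] = {}
    if not gts:
        return assign
    for i, box in enumerate(preds):
        ious = [iou(box, gt) for gt in gts]
        j = max(range(len(gts)), key=ious.__getitem__)
        if ious[j] >= pos_thresh:
            assign[i] = j
    return assign


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_thresh: float = 0.5) -> List[int]:
    """Greedy non-maximum suppression: visit boxes by descending score and keep
    a box only if its IoU with every already-kept box is below iou_thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) < iou_thresh for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
    # Two near-duplicate predictions per object.
    preds = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (21, 21, 31, 31)]
    print(one_to_one_match(preds, gts))       # {0: 0, 2: 1}
    print(one_to_many_assign(preds, gts))     # {0: 0, 1: 0, 2: 1, 3: 1}
    print(nms(preds, [0.9, 0.8, 0.85, 0.7]))  # [0, 2]
```

With two ground-truth boxes and four predictions (two near-duplicates per object), one-to-one matching labels only two predictions as positives, while the IoU-based rule labels all four. The one-to-many detector therefore needs NMS at inference time to remove the duplicates, which in this toy case keeps one box per object.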