One-to-one set matching is a key design that gives DETR its end-to-end capability, so that object detection does not require hand-crafted NMS (non-maximum suppression) to remove duplicate detections. This end-to-end property is important for the versatility of DETR, and it has been generalized to a wide range of visual problems, including instance/semantic segmentation, human pose estimation, and point-cloud/multi-view-image based detection. However, we note that one-to-one set matching assigns too few queries as positive samples, which significantly reduces the training efficiency of positive samples. This paper proposes a simple yet effective method based on a hybrid matching scheme that combines the original one-to-one matching branch with auxiliary queries that use a one-to-many matching loss during training. This hybrid strategy is shown to significantly improve both training efficiency and accuracy. In inference, only the original one-to-one matching branch is used, thus maintaining the end-to-end merit and the same inference efficiency of DETR. The method is named $\mathcal{H}$-DETR, and we show that it consistently improves a wide range of representative DETR methods, including Deformable-DETR, 3DETR/PETRv2, PETR, and TransTrack, among others, across a wide range of visual tasks. Code will be available at: https://github.com/HDETR
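The hybrid matching scheme described above can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the abstract: matching costs are given as plain query-by-ground-truth matrices, a brute-force search stands in for the Hungarian algorithm, and one-to-many matching is realized by repeating each ground truth `k` times before matching. The function names `one_to_one_match`, `one_to_many_match`, and `hybrid_match` are hypothetical and are not taken from the released code.

```python
from itertools import permutations


def one_to_one_match(cost):
    """Exact min-cost one-to-one assignment by brute force.

    Stand-in for the Hungarian algorithm; only practical for tiny inputs.
    cost[q][g] is the cost of assigning query q to ground truth g, and
    there must be at least as many queries as ground truths.
    Returns {query_index: gt_index} for the matched (positive) queries.
    """
    num_queries = len(cost)
    num_gt = len(cost[0]) if cost else 0
    best, best_total = {}, float("inf")
    # Try every ordered choice of num_gt distinct queries, one per ground truth.
    for queries in permutations(range(num_queries), num_gt):
        total = sum(cost[q][g] for g, q in enumerate(queries))
        if total < best_total:
            best_total = total
            best = {q: g for g, q in enumerate(queries)}
    return best


def one_to_many_match(cost, k):
    """Assumed realization of one-to-many matching: repeat each ground
    truth k times, then match one-to-one, so each ground truth receives
    k positive queries."""
    repeated = [[row[g] for g in range(len(row)) for _ in range(k)] for row in cost]
    match = one_to_one_match(repeated)
    # Column index c in the repeated matrix corresponds to ground truth c // k.
    return {q: col // k for q, col in match.items()}


def hybrid_match(primary_cost, aux_cost, k):
    """Training-time matching: one-to-one for the primary queries,
    one-to-many for the auxiliary queries."""
    return one_to_one_match(primary_cost), one_to_many_match(aux_cost, k)


if __name__ == "__main__":
    # Toy costs (made up for illustration): 2 ground truths,
    # 3 primary queries, 4 auxiliary queries.
    primary_cost = [[0.9, 0.2], [0.1, 0.8], [0.5, 0.5]]
    aux_cost = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.3]]
    primary, auxiliary = hybrid_match(primary_cost, aux_cost, k=2)
    print("one-to-one positives:", primary)
    print("one-to-many positives:", auxiliary)
```

In this toy setting the primary branch yields one positive query per ground truth, while the auxiliary branch yields `k` positives per ground truth, which is the sense in which the auxiliary queries supply more positive samples during training. Under the scheme in the abstract, only the primary (one-to-one) queries would be kept at inference, so the auxiliary branch adds no inference cost.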