Detection Transformer (DETR) relies on one-to-one label assignment, i.e., assigning each ground-truth (gt) object to exactly one positive object query, for end-to-end object detection, and therefore cannot exploit multiple positive queries. We present a novel DETR training approach, named {\em Group DETR}, that supports multiple positive queries. Specifically, we decouple the positives into multiple independent groups and keep only one positive per gt object in each group. We make simple modifications during training: (i) adopt $K$ groups of object queries; (ii) conduct decoder self-attention on each group of object queries separately, with the same parameters shared across groups; (iii) perform one-to-one label assignment for each group, leading to $K$ positive object queries for each gt object. At inference, we use only one group of object queries, so neither the architecture nor the inference process is modified. We validate the effectiveness of the proposed approach on DETR variants, including Conditional DETR, DAB-DETR, DN-DETR, and DINO.
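The three training-time modifications can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation: the function names (`group_attention_mask`, `one_to_one_assign`, `group_assign`), the toy cost matrix, and the brute-force matcher (a stand-in for whatever one-to-one matcher a real detector uses) are all assumptions made for illustration. The sketch also assumes each group has at least as many queries as there are gt objects.

```python
from itertools import permutations


def group_attention_mask(num_groups, queries_per_group):
    """mask[i][j] is True when query i may attend to query j.

    Queries attend only within their own group, so self-attention is
    effectively run per group while the layer parameters stay shared.
    """
    total = num_groups * queries_per_group
    return [
        [i // queries_per_group == j // queries_per_group for j in range(total)]
        for i in range(total)
    ]


def one_to_one_assign(cost):
    """Brute-force one-to-one matching (illustrative stand-in matcher).

    cost[q][g] is the matching cost between query q and gt object g.
    Returns (query, gt) pairs that minimize the total cost, with each gt
    matched to exactly one query. Assumes len(cost) >= number of gts.
    """
    num_queries = len(cost)
    num_gt = len(cost[0]) if cost else 0
    best, best_total = (), float("inf")
    for perm in permutations(range(num_queries), num_gt):
        total = sum(cost[q][g] for g, q in enumerate(perm))
        if total < best_total:
            best, best_total = perm, total
    return [(q, g) for g, q in enumerate(best)]


def group_assign(cost, num_groups, queries_per_group):
    """Run one-to-one assignment independently inside each query group."""
    pairs = []
    for k in range(num_groups):
        start = k * queries_per_group
        group_cost = cost[start:start + queries_per_group]
        for q, g in one_to_one_assign(group_cost):
            pairs.append((start + q, g))  # convert back to a global query index
    return pairs


# Toy example: K = 2 groups, 3 queries per group, 2 gt objects.
K, N = 2, 3
cost = [
    [0.1, 0.9], [0.8, 0.2], [0.5, 0.5],  # group 0
    [0.7, 0.3], [0.6, 0.6], [0.2, 0.9],  # group 1
]
mask = group_attention_mask(K, N)
pairs = group_assign(cost, K, N)
positives_per_gt = [sum(1 for _, g in pairs if g == gt) for gt in range(2)]
print(pairs)
print(positives_per_gt)
```

In this toy setup every gt object ends up with $K$ positive queries, one from each group. Under the same assumptions, inference would keep only the first `N` queries (a single group), for which the mask reduces to ordinary full self-attention.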