The recently proposed Detection Transformer (DETR) model successfully applies the Transformer to object detection and achieves performance comparable to two-stage object detection frameworks such as Faster R-CNN. However, DETR suffers from slow convergence: training DETR from scratch requires 500 epochs to achieve high accuracy. To accelerate its convergence, we propose a simple yet effective scheme for improving the DETR framework, namely the Spatially Modulated Co-Attention (SMCA) mechanism. The core idea of SMCA is to conduct location-aware co-attention in DETR by constraining co-attention responses to be high near initially estimated bounding box locations. Our proposed SMCA increases DETR's convergence speed by replacing the original co-attention mechanism in the decoder while keeping all other operations in DETR unchanged. Furthermore, by integrating multi-head and scale-selection attention designs into SMCA, our fully-fledged SMCA achieves better performance than DETR with a dilated convolution-based backbone (45.6 mAP at 108 epochs vs. 43.3 mAP at 500 epochs). We perform extensive ablation studies on the COCO dataset to validate SMCA. Code is released at https://github.com/gaopengcuhk/SMCA-DETR.
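To make the core idea concrete, the sketch below shows one way the spatial modulation described above could work: each object query predicts a box center and scale, a Gaussian-like log-prior centered there is added to the co-attention logits, and responses far from the estimated box are suppressed. This is a minimal single-head illustration assuming normalized coordinates; the function name, shapes, and the exact form of the prior are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn.functional as F

def smca_co_attention(queries, keys, values, centers, scales, feat_h, feat_w):
    """Sketch of spatially modulated co-attention (single head, illustrative).

    queries: (N, d)    object queries from the decoder
    keys:    (H*W, d)  flattened encoder feature keys
    values:  (H*W, d)  flattened encoder feature values
    centers: (N, 2)    per-query estimated box centers, normalized to [0, 1]
    scales:  (N, 2)    per-query estimated width/height scales
    """
    d = queries.shape[-1]
    # Standard dot-product co-attention logits between queries and keys.
    logits = queries @ keys.t() / d ** 0.5  # (N, H*W)

    # Normalized coordinate grid over the feature map, flattened to match keys.
    ys = torch.linspace(0, 1, feat_h)
    xs = torch.linspace(0, 1, feat_w)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    grid = torch.stack([grid_x.reshape(-1), grid_y.reshape(-1)], dim=-1)  # (H*W, 2)

    # Gaussian-like log-prior per query, peaked at its estimated box center
    # and spread according to its estimated scale (assumed form).
    diff = grid[None] - centers[:, None]                                  # (N, H*W, 2)
    prior = -(diff ** 2 / (2 * scales[:, None] ** 2 + 1e-8)).sum(-1)      # (N, H*W)

    # Spatial modulation: adding the log-prior before the softmax keeps
    # attention high near the estimated box and decays it elsewhere.
    attn = F.softmax(logits + prior, dim=-1)
    return attn @ values  # (N, d)
```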