DETR has recently been proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance. However, it suffers from slow convergence and limited feature spatial resolution, due to the limitations of Transformer attention modules in processing image feature maps. To mitigate these issues, we propose Deformable DETR, whose attention modules only attend to a small set of key sampling points around a reference. Deformable DETR can achieve better performance than DETR (especially on small objects) with 10× fewer training epochs. Extensive experiments on the COCO benchmark demonstrate the effectiveness of our approach. Code is released at https://github.com/fundamentalvision/Deformable-DETR.
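The core idea — each query attends only to a small set of sampling points at learned offsets around a reference point, rather than to every spatial location — can be sketched as follows. This is a minimal single-head, single-level illustration, not the paper's full multi-head, multi-scale module; the projection matrices `W_off` and `W_attn` stand in for learned linear layers and are filled with random values here purely for illustration.

```python
import numpy as np

def bilinear_sample(feat, x, y):
    # Sample feature map feat (H, W, C) at a fractional location (x, y),
    # clamping indices to the map borders.
    H, W, _ = feat.shape
    x0f, y0f = np.floor(x), np.floor(y)
    wx, wy = x - x0f, y - y0f  # interpolation weights
    x0 = int(np.clip(x0f, 0, W - 1)); x1 = int(np.clip(x0f + 1, 0, W - 1))
    y0 = int(np.clip(y0f, 0, H - 1)); y1 = int(np.clip(y0f + 1, 0, H - 1))
    return ((1 - wx) * (1 - wy) * feat[y0, x0] + wx * (1 - wy) * feat[y0, x1]
            + (1 - wx) * wy * feat[y1, x0] + wx * wy * feat[y1, x1])

def deformable_attention(query, feat, ref_point, n_points=4, rng=None):
    """Single-head sketch of deformable attention.

    The query predicts n_points 2-D offsets around ref_point and a softmax
    weight per sampled point; the output is the weighted sum of the
    bilinearly sampled features. W_off / W_attn are hypothetical stand-ins
    for the learned projections in the real module.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    W_off = rng.standard_normal((query.shape[-1], n_points * 2)) * 0.1
    W_attn = rng.standard_normal((query.shape[-1], n_points))
    offsets = (query @ W_off).reshape(n_points, 2)  # (n_points, 2)
    logits = query @ W_attn
    attn = np.exp(logits - logits.max())
    attn /= attn.sum()  # softmax over the sampling points
    # Only n_points locations are ever touched -- cost is independent of H*W.
    sampled = np.stack([bilinear_sample(feat, ref_point[0] + dx, ref_point[1] + dy)
                        for dx, dy in offsets])  # (n_points, C)
    return attn @ sampled  # (C,)
```

Because each query reads only `n_points` locations instead of all `H*W`, the cost per query no longer scales with the spatial resolution of the feature map, which is what makes high-resolution (small-object-friendly) features tractable.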