Although detection with Transformers (DETR) is increasingly popular, its global attention modeling requires an extremely long training period to optimize and reach promising detection performance. Whereas existing studies mainly develop advanced feature or embedding designs to tackle this training issue, we point out that Region-of-Interest (RoI) based detection refinement can readily mitigate the training difficulty of DETR methods. Based on this observation, we introduce a novel REcurrent Glimpse-based decOder (REGO). REGO employs a multi-stage recurrent processing structure that helps the attention of DETR gradually focus on foreground objects more accurately. In each processing stage, visual features are extracted as glimpse features from RoIs obtained by enlarging the bounding boxes detected in the previous stage. A glimpse-based decoder then produces refined detection results from both the glimpse features and the attention modeling outputs of the previous stage. In practice, REGO can be easily embedded in representative DETR variants while keeping their training and inference pipelines fully end-to-end. In particular, REGO helps Deformable DETR achieve 44.8 AP on the MSCOCO dataset with only 36 training epochs, whereas the original DETR and Deformable DETR require 500 and 50 epochs, respectively, to achieve comparable performance. Experiments also show that REGO consistently boosts the performance of different DETR detectors, by up to 7% relative gain under the same setting of 50 training epochs. Code is available at https://github.com/zhechen/Deformable-DETR-REGO.
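The stage-wise refinement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `(cx, cy, w, h)` normalized box format, the names `enlarge_box` and `recurrent_refine`, the `scales` argument, and the `refine_stage` callback are all assumptions made for illustration. The callback stands in for glimpse feature extraction plus the glimpse-based decoder, which in the paper also consumes the attention outputs of the previous stage.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed box format: (cx, cy, w, h), normalized to [0, 1].
Box = Tuple[float, float, float, float]


def enlarge_box(box: Box, scale: float) -> Box:
    """Scale a box about its centre, then clip it to the unit image."""
    cx, cy, w, h = box
    x0 = max(0.0, cx - 0.5 * w * scale)
    y0 = max(0.0, cy - 0.5 * h * scale)
    x1 = min(1.0, cx + 0.5 * w * scale)
    y1 = min(1.0, cy + 0.5 * h * scale)
    # Convert the clipped corners back to centre/size form.
    return ((x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0)


def recurrent_refine(
    boxes: Sequence[Box],
    refine_stage: Callable[[List[Box], List[Box]], List[Box]],
    scales: Sequence[float],
) -> List[List[Box]]:
    """Run one refinement stage per entry in `scales`.

    `refine_stage(previous_boxes, rois)` is a hypothetical placeholder
    for the glimpse-based decoder; it returns the refined boxes.
    """
    history: List[List[Box]] = [list(boxes)]
    for scale in scales:
        previous = history[-1]
        # RoIs are the previous detections with enlarged box areas.
        rois = [enlarge_box(b, scale) for b in previous]
        history.append(refine_stage(previous, rois))
    return history
```

A call such as `recurrent_refine(initial_boxes, my_decoder, scales=[3.0, 2.0])` would run two stages; the scale values here are placeholders, not figures taken from the abstract.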