Recent DEtection TRansformer-based (DETR) models have obtained remarkable performance. Their success cannot be achieved without the re-introduction of multi-scale feature fusion in the encoder. However, the excessive number of tokens in multi-scale features, about 75\% of which come from low-level features, is computationally inefficient and hinders real applications of DETR models. In this paper, we present Lite DETR, a simple yet efficient end-to-end object detection framework that can effectively reduce the GFLOPs of the detection head by 60\% while keeping 99\% of the original performance. Specifically, we design an efficient encoder block that updates high-level features (corresponding to small-resolution feature maps) and low-level features (corresponding to large-resolution feature maps) in an interleaved way. In addition, to better fuse cross-scale features, we develop a key-aware deformable attention to predict more reliable attention weights. Comprehensive experiments validate the effectiveness and efficiency of the proposed Lite DETR, and the efficient encoder strategy can generalize well across existing DETR-based models. The code will be available at \url{https://github.com/IDEA-Research/Lite-DETR}.
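For intuition, the sketch below illustrates the key-aware idea in a minimal single-scale, single-head PyTorch form: whereas vanilla deformable attention predicts attention weights directly from the query with a linear layer, the key-aware variant also samples keys at the predicted locations and derives the weights from query-key dot products. This is a simplified sketch under our own assumptions (module names, shapes, and the single-scale setup are illustrative), not the released implementation.

```python
import torch
import torch.nn.functional as F
from torch import nn

class KeyAwareDeformableAttention(nn.Module):
    """Minimal sketch of key-aware deformable attention
    (single scale, single head; shapes and names are assumptions)."""

    def __init__(self, dim: int, num_points: int = 4):
        super().__init__()
        self.num_points = num_points
        self.offset_proj = nn.Linear(dim, num_points * 2)  # sampling offsets
        self.key_proj = nn.Linear(dim, dim)
        self.value_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, query, ref_points, feat_map):
        # query:      (B, Nq, C)   queries to be updated
        # ref_points: (B, Nq, 2)   reference points, normalized to [0, 1]
        # feat_map:   (B, C, H, W) source feature map (keys/values)
        B, Nq, C = query.shape
        offsets = self.offset_proj(query).view(B, Nq, self.num_points, 2)
        # sampling locations, mapped to [-1, 1] for grid_sample
        locs = (ref_points.unsqueeze(2) + offsets).clamp(0, 1) * 2 - 1
        sampled = F.grid_sample(
            feat_map, locs, mode="bilinear", align_corners=False
        )                                       # (B, C, Nq, P)
        sampled = sampled.permute(0, 2, 3, 1)   # (B, Nq, P, C)
        k = self.key_proj(sampled)              # keys sampled at locations
        v = self.value_proj(sampled)
        # key-aware weights: query-key dot products instead of
        # weights predicted directly from the query
        attn = (query.unsqueeze(2) * k).sum(-1) * self.scale  # (B, Nq, P)
        attn = attn.softmax(-1)
        out = (attn.unsqueeze(-1) * v).sum(2)   # (B, Nq, C)
        return self.out_proj(out)
```

In the paper's encoder, attention of this kind operates inside blocks that update high-level tokens frequently and the much larger set of low-level tokens only occasionally, which is the interleaved schedule that yields the reported GFLOPs reduction.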