Multi-scale features have been proven highly effective for object detection, and most ConvNet-based object detectors adopt the Feature Pyramid Network (FPN) as a basic component for exploiting them. However, for recently proposed Transformer-based object detectors, directly incorporating multi-scale features leads to prohibitive computational overhead because of the high complexity of the attention mechanism when processing high-resolution features. This paper presents Iterative Multi-scale Feature Aggregation (IMFA), a generic paradigm that enables the efficient use of multi-scale features in Transformer-based object detectors. The core idea is to exploit sparse multi-scale features from just a few crucial locations, which is achieved with two novel designs. First, IMFA rearranges the Transformer encoder-decoder pipeline so that the encoded features can be iteratively updated based on the detection predictions. Second, IMFA sparsely samples scale-adaptive features for refined detection from just a few keypoint locations under the guidance of prior detection predictions. As a result, the sampled multi-scale features are sparse yet still highly beneficial for object detection. Extensive experiments show that IMFA significantly boosts the performance of multiple Transformer-based object detectors with only slight computational overhead. Project page: https://github.com/ZhangGongjie/IMFA.
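To give a rough sense of why dense multi-scale features are costly for attention while a sparse set of sampled tokens is not, the following back-of-the-envelope sketch counts tokens and pairwise attention interactions. It is not the authors' implementation: the image size, strides, query count, and keypoints per query are assumed values chosen only for illustration, and the cost model counts token pairs only, ignoring channel widths and constant factors.

```python
import math


def tokens_per_level(height, width, strides):
    """Number of feature-map locations (tokens) at each pyramid level."""
    return [math.ceil(height / s) * math.ceil(width / s) for s in strides]


def attention_pairs(num_tokens):
    """Self-attention relates every token to every token: quadratic growth."""
    return num_tokens * num_tokens


# All values below are assumptions for illustration, not taken from the paper.
HEIGHT, WIDTH = 800, 1200          # assumed input resolution
SINGLE_SCALE = [32]                # assumed single low-resolution level
MULTI_SCALE = [8, 16, 32, 64]      # assumed FPN-style strides
NUM_QUERIES = 100                  # assumed number of object queries
KEYPOINTS_PER_QUERY = 4            # assumed sampled keypoints per prediction

# Dense token counts for single-scale and full multi-scale inputs.
single = sum(tokens_per_level(HEIGHT, WIDTH, SINGLE_SCALE))
dense = sum(tokens_per_level(HEIGHT, WIDTH, MULTI_SCALE))

# Sparse variant: keep the single-scale tokens and add one sampled token per
# keypoint (assuming features from all scales are fused into that one token).
sparse_total = single + NUM_QUERIES * KEYPOINTS_PER_QUERY

for name, n in [("single-scale", single),
                ("dense multi-scale", dense),
                ("sparse multi-scale", sparse_total)]:
    ratio = attention_pairs(n) / attention_pairs(single)
    print(f"{name:>20}: {n:6d} tokens, {ratio:8.1f}x attention pairs")
```

Under these assumptions, the dense multi-scale input multiplies the number of attention pairs by several hundred relative to the single-scale baseline, whereas adding a few hundred sampled tokens roughly doubles it.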