Feature Pyramid Network (FPN) is a prevalent component of current object detection models, where it is widely used to improve multi-scale detection. However, its feature interaction remains local and lossy, which limits its representation power. In this paper, to emulate the global view of human vision in object detection and to address this inherent defect of FPN's interaction mode, we construct a novel architecture termed Content-Augmented Feature Pyramid Network (CA-FPN). Unlike the vanilla FPN, which fuses features within a local receptive field, CA-FPN adaptively aggregates similar features from a global view. It is equipped with a global content extraction module and light linear spatial transformers. The former extracts multi-scale context information; the latter tightly couple the global content extraction module with the vanilla FPN through a linearized attention function designed to reduce model complexity. Furthermore, CA-FPN can be readily plugged into existing FPN-based models. Extensive experiments on the challenging COCO and PASCAL VOC object detection datasets demonstrate that CA-FPN significantly outperforms competitive FPN-based detectors without bells and whistles. When plugged into the Cascade R-CNN framework built upon a standard ResNet-50 backbone, our method achieves 44.8 AP on COCO mini-val, surpassing the previous state of the art by 1.5 AP and demonstrating its potential for practical application.
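The abstract does not spell out the linearized attention function, so the following is only a minimal sketch of the general idea behind linear attention, not CA-FPN's actual module. It assumes an `elu(x) + 1` feature map (a common choice in the linear-attention literature, not confirmed for CA-FPN), and the function names `feature_map`, `linear_attention`, and `quadratic_reference` are hypothetical. The point it illustrates is the complexity reduction: by multiplying keys and values first, the cost grows linearly rather than quadratically with the number of spatial positions.

```python
import numpy as np


def feature_map(x):
    # Assumed kernel feature map: elu(x) + 1, which is strictly positive.
    # The abstract does not say which map CA-FPN uses.
    return np.where(x > 0, x + 1.0, np.exp(x))


def linear_attention(q, k, v, eps=1e-6):
    """Linearized attention.

    q: (n, d) queries, k: (m, d) keys, v: (m, e) values -> (n, e).
    Cost is O((n + m) * d * e) instead of O(n * m * (d + e)).
    """
    q_f = feature_map(q)
    k_f = feature_map(k)
    kv = k_f.T @ v              # (d, e): summarize all keys and values once
    z = k_f.sum(axis=0)         # (d,): normalizer shared by every query
    numerator = q_f @ kv        # (n, e)
    denominator = q_f @ z + eps # (n,)
    return numerator / denominator[:, None]


def quadratic_reference(q, k, v, eps=1e-6):
    # Same computation with the explicit (n, m) attention matrix,
    # included only to check that the linear form gives the same result.
    q_f, k_f = feature_map(q), feature_map(k)
    weights = q_f @ k_f.T       # (n, m)
    return (weights @ v) / (weights.sum(axis=1, keepdims=True) + eps)
```

In a pyramid setting, `q` could be the flattened positions of one FPN level and `k`, `v` the flattened global content features, so every position attends to all content features without materializing the full attention matrix. That pairing is an assumption for illustration only.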