Feature pyramids have become ubiquitous in multi-scale computer vision tasks such as object detection. Given their importance, a computer vision network can be divided into three parts: a backbone (generating a feature pyramid), a neck (refining the feature pyramid) and a head (generating the final output). Many existing necks, i.e. networks operating on feature pyramids, are shallow and focus mostly on communication-based processing in the form of top-down and bottom-up operations. We present a new neck architecture called the Trident Pyramid Network (TPN), which allows for a deeper design and a better balance between communication-based processing and self-processing. We show consistent improvements when using our TPN neck on the COCO object detection benchmark, outperforming the popular BiFPN baseline by 0.5 AP with both the ResNet-50 and the ResNeXt-101-DCN backbones. Additionally, we empirically show that it is more beneficial to put additional computation into the TPN neck than into the backbone: our ResNet-50+TPN network outperforms a ResNet-101+FPN baseline by 1.7 AP while operating under a similar computation budget. This emphasizes the importance of performing computation at the feature pyramid level in modern-day object detection systems. Code is available at https://github.com/CedricPicron/TPN.
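To make the backbone/neck/head split and the two kinds of neck processing concrete, the following is a minimal toy sketch in plain Python. It is not the TPN architecture and is not taken from the linked repository. Feature maps are reduced to 1-D lists of floats, and every function name (`backbone`, `top_down`, `bottom_up`, `self_process`, `neck`, `head`), as well as the order of the steps inside `neck`, is an assumption made purely for illustration.

```python
from typing import List

FeatureMap = List[float]    # toy 1-D stand-in for a feature map
Pyramid = List[FeatureMap]  # ordered from fine (long) to coarse (short)


def downsample(fmap: FeatureMap) -> FeatureMap:
    """Halve the resolution by averaging neighbouring pairs."""
    return [(fmap[i] + fmap[i + 1]) / 2 for i in range(0, len(fmap) - 1, 2)]


def upsample(fmap: FeatureMap, size: int) -> FeatureMap:
    """Nearest-neighbour upsampling to the requested length."""
    return [fmap[min(i // 2, len(fmap) - 1)] for i in range(size)]


def backbone(signal: FeatureMap, num_levels: int = 3) -> Pyramid:
    """Toy backbone: build a feature pyramid by repeated downsampling."""
    pyramid = [list(signal)]
    for _ in range(num_levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def top_down(pyramid: Pyramid) -> Pyramid:
    """Communication step: each level receives the coarser level above it."""
    out = [list(level) for level in pyramid]
    for lvl in range(len(out) - 2, -1, -1):
        up = upsample(out[lvl + 1], len(out[lvl]))
        out[lvl] = [a + b for a, b in zip(out[lvl], up)]
    return out


def bottom_up(pyramid: Pyramid) -> Pyramid:
    """Communication step: each level receives the finer level below it."""
    out = [list(level) for level in pyramid]
    for lvl in range(1, len(out)):
        down = downsample(out[lvl - 1])
        out[lvl] = [a + b for a, b in zip(out[lvl], down)]
    return out


def self_process(pyramid: Pyramid) -> Pyramid:
    """Self-processing step: each level is updated independently.

    A ReLU is used here only as a placeholder for a learned per-level block.
    """
    return [[max(0.0, v) for v in level] for level in pyramid]


def neck(pyramid: Pyramid, num_layers: int = 2) -> Pyramid:
    """Toy neck: stack layers that mix communication and self-processing.

    The ordering below is an arbitrary illustrative choice, not the TPN design.
    """
    for _ in range(num_layers):
        pyramid = top_down(pyramid)
        pyramid = self_process(pyramid)
        pyramid = bottom_up(pyramid)
    return pyramid


def head(pyramid: Pyramid) -> List[float]:
    """Toy head: one score per pyramid level (the maximum activation)."""
    return [max(level) for level in pyramid]


if __name__ == "__main__":
    refined = neck(backbone([1.0] * 8), num_layers=1)
    print([len(level) for level in refined])  # resolutions are preserved: [8, 4, 2]
```

In this sketch a deeper neck simply means a larger `num_layers`, and the balance between communication-based processing and self-processing corresponds to how many steps of each kind a layer contains.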