Recent advancements in deep neural networks have led to remarkable progress in dense image prediction. However, the issue of feature alignment remains largely neglected by most existing approaches for the sake of simplicity. Direct pixel addition between upsampled and local features yields feature maps with misaligned contexts, which in turn translate to misclassifications in prediction, especially on object boundaries. In this paper, we propose a feature alignment module that learns transformation offsets of pixels to contextually align upsampled higher-level features, and a feature selection module that emphasizes the lower-level features with rich spatial details. We then integrate these two modules in a top-down pyramidal architecture and present the Feature-aligned Pyramid Network (FaPN). Extensive experimental evaluations on four dense prediction tasks and four datasets demonstrate the efficacy of FaPN, yielding an overall improvement of 1.2 - 2.6 points in AP / mIoU over FPN when paired with Faster / Mask R-CNN. In particular, our FaPN achieves a state-of-the-art 56.7% mIoU on ADE20K when integrated within MaskFormer. The code is available from https://github.com/EMI-Group/FaPN.
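To make the feature alignment idea concrete, below is a minimal sketch assuming a deformable-convolution-based realization: offsets are predicted from the concatenation of the lateral (lower-level) feature and the upsampled top-down feature, and then used to resample the upsampled feature before fusion. Module and parameter names (e.g., `FeatureAlign`) are illustrative and not the authors' exact implementation.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class FeatureAlign(nn.Module):
    """Sketch: align an upsampled higher-level feature to the lateral feature
    by predicting per-pixel sampling offsets from their concatenation."""

    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # Predict 2D (x, y) offsets for every kernel sampling location.
        self.offset = nn.Conv2d(2 * channels, 2 * kernel_size * kernel_size,
                                kernel_size, padding=pad)
        # Deformable conv resamples the upsampled feature with those offsets.
        self.align = DeformConv2d(channels, channels, kernel_size, padding=pad)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, low_feat, up_feat):
        # low_feat: lateral (lower-level) feature; up_feat: upsampled top-down feature.
        offset = self.offset(torch.cat([low_feat, up_feat], dim=1))
        return low_feat + self.relu(self.align(up_feat, offset))


# Usage example: fuse a 2x-upsampled top-down feature with a lateral feature.
low = torch.randn(1, 256, 32, 32)
top_down = torch.randn(1, 256, 32, 32)  # already upsampled to the same size
fused = FeatureAlign(256)(low, top_down)
print(fused.shape)  # torch.Size([1, 256, 32, 32])
```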