Aggregating information from features across different layers is an essential operation for dense prediction models. Despite its limited expressiveness, feature concatenation dominates the choice of aggregation operations. In this paper, we introduce Attentive Feature Aggregation (AFA) to fuse different network layers with more expressive non-linear operations. AFA exploits both spatial and channel attention to compute a weighted average of the layer activations. Inspired by neural volume rendering, we extend AFA with Scale-Space Rendering (SSR) to perform late fusion of multi-scale predictions. AFA is applicable to a wide range of existing network designs. Our experiments show consistent and significant improvements on challenging semantic segmentation benchmarks, including Cityscapes, BDD100K, and Mapillary Vistas, at negligible computational and parameter overhead. In particular, AFA improves the performance of the Deep Layer Aggregation (DLA) model by nearly 6% mIoU on Cityscapes. Our experimental analyses show that AFA learns to progressively refine segmentation maps and to improve boundary details, leading to new state-of-the-art results on the BSDS500 and NYUDv2 boundary detection benchmarks. Code and video resources are available at http://vis.xyz/pub/dla-afa.
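To make the attention-weighted fusion described above concrete, the following is a minimal PyTorch sketch of fusing two same-resolution feature maps with combined spatial and channel attention. It is only an illustration of the general idea stated in the abstract; the module and layer names (AttentiveFusion, channel_gate, spatial_gate) and the exact architecture are hypothetical and do not reproduce the authors' AFA implementation.

```python
import torch
import torch.nn as nn


class AttentiveFusion(nn.Module):
    """Sketch: attention-weighted average of two feature maps (not the official AFA module)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: global average pooling followed by a bottleneck MLP.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: a single-channel map predicted from the concatenated inputs.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2 * channels, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # Combine channel and spatial attention into one fusion weight in [0, 1].
        attn = self.channel_gate(low + high) * self.spatial_gate(torch.cat([low, high], dim=1))
        # Weighted average of the two layer activations, as opposed to plain concatenation.
        return attn * low + (1.0 - attn) * high


if __name__ == "__main__":
    fuse = AttentiveFusion(channels=64)
    a = torch.randn(1, 64, 32, 32)
    b = torch.randn(1, 64, 32, 32)
    print(fuse(a, b).shape)  # torch.Size([1, 64, 32, 32])
```

In this sketch the learned attention map replaces the fixed, equal contribution that concatenation followed by a linear projection would give, which is the expressiveness gap the abstract refers to.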