Recent non-local self-attention methods have proven effective at capturing long-range dependencies for semantic segmentation. These methods usually form a similarity map in $\mathbb{R}^{C \times C}$ (by compressing the spatial dimensions) or in $\mathbb{R}^{HW \times HW}$ (by compressing the channels) to describe feature relations along either the channel or the spatial dimension, where C is the number of channels and H and W are the spatial dimensions of the input feature map. However, such practices tend to condense feature dependencies along the other dimension, causing attention to go missing, which may lead to inferior results for small or thin categories or to inconsistent segmentation inside large objects. To address this problem, we propose a new approach, the Fully Attentional Network (FLANet), which encodes both spatial and channel attention in a single similarity map while maintaining high computational efficiency. Specifically, for each channel map, FLANet can harvest feature responses from all other channel maps, and from the associated spatial positions as well, through a novel fully attentional module. Our method achieves state-of-the-art performance on three challenging semantic segmentation datasets: 83.6%, 46.99%, and 88.5% on the Cityscapes test set, the ADE20K validation set, and the PASCAL VOC test set, respectively.
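To make the two map shapes concrete, the following is a minimal NumPy sketch of the conventional channel and spatial similarity maps described above. It is not the FLANet module. The function names, the softmax normalization, and the toy feature size are assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def channel_similarity(feat):
    # feat: (C, H, W). Flattening the spatial dimensions and taking
    # inner products between channels yields a (C, C) map.
    C, H, W = feat.shape
    flat = feat.reshape(C, H * W)
    return softmax(flat @ flat.T, axis=-1)

def spatial_similarity(feat):
    # feat: (C, H, W). Taking inner products over the channel axis
    # between positions yields an (HW, HW) map.
    C, H, W = feat.shape
    flat = feat.reshape(C, H * W)
    return softmax(flat.T @ flat, axis=-1)

# Hypothetical toy feature map: C=8, H=4, W=5.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 5))
chan_map = channel_similarity(feat)
spat_map = spatial_similarity(feat)
print(chan_map.shape, spat_map.shape)  # (8, 8) (20, 20)
```

In this sketch the channel map sums over every spatial position and the spatial map sums over every channel, so each map keeps relations along one dimension only. This is the limitation the abstract refers to as missing attention.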