Self-attention and channel attention, which model the semantic interdependencies in the spatial and channel dimensions respectively, have recently been widely used for semantic segmentation. However, computing self-attention and channel attention separately and then fusing them directly can cause conflicting feature representations. In this paper, we propose Channelized Axial Attention (CAA), which seamlessly integrates channel attention and axial attention with reduced computational complexity. After computing the axial attention maps, we channelize the intermediate results of the transposed dot-product so that the channel importance of each axial representation is optimized across the whole receptive field. We further develop grouped vectorization, which allows our model to run within very limited GPU memory at a speed comparable to full vectorization. Comparative experiments on multiple benchmark datasets, including Cityscapes, PASCAL Context and COCO-Stuff, demonstrate that CAA not only requires far fewer computational resources but also outperforms state-of-the-art ResNet-101-based segmentation models on all tested datasets.
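The following is a minimal PyTorch sketch of the channelization idea as described in the abstract: axial attention is computed along one axis, the intermediate (per-position) attended features from the transposed dot-product are kept rather than summed immediately, and a channel gate re-weights each of them before aggregation. All tensor shapes, the gating MLP, and the names used here (e.g. `ChannelizedAxialAttention`, `reduction`) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelizedAxialAttention(nn.Module):
    """Axial attention along the height axis, with a channel gate applied to
    the intermediate attended features before they are aggregated."""

    def __init__(self, channels, reduction=8):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, 1)
        self.key = nn.Conv2d(channels, channels, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        # Small MLP that re-weights the channels of each intermediate result.
        self.channel_gate = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).permute(0, 3, 2, 1).reshape(b * w, h, c)  # (b*w, h, c)
        k = self.key(x).permute(0, 3, 1, 2).reshape(b * w, c, h)    # (b*w, c, h)
        v = self.value(x).permute(0, 3, 2, 1).reshape(b * w, h, c)  # (b*w, h, c)

        # Axial attention map along the height axis.
        attn = F.softmax(torch.bmm(q, k) / c ** 0.5, dim=-1)        # (b*w, h, h)

        # Intermediate results of the attention-weighted values, kept per
        # attended position instead of being summed immediately.
        # This (b*w, h, h, c) tensor is what makes naive full vectorization
        # memory-hungry; grouped vectorization would process it in chunks.
        inter = attn.unsqueeze(-1) * v.unsqueeze(1)                 # (b*w, h, h, c)

        # "Channelize": gate the channels of every intermediate result
        # before aggregating over the attended axis.
        gate = self.channel_gate(inter)                             # (b*w, h, h, c)
        out = (inter * gate).sum(dim=2)                             # (b*w, h, c)

        return out.reshape(b, w, h, c).permute(0, 3, 2, 1) + x      # residual


if __name__ == "__main__":
    y = ChannelizedAxialAttention(64)(torch.randn(2, 64, 32, 32))
    print(y.shape)  # torch.Size([2, 64, 32, 32])
```

A full model would presumably apply the same operation along the width axis as well and bound the peak memory of the intermediate tensor by splitting it into groups, per the grouped vectorization mentioned in the abstract.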