Spatial attention is widely used to improve the performance of convolutional neural networks by letting them focus on important information, but it has limitations. In this paper, we offer a new perspective on why spatial attention is effective: it addresses the problem of convolutional kernel parameter sharing. However, the attention map generated by spatial attention does not carry enough information for large convolutional kernels. We therefore introduce a new attention mechanism, Receptive-Field Attention (RFA). Previous attention mechanisms such as the Convolutional Block Attention Module (CBAM) and Coordinate Attention (CA) focus only on spatial features and so cannot fully resolve the problem of kernel parameter sharing. In contrast, RFA focuses on the receptive-field spatial feature and also provides effective attention weights for large convolutional kernels. The Receptive-Field Attention convolutional operation (RFAConv), built on RFA, is a new approach that can replace the standard convolution operation. It adds an almost negligible amount of computation and parameters while significantly improving network performance. Experiments on the ImageNet-1k, MS COCO, and VOC datasets demonstrate the superiority of our approach on classification, object detection, and semantic segmentation. Most importantly, we believe it is time for spatial attention mechanisms to shift their focus from spatial features to receptive-field spatial features, which can further improve network performance. The code and pre-trained models for the relevant tasks are available at https://github.com/Liuchen1997/RFAConv.
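To make the parameter-sharing argument concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It contrasts a standard convolution, which applies the same kernel weights to every receptive field, with a receptive-field-attention variant in which each receptive field rescales the shared kernel by its own attention weights. The function names (`unfold`, `standard_conv`, `rf_attention_conv`) and the scoring rule (a softmax over the raw values inside each field) are assumptions chosen for illustration only; the released code should be consulted for the actual design.

```python
import math


def unfold(x, k):
    """Collect the zero-padded k*k receptive field around every pixel of a 2D list."""
    h, w = len(x), len(x[0])
    p = k // 2
    fields = []
    for i in range(h):
        row = []
        for j in range(w):
            patch = [
                x[i + di][j + dj] if 0 <= i + di < h and 0 <= j + dj < w else 0.0
                for di in range(-p, p + 1)
                for dj in range(-p, p + 1)
            ]
            row.append(patch)
        fields.append(row)
    return fields


def softmax(values):
    """Numerically stable softmax over a flat list."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def standard_conv(x, kernel, k):
    """Standard convolution: the same kernel weights are shared by every receptive field."""
    return [
        [sum(wt * f for wt, f in zip(kernel, patch)) for patch in row]
        for row in unfold(x, k)
    ]


def rf_attention_conv(x, kernel, k):
    """Illustrative receptive-field attention: each field gets its own k*k weights.

    Assumption: attention scores are a softmax over the raw values in the field.
    This scoring rule is hypothetical and only stands in for a learned one.
    """
    out = []
    for row in unfold(x, k):
        out_row = []
        for patch in row:
            attn = softmax(patch)  # one weight per position inside this field
            out_row.append(sum(wt * a * f for wt, a, f in zip(kernel, attn, patch)))
        out.append(out_row)
    return out


image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
mean_kernel = [1.0 / 9.0] * 9
shared = standard_conv(image, mean_kernel, 3)
attended = rf_attention_conv(image, mean_kernel, 3)
```

In `standard_conv` the effective weights are identical at every location, whereas in `rf_attention_conv` the product of kernel and attention differs from one receptive field to the next, which is the sense in which attention over receptive fields relaxes parameter sharing.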