Spatial attention has been widely used to improve the performance of convolutional neural networks by allowing them to focus on important information. However, it has certain limitations. In this paper, we propose a new perspective on the effectiveness of spatial attention: it can mitigate the problem of convolutional kernel parameter sharing. However, the information contained in the attention map generated by spatial attention is insufficient for large-sized convolutional kernels. Therefore, we introduce a new attention mechanism called Receptive-Field Attention (RFA). Previous attention mechanisms such as the Convolutional Block Attention Module (CBAM) and Coordinate Attention (CA) focus only on spatial features and thus cannot fully address the issue of convolutional kernel parameter sharing. In contrast, RFA not only attends to receptive-field spatial features but also provides effective attention weights for large-sized convolutional kernels. The Receptive-Field Attention convolution (RFAConv), derived from RFA, offers a new way to replace the standard convolution operation: it adds almost negligible computational cost and parameters while significantly improving network performance. A series of experiments on the ImageNet-1k, MS COCO, and VOC datasets demonstrates the superiority of our approach on classification, object detection, and semantic segmentation tasks. More importantly, we believe it is time for current spatial attention mechanisms to shift their focus from spatial features to receptive-field spatial features; doing so can further improve network performance and achieve even better results. The code and pre-trained models for the relevant tasks can be found at https://github.com/Liuchen1997/RFAConv.
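To make the mechanism concrete, below is a minimal PyTorch sketch of a receptive-field attention convolution in the spirit of RFAConv. It is an illustration assembled from the description above, not the authors' implementation (see the linked repository for that): the class name `RFAConvSketch`, the grouped 1x1 convolution used to produce per-position attention logits, and the average-pooling aggregation are our assumptions. The key idea it demonstrates is that each k x k receptive field receives its own softmax-normalized weights over the k*k kernel positions, so the effective kernel is no longer fully shared across spatial locations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RFAConvSketch(nn.Module):
    """Minimal sketch of a receptive-field attention convolution.

    Each k x k receptive field gets its own attention weights over the
    k*k kernel positions, so the effective weights are no longer fully
    shared across spatial locations. Assumes an odd kernel size k.
    """

    def __init__(self, in_ch, out_ch, k=3, stride=1):
        super().__init__()
        self.k, self.stride = k, stride
        # Produce one attention logit per kernel position and channel
        # (the grouped 1x1 conv keeps this cheap).
        self.attn = nn.Sequential(
            nn.AvgPool2d(kernel_size=k, stride=stride, padding=k // 2),
            nn.Conv2d(in_ch, in_ch * k * k, kernel_size=1, groups=in_ch),
        )
        # Extract the k*k receptive-field features at every output location.
        self.unfold = nn.Unfold(kernel_size=k, stride=stride, padding=k // 2)
        # Final k x k conv applied with stride k over the re-arranged features.
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=k, stride=k)

    def forward(self, x):
        b, c, h, w = x.shape
        k = self.k
        oh = (h + 2 * (k // 2) - k) // self.stride + 1
        ow = (w + 2 * (k // 2) - k) // self.stride + 1
        # Attention weights over the k*k positions of each receptive field.
        a = self.attn(x).view(b, c, k * k, oh, ow)
        a = F.softmax(a, dim=2)
        # Receptive-field features weighted by the attention.
        f = self.unfold(x).view(b, c, k * k, oh, ow) * a
        # Re-arrange so each receptive field becomes a k x k patch, then
        # convolve with stride k so patches do not overlap.
        f = f.view(b, c, k, k, oh, ow)
        f = f.permute(0, 1, 4, 2, 5, 3).reshape(b, c, oh * k, ow * k)
        return self.conv(f)


# Hypothetical usage: a drop-in replacement for a 3x3 convolution.
layer = RFAConvSketch(in_ch=64, out_ch=128, k=3)
y = layer(torch.randn(1, 64, 32, 32))  # -> (1, 128, 32, 32)
```

Note the design choice this sketch highlights: because the weighted receptive-field features are laid out as non-overlapping k x k patches and convolved with stride k, the added cost over a standard convolution is essentially the cheap attention branch, consistent with the near-negligible overhead claimed in the abstract.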