Attention mechanisms, which enable a neural network to accurately focus on the relevant elements of the input, have become an essential component for improving the performance of deep neural networks. Two attention mechanisms are widely used in computer vision studies, \textit{spatial attention} and \textit{channel attention}, which aim to capture pixel-level pairwise relationships and channel dependencies, respectively. Although fusing them together may achieve better performance than either one alone, it inevitably increases the computational overhead. In this paper, we propose an efficient Shuffle Attention (SA) module to address this issue, which adopts Shuffle Units to combine the two types of attention mechanisms effectively. Specifically, SA first groups the channel dimension into multiple sub-features and processes them in parallel. Then, for each sub-feature, SA utilizes a Shuffle Unit to depict feature dependencies in both the spatial and channel dimensions. After that, all sub-features are aggregated, and a "channel shuffle" operator is adopted to enable information communication between different sub-features. The proposed SA module is efficient yet effective; e.g., the parameters and computations of SA against the ResNet50 backbone are 300 vs. 25.56M and 2.76e-3 GFLOPs vs. 4.12 GFLOPs, respectively, and the performance boost is more than 1.34% in terms of Top-1 accuracy. Extensive experimental results on commonly used benchmarks, including ImageNet-1k for classification and MS COCO for object detection and instance segmentation, demonstrate that the proposed SA significantly outperforms current SOTA methods, achieving higher accuracy with lower model complexity. The code and models are available at https://github.com/wofmanaf/SA-Net.
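To make the group-split, dual-branch gating, and channel-shuffle pipeline described above concrete, the following is a minimal PyTorch sketch. The class name `ShuffleAttention`, the group count, and the specific gating choices (global average pooling for the channel branch, group normalization for the spatial branch) are illustrative assumptions based on the description above, not a verbatim copy of the released implementation.

```python
import torch
import torch.nn as nn


class ShuffleAttention(nn.Module):
    """Hypothetical sketch of a Shuffle Attention block: split channels into
    groups, apply lightweight channel and spatial gating to each half of a
    group, then recombine with a channel shuffle."""

    def __init__(self, channels=512, groups=64):
        super().__init__()
        self.groups = groups
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        # channel count of each branch after splitting a group in half
        c = channels // (2 * groups)
        # learnable affine parameters for the two gating branches
        self.cweight = nn.Parameter(torch.zeros(1, c, 1, 1))
        self.cbias = nn.Parameter(torch.ones(1, c, 1, 1))
        self.sweight = nn.Parameter(torch.zeros(1, c, 1, 1))
        self.sbias = nn.Parameter(torch.ones(1, c, 1, 1))
        self.gn = nn.GroupNorm(c, c)
        self.sigmoid = nn.Sigmoid()

    @staticmethod
    def channel_shuffle(x, groups):
        # interleave channels across groups so information can flow between them
        b, c, h, w = x.shape
        x = x.reshape(b, groups, c // groups, h, w)
        x = x.permute(0, 2, 1, 3, 4).contiguous()
        return x.reshape(b, c, h, w)

    def forward(self, x):
        b, c, h, w = x.shape
        # group the channel dimension into sub-features processed in parallel
        x = x.reshape(b * self.groups, c // self.groups, h, w)
        x_ch, x_sp = x.chunk(2, dim=1)  # channel branch / spatial branch

        # channel attention: global pooling followed by a sigmoid gate
        gate_c = self.sigmoid(self.cweight * self.avg_pool(x_ch) + self.cbias)
        x_ch = x_ch * gate_c

        # spatial attention: group-normalized features followed by a sigmoid gate
        gate_s = self.sigmoid(self.sweight * self.gn(x_sp) + self.sbias)
        x_sp = x_sp * gate_s

        # aggregate sub-features and shuffle channels across groups
        out = torch.cat([x_ch, x_sp], dim=1).reshape(b, c, h, w)
        return self.channel_shuffle(out, 2)


if __name__ == "__main__":
    sa = ShuffleAttention(channels=256, groups=8)
    y = sa(torch.randn(2, 256, 32, 32))
    print(y.shape)  # torch.Size([2, 256, 32, 32]), same shape as the input
```

Because each sub-feature sees only a small slice of the channels and the gating uses a handful of per-channel affine parameters, the added parameter and FLOP cost stays negligible relative to the backbone, which is the efficiency argument made in the abstract.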