Attention mechanisms, which enable a neural network to accurately focus on the relevant elements of the input, have become an essential component for improving the performance of deep neural networks. Two attention mechanisms are widely used in computer vision studies, \textit{spatial attention} and \textit{channel attention}, which aim to capture pixel-level pairwise relationships and channel dependencies, respectively. Although fusing them together may achieve better performance than either one alone, it inevitably increases the computational overhead. In this paper, we propose an efficient Shuffle Attention (SA) module to address this issue, which adopts Shuffle Units to combine the two types of attention mechanisms effectively. Specifically, SA first groups the channel dimension into multiple sub-features and processes them in parallel. Then, for each sub-feature, SA utilizes a Shuffle Unit to depict feature dependencies in both the spatial and channel dimensions. After that, all sub-features are aggregated, and a "channel shuffle" operator is adopted to enable information communication between different sub-features. The proposed SA module is efficient yet effective; e.g., the parameters and computations of SA against the ResNet50 backbone are 300 vs. 25.56M and 2.76e-3 GFLOPs vs. 4.12 GFLOPs, respectively, and the performance boost is more than 1.34% in terms of Top-1 accuracy. Extensive experimental results on commonly used benchmarks, including ImageNet-1k for classification and MS COCO for object detection and instance segmentation, demonstrate that the proposed SA significantly outperforms current SOTA methods, achieving higher accuracy with lower model complexity. The code and models are available at https://github.com/wofmanaf/SA-Net.
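To make the group-split, dual-branch gating, and channel-shuffle pipeline described above concrete, the following is a minimal PyTorch sketch. The class name `ShuffleAttention`, the group count, and the specific gating choices (global average pooling for the channel branch, group normalization for the spatial branch) are illustrative assumptions based on the description above, not a verbatim copy of the released implementation.

```python
import torch
import torch.nn as nn


class ShuffleAttention(nn.Module):
    """Hypothetical sketch of a Shuffle Attention block: split channels into
    groups, apply lightweight channel and spatial gating to each half of a
    group, then recombine with a channel shuffle."""

    def __init__(self, channels=512, groups=64):
        super().__init__()
        self.groups = groups
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        # channel count of each branch after splitting a group in half
        c = channels // (2 * groups)
        # learnable affine parameters for the two gating branches
        self.cweight = nn.Parameter(torch.zeros(1, c, 1, 1))
        self.cbias = nn.Parameter(torch.ones(1, c, 1, 1))
        self.sweight = nn.Parameter(torch.zeros(1, c, 1, 1))
        self.sbias = nn.Parameter(torch.ones(1, c, 1, 1))
        self.gn = nn.GroupNorm(c, c)
        self.sigmoid = nn.Sigmoid()

    @staticmethod
    def channel_shuffle(x, groups):
        # interleave channels across groups so information can flow between them
        b, c, h, w = x.shape
        x = x.reshape(b, groups, c // groups, h, w)
        x = x.permute(0, 2, 1, 3, 4).contiguous()
        return x.reshape(b, c, h, w)

    def forward(self, x):
        b, c, h, w = x.shape
        # group the channel dimension into sub-features processed in parallel
        x = x.reshape(b * self.groups, c // self.groups, h, w)
        x_ch, x_sp = x.chunk(2, dim=1)  # channel branch / spatial branch

        # channel attention: global pooling followed by a sigmoid gate
        gate_c = self.sigmoid(self.cweight * self.avg_pool(x_ch) + self.cbias)
        x_ch = x_ch * gate_c

        # spatial attention: group-normalized features followed by a sigmoid gate
        gate_s = self.sigmoid(self.sweight * self.gn(x_sp) + self.sbias)
        x_sp = x_sp * gate_s

        # aggregate sub-features and shuffle channels across groups
        out = torch.cat([x_ch, x_sp], dim=1).reshape(b, c, h, w)
        return self.channel_shuffle(out, 2)


if __name__ == "__main__":
    sa = ShuffleAttention(channels=256, groups=8)
    y = sa(torch.randn(2, 256, 32, 32))
    print(y.shape)  # torch.Size([2, 256, 32, 32]), same shape as the input
```

Because each sub-feature sees only a small slice of the channels and the gating uses a handful of per-channel affine parameters, the added parameter and FLOP cost stays negligible relative to the backbone, which is the efficiency argument made in the abstract.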