Recent studies on mobile network design have demonstrated the remarkable effectiveness of channel attention (e.g., Squeeze-and-Excitation attention) for lifting model performance, but they generally neglect positional information, which is important for generating spatially selective attention maps. In this paper, we propose a novel attention mechanism for mobile networks that embeds positional information into channel attention, which we call "coordinate attention". Unlike channel attention, which transforms a feature tensor into a single feature vector via 2D global pooling, coordinate attention factorizes channel attention into two 1D feature encoding processes that aggregate features along the two spatial directions, respectively. In this way, long-range dependencies can be captured along one spatial direction while precise positional information is preserved along the other. The resulting feature maps are then encoded separately into a pair of direction-aware and position-sensitive attention maps that can be complementarily applied to the input feature map to augment the representations of the objects of interest. Our coordinate attention is simple and can be flexibly plugged into classic mobile networks, such as MobileNetV2, MobileNeXt, and EfficientNet, with nearly no computational overhead. Extensive experiments demonstrate that our coordinate attention not only benefits ImageNet classification but, more interestingly, performs better on downstream tasks such as object detection and semantic segmentation. Code is available at https://github.com/Andrew-Qibin/CoordAttention.
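To make the factorization concrete, the following is a minimal, dependency-free sketch of the idea described above, not the authors' implementation (see the linked repository for that). It pools a `[C][H][W]` feature map along each spatial direction separately, turns the two pooled results into per-row and per-column gates, and applies both gates to the input. The function names `sigmoid` and `coordinate_attention_sketch` are hypothetical, and the learned transformations that encode the pooled features are deliberately omitted: a plain sigmoid stands in for them as an assumption.

```python
import math


def sigmoid(v):
    """Logistic function, used here as a stand-in for the learned encoding."""
    return 1.0 / (1.0 + math.exp(-v))


def coordinate_attention_sketch(x):
    """Gate a feature map x of shape [C][H][W] with two directional attention maps.

    Assumption: no learned layers are modeled; each pooled value is passed
    straight through a sigmoid. Only the two-direction pooling and the
    complementary application of the two maps follow the description above.
    """
    out = []
    for channel in x:
        height = len(channel)
        width = len(channel[0])
        # 1D pooling along the width: one value per row, so row position is kept.
        pooled_h = [sum(row) / width for row in channel]
        # 1D pooling along the height: one value per column, so column position is kept.
        pooled_w = [
            sum(channel[i][j] for i in range(height)) / height
            for j in range(width)
        ]
        attn_h = [sigmoid(v) for v in pooled_h]  # direction-aware map over rows
        attn_w = [sigmoid(v) for v in pooled_w]  # direction-aware map over columns
        # Apply both maps: each position is scaled by its row gate and its column gate.
        out.append(
            [
                [channel[i][j] * attn_h[i] * attn_w[j] for j in range(width)]
                for i in range(height)
            ]
        )
    return out


if __name__ == "__main__":
    feature_map = [[[1.0, 2.0], [3.0, 4.0]]]  # C=1, H=2, W=2
    print(coordinate_attention_sketch(feature_map))
```

By contrast, 2D global pooling would reduce each channel to a single scalar, so every position in that channel would receive the same weight. In this sketch the weight at position (i, j) depends on both the row index and the column index, which is what makes the resulting attention position-sensitive.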