Humans can naturally and effectively find salient regions in complex scenes. Motivated by this observation, attention mechanisms were introduced into computer vision with the aim of imitating this aspect of the human visual system. Such an attention mechanism can be regarded as a dynamic weight adjustment process based on features of the input image. Attention mechanisms have achieved great success in many visual tasks, including image classification, object detection, semantic segmentation, video understanding, image generation, 3D vision, multi-modal tasks and self-supervised learning. In this survey, we provide a comprehensive review of various attention mechanisms in computer vision and categorize them by approach, such as channel attention, spatial attention, temporal attention and branch attention; the accompanying repository https://github.com/MenghaoGuo/Awesome-Vision-Attentions collects related work. We also suggest future directions for attention mechanism research.
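A small sketch can make the idea of attention as a dynamic weight adjustment process concrete. It assumes a feature map stored as nested Python lists (channels × height × width). The function `channel_attention` below is a simplified channel-attention step loosely modeled on squeeze-and-excitation (SE) blocks. It computes one weight per channel from the input's own pooled statistics and then rescales each channel by that weight. The mixing matrix `w` stands in for learned parameters. Its values, the single linear layer (real SE blocks use two layers with a reduction ratio), and all names here are illustrative assumptions rather than the survey's formulation.

```python
import math
from typing import List

# Hypothetical layout: C channels, each an H x W grid of floats.
FeatureMap = List[List[List[float]]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def channel_attention(x: FeatureMap, w: List[List[float]]) -> FeatureMap:
    """Simplified SE-style channel attention (illustrative sketch only).

    Weight generation: global-average-pool each channel, mix the pooled
    values with the hypothetical C x C matrix `w`, and squash with a sigmoid.
    Weight application: rescale every channel of `x` by its weight.
    """
    # Squeeze: one descriptor per channel (mean over all spatial positions).
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: per-channel weights in (0, 1), computed from the input itself.
    weights = [
        sigmoid(sum(w[i][j] * pooled[j] for j in range(len(pooled))))
        for i in range(len(pooled))
    ]
    # Reweight: the same input, scaled channel by channel.
    return [[[v * weights[c] for v in row] for row in ch] for c, ch in enumerate(x)]


# Usage: two 2x2 channels, one "active" and one empty, with identity mixing.
x = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
identity = [[1.0, 0.0], [0.0, 1.0]]
y = channel_attention(x, identity)
print(y[0][0][0])  # sigmoid(1.0) * 1.0, roughly 0.731
```

Because the weights are recomputed for every input, two different images receive different channel weightings even with the same `w`. Spatial, temporal, and branch attention broadly follow the same pattern but generate weights over spatial positions, frames, or network branches instead of channels.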