Attention mechanisms, especially self-attention, play an increasingly important role in deep feature representation for visual tasks. Self-attention updates the feature at each position by computing a weighted sum of features over all positions, using pairwise affinities to capture long-range dependencies within a single sample. However, self-attention has quadratic complexity and ignores potential correlations between different samples. This paper proposes a novel attention mechanism, which we call external attention, based on two external, small, learnable, and shared memories; it can be implemented easily using two cascaded linear layers and two normalization layers, and it conveniently replaces self-attention in existing popular architectures. External attention has linear complexity and implicitly considers the correlations between all samples. Extensive experiments on image classification, semantic segmentation, image generation, point cloud classification, and point cloud segmentation tasks show that our method provides comparable or superior performance to the self-attention mechanism and some of its variants, at much lower computational and memory cost.
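The sketch below illustrates the idea in PyTorch: the two external memories are realized as two bias-free linear layers, followed by a softmax over positions and an L1 normalization over memory units. The class name, the memory size S=64, the epsilon, and the tensor shapes are illustrative assumptions for this sketch, not the paper's reference implementation.

```python
import torch
import torch.nn as nn


class ExternalAttention(nn.Module):
    """Minimal sketch of external attention: two small, learnable, shared
    memories (M_k, M_v) implemented as cascaded linear layers plus double
    normalization. d_model and S are illustrative hyperparameters."""

    def __init__(self, d_model: int, S: int = 64):
        super().__init__()
        self.mk = nn.Linear(d_model, S, bias=False)  # external key memory M_k
        self.mv = nn.Linear(S, d_model, bias=False)  # external value memory M_v

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, N, d_model), where N is the number of positions (pixels/points)
        attn = self.mk(x)                                      # (batch, N, S): affinities to memory units
        attn = torch.softmax(attn, dim=1)                      # normalize over positions
        attn = attn / (attn.sum(dim=2, keepdim=True) + 1e-9)   # L1-normalize over memory units
        return self.mv(attn)                                   # (batch, N, d_model)


# Usage example (shapes are arbitrary):
ea = ExternalAttention(d_model=256, S=64)
x = torch.randn(2, 1024, 256)
y = ea(x)  # same shape as x
```

Because the number of memory units S is fixed and independent of the input length N, the attention map costs O(N·S), i.e., linear in N, and the learned memories are shared across all samples rather than computed per sample as in self-attention.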