As an essential ingredient of modern deep learning, the attention mechanism, especially self-attention, plays a vital role in discovering global correlations. However, is hand-crafted attention irreplaceable when modeling the global context? Our intriguing finding is that self-attention is not better than the matrix decomposition (MD) models developed 20 years ago regarding performance and computational cost for encoding long-distance dependencies. We model the global context issue as a low-rank recovery problem and show that its optimization algorithms can help design global information blocks. This paper then proposes a series of Hamburgers, in which we employ the optimization algorithms for solving MDs to factorize the input representations into sub-matrices and reconstruct a low-rank embedding. Hamburgers with different MDs can perform favorably against the popular global context module, self-attention, when the gradients back-propagated through the MDs are carefully handled. Comprehensive experiments are conducted on vision tasks where learning the global context is crucial, including semantic segmentation and image generation, demonstrating significant improvements over self-attention and its variants.
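To make the idea concrete, below is a minimal PyTorch sketch of a Hamburger-style global context block built around non-negative matrix factorization solved with multiplicative updates. The rank, iteration count, the 1x1 "bread" convolutions, and running the solver iterations without gradient tracking except for one final step are illustrative assumptions for this sketch, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NMFHamburger(nn.Module):
    """Sketch of a Hamburger-style block: lower bread -> MD core (NMF) -> upper bread.

    Hypothetical configuration: rank `r`, number of solver `steps`, and the
    gradient treatment are assumptions chosen for illustration.
    """

    def __init__(self, channels: int, r: int = 64, steps: int = 6):
        super().__init__()
        self.r = r
        self.steps = steps
        # "Lower bread" and "upper bread": linear transforms around the MD core.
        self.lower = nn.Conv2d(channels, channels, kernel_size=1)
        self.upper = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        z = F.relu(self.lower(x))            # keep features non-negative for NMF
        X = z.view(b, c, h * w)              # flatten spatial dims: (B, C, N)
        # Initialize dictionary D (B, C, r) and codes Ct (B, r, N).
        D = X.new_empty(b, c, self.r).uniform_(0, 1)
        Ct = X.new_empty(b, self.r, h * w).uniform_(0, 1)
        eps = 1e-6
        # Run most multiplicative-update iterations without tracking gradients,
        # then take one final update inside the autograd graph; this is one way
        # to cope with gradients back-propagated through the MD solver.
        with torch.no_grad():
            for _ in range(self.steps - 1):
                Ct = Ct * (D.transpose(1, 2) @ X) / (D.transpose(1, 2) @ D @ Ct + eps)
                D = D * (X @ Ct.transpose(1, 2)) / (D @ (Ct @ Ct.transpose(1, 2)) + eps)
        Ct = Ct * (D.transpose(1, 2) @ X) / (D.transpose(1, 2) @ D @ Ct + eps)
        D = D * (X @ Ct.transpose(1, 2)) / (D @ (Ct @ Ct.transpose(1, 2)) + eps)
        low_rank = D @ Ct                    # low-rank reconstruction of X
        out = self.upper(low_rank.view(b, c, h, w))
        return x + out                       # residual connection around the block


# Usage: drop the block into, e.g., a segmentation head.
feats = torch.randn(2, 512, 32, 32)
ham = NMFHamburger(512, r=64, steps=6)
print(ham(feats).shape)  # torch.Size([2, 512, 32, 32])
```

The sketch replaces self-attention's quadratic pairwise interactions with a factorization of the flattened feature map, so the global context is recovered as a low-rank reconstruction rather than an attention-weighted sum.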