Convolution and self-attention are two powerful techniques for representation learning, and they are usually considered two peer approaches that are distinct from each other. In this paper, we show that there exists a strong underlying relation between them, in the sense that the bulk of the computation in these two paradigms is in fact performed by the same operation. Specifically, we first show that a traditional convolution with kernel size k x k can be decomposed into k^2 individual 1x1 convolutions, followed by shift and summation operations. Then, we interpret the projections of queries, keys, and values in the self-attention module as multiple 1x1 convolutions, followed by the computation of attention weights and aggregation of the values. Therefore, the first stage of both modules comprises the same type of operation. More importantly, the first stage accounts for the dominant computational complexity (quadratic in the channel size) compared to the second stage. This observation naturally leads to an elegant integration of these two seemingly distinct paradigms, i.e., a mixed model that enjoys the benefits of both self-Attention and Convolution (ACmix), while incurring minimal computational overhead compared to its pure convolution or self-attention counterparts. Extensive experiments show that our model achieves consistently improved results over competitive baselines on image recognition and downstream tasks. Code and pre-trained models will be released at https://github.com/LeapLabTHU/ACmix and https://gitee.com/mindspore/models.
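To make the decomposition concrete, below is a minimal PyTorch sketch (not the authors' released code; the function name conv_as_1x1_shift_sum is ours) that numerically verifies the claim: a k x k convolution with stride 1 and "same" zero padding equals k^2 individual 1x1 convolutions, each applied to a shifted copy of the input, summed together. The q/k/v projections in self-attention are likewise 1x1 convolutions, so both paradigms share this first stage.

```python
import torch
import torch.nn.functional as F


def conv_as_1x1_shift_sum(x, weight):
    """Compute a k x k convolution (stride 1, same zero padding, no bias)
    as k^2 1x1 convolutions followed by shift and summation."""
    c_out, c_in, k, _ = weight.shape
    pad = k // 2
    H, W = x.shape[2], x.shape[3]
    # Zero-pad once so every shifted window stays inside the tensor.
    x_pad = F.pad(x, (pad, pad, pad, pad))
    out = x.new_zeros(x.size(0), c_out, H, W)
    for p in range(k):
        for q in range(k):
            # Shift: the H x W window offset by (p - pad, q - pad).
            shifted = x_pad[:, :, p:p + H, q:q + W]
            # 1x1 convolution with the (p, q)-th slice of the full kernel.
            w_pq = weight[:, :, p, q].unsqueeze(-1).unsqueeze(-1)
            out = out + F.conv2d(shifted, w_pq)
    return out


# Sanity check against the standard dense convolution.
x = torch.randn(2, 8, 16, 16)
w = torch.randn(4, 8, 3, 3)
ref = F.conv2d(x, w, padding=1)
dec = conv_as_1x1_shift_sum(x, w)
print(torch.allclose(ref, dec, atol=1e-5))  # True
```

Because the 1x1 convolutions carry all of the channel-mixing work (O(C_in * C_out) per position), while the shift-and-sum step is lightweight, this sketch also illustrates why the shared first stage dominates the cost of both convolution and self-attention.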