With the development of the self-attention mechanism, the Transformer model has demonstrated outstanding performance in the computer vision domain. However, the massive computation required by the full attention mechanism imposes a heavy burden on memory consumption. Consequently, this memory limitation restricts opportunities to improve the Transformer model. To remedy this problem, we propose a novel memory-economical attention mechanism named Couplformer, which decouples the attention map into two sub-matrices and generates the alignment scores from spatial information. We evaluate the effectiveness of our model on a series of image classification tasks at different scales. The experiments show that on the ImageNet-1k classification task, Couplformer reduces memory consumption by 28% compared with the regular Transformer while meeting the accuracy requirements, and outperforms it by 0.92% in Top-1 accuracy at the same memory footprint. As a result, Couplformer can serve as an efficient backbone for visual tasks and offers researchers a novel perspective on the attention mechanism.
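To make the memory argument concrete, below is a minimal PyTorch sketch of one way to decouple an attention map over an H x W grid into two sub-matrices: a Kronecker-style factorization in which the score between pixels (i, j) and (k, l) is the product of a row score and a column score. The mean-pooling used to build the row/column descriptors, and the function name `coupled_attention`, are illustrative assumptions for this sketch, not the paper's exact construction.

```python
import torch

def coupled_attention(q, k, v, H, W):
    """Minimal sketch of attention with a decoupled (factorized) map.

    The score between grid positions (i, j) and (k, l) is modeled as
    A_row[i, k] * A_col[j, l], so only an H x H and a W x W map are
    materialized instead of the full (H*W) x (H*W) attention matrix.
    The mean-pooling below is an illustrative assumption, not the
    paper's exact construction.

    q, k, v: (B, H*W, D) tokens laid out row-major over an H x W grid.
    """
    B, N, D = q.shape
    assert N == H * W
    q, k, v = (t.view(B, H, W, D) for t in (q, k, v))
    scale = D ** -0.5

    # Row descriptors: pool over columns, then attend along the H axis.
    a_row = torch.softmax(
        torch.einsum('bhd,bgd->bhg', q.mean(dim=2), k.mean(dim=2)) * scale,
        dim=-1)                                   # (B, H, H)
    # Column descriptors: pool over rows, then attend along the W axis.
    a_col = torch.softmax(
        torch.einsum('bwd,bxd->bwx', q.mean(dim=1), k.mean(dim=1)) * scale,
        dim=-1)                                   # (B, W, W)

    # Apply the implicit Kronecker product (a_row kron a_col) to v
    # without ever forming the (H*W) x (H*W) matrix.
    out = torch.einsum('bhg,bwx,bgxd->bhwd', a_row, a_col, v)
    return out.reshape(B, N, D)

# Usage: self-attention over a 14 x 14 feature map with 64-dim tokens.
x = torch.randn(2, 14 * 14, 64)
y = coupled_attention(x, x, x, H=14, W=14)
print(y.shape)  # torch.Size([2, 196, 64])
```

Because the Kronecker product of two row-stochastic matrices is itself row-stochastic, applying softmax to each sub-map separately still yields a valid attention distribution over all H*W positions, while the stored maps shrink from O(H^2 W^2) to O(H^2 + W^2) entries.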