Vision Transformer (ViT) attains state-of-the-art performance in visual recognition, and its variant, Local Vision Transformer, improves on it further. The major component of Local Vision Transformer, local attention, performs attention separately over small local windows. We rephrase local attention as a channel-wise locally-connected layer and analyze it in terms of two network regularization schemes, sparse connectivity and weight sharing, as well as weight computation. Sparse connectivity: there is no connection across channels, and each position is connected only to the positions within a small local window. Weight sharing: the connection weights for one position are shared across channels or within each group of channels. Dynamic weight: the connection weights are dynamically predicted for each image instance. We point out that local attention resembles depth-wise convolution and its dynamic version in sparse connectivity. The main difference lies in weight sharing: depth-wise convolution shares connection weights (kernel weights) across spatial positions. We empirically observe that models based on depth-wise convolution and its dynamic variant, which have lower computational complexity, perform on par with or sometimes slightly better than Swin Transformer, an instance of Local Vision Transformer, on ImageNet classification, COCO object detection, and ADE semantic segmentation. These observations suggest that Local Vision Transformer takes advantage of two regularization forms and dynamic weight to increase the network capacity. Code is available at https://github.com/Atten4Vis/DemystifyLocalViT.
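The contrast in weight sharing can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a 1D sequence instead of a 2D feature map, a sliding window clipped at the borders, a single attention head, and identity query/key/value projections. The function names (`local_window`, `depthwise_conv1d`, `local_attention1d`) are hypothetical and chosen only for illustration.

```python
import math


def local_window(i, n, radius):
    """Indices within `radius` of position i, clipped to [0, n)."""
    return range(max(0, i - radius), min(n, i + radius + 1))


def depthwise_conv1d(x, kernels, radius):
    """Depth-wise convolution over a 1D sequence.

    x: list of N positions, each a list of C channel values.
    kernels: C lists of length 2 * radius + 1. Each channel has its own
    static kernel, and that kernel is shared across all positions.
    """
    n, c = len(x), len(x[0])
    out = [[0.0] * c for _ in range(n)]
    for i in range(n):
        for j in local_window(i, n, radius):
            for ch in range(c):
                # No cross-channel connection: channel ch reads only channel ch.
                out[i][ch] += kernels[ch][j - i + radius] * x[j][ch]
    return out


def local_attention1d(x, radius):
    """Single-head local attention with identity projections (assumption).

    The weights for position i are computed from the input itself (dynamic),
    differ from position to position, and are shared across all channels.
    """
    n, c = len(x), len(x[0])
    out = [[0.0] * c for _ in range(n)]
    for i in range(n):
        window = list(local_window(i, n, radius))
        # Scaled dot-product scores between position i and its window.
        scores = [
            sum(x[i][ch] * x[j][ch] for ch in range(c)) / math.sqrt(c)
            for j in window
        ]
        # Numerically stable softmax over the window.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        for w, j in zip(weights, window):
            for ch in range(c):
                # The same weight w is applied to every channel.
                out[i][ch] += w * x[j][ch]
    return out
```

Both functions connect each output position only to a small window and never mix channels, which is the shared sparse connectivity. They differ in where the weights come from and how they are shared: `kernels` is fixed and indexed by channel and offset, whereas `weights` is recomputed from the input at every position and reused for every channel.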