We present an efficient approach for Masked Image Modeling (MIM) with hierarchical Vision Transformers (ViTs), e.g., Swin Transformer, allowing the hierarchical ViTs to discard masked patches and operate only on the visible ones. Our approach consists of two key components. First, for the window attention, we design a Group Window Attention scheme following the Divide-and-Conquer strategy. To mitigate the quadratic complexity of self-attention w.r.t. the number of patches, group attention encourages a uniform partition such that the visible patches within each local window of arbitrary size can be gathered into groups of equal size, and masked self-attention is then performed within each group. Second, we further improve the grouping strategy via a Dynamic Programming algorithm that minimizes the overall computation cost of the attention on the grouped patches. As a result, MIM can now work on hierarchical ViTs in a green and efficient way. For example, we can train hierarchical ViTs about 2.7$\times$ faster and reduce GPU memory usage by 70%, while still enjoying competitive performance on ImageNet classification and superior performance on the downstream COCO object detection benchmark. Code and pre-trained models have been made publicly available at https://github.com/LayneH/GreenMIM.
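To make the grouping idea concrete, below is a minimal PyTorch sketch of the masked self-attention performed within one group: visible patches from several local windows are packed into a single equal-size group, and an attention mask blocks interactions across windows. The class name `GroupedMaskedAttention`, its interface, and the single-head formulation are illustrative assumptions for exposition, not the actual GreenMIM implementation.

```python
import torch
import torch.nn as nn

class GroupedMaskedAttention(nn.Module):
    """Sketch: masked self-attention within one group of visible
    patches packed from several local windows (hypothetical API)."""

    def __init__(self, dim):
        super().__init__()
        self.scale = dim ** -0.5
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, tokens, window_ids):
        # tokens:     (G, C) visible patches gathered into one group
        # window_ids: (G,)   source-window index of each patch
        q, k, v = self.qkv(tokens).chunk(3, dim=-1)
        attn = (q @ k.T) * self.scale                      # (G, G) logits
        # Patches may only attend to patches from the same local window.
        same_window = window_ids[:, None] == window_ids[None, :]
        attn = attn.masked_fill(~same_window, float('-inf'))
        return self.proj(attn.softmax(dim=-1) @ v)

# Toy usage: 6 visible patches drawn from two windows, packed as one group.
x = torch.randn(6, 32)
win = torch.tensor([0, 0, 0, 1, 1, 1])
out = GroupedMaskedAttention(32)(x, win)   # -> (6, 32)
```

Because attention is computed per group of size $G$ rather than over all visible patches at once, the cost scales with the sum of $G^2$ over groups; choosing the grouping that minimizes this total is the role of the Dynamic Programming step described above.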