Dynamic Group Transformer: A General Vision Transformer Backbone with Dynamic Group Attention

Recently, Transformers have shown promising performance in various vision tasks. To reduce the quadratic computational complexity caused by each query attending to all keys/values, various methods constrain attention to local regions, where each query attends only to keys/values within a hand-crafted window. However, these hand-crafted window partition mechanisms are data-agnostic and ignore the input content, so a query is likely to attend to irrelevant keys/values. To address this issue, we propose Dynamic Group Attention (DG-Attention), which dynamically divides all queries into multiple groups and selects the most relevant keys/values for each group. DG-Attention can flexibly model more relevant dependencies without the spatial constraints imposed by hand-crafted window-based attention. Building on DG-Attention, we develop a general vision transformer backbone named Dynamic Group Transformer (DGT). Extensive experiments show that our models outperform state-of-the-art methods on multiple common vision tasks, including image classification, semantic segmentation, object detection, and instance segmentation.
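The abstract describes DG-Attention only at a high level: group the queries, pick the most relevant keys/values per group, then attend. The sketch below is one possible toy reading of that idea in plain Python, not the authors' implementation. It assumes a single head, and it assumes that groups are formed by assigning each query to its nearest entry in a hypothetical `prototypes` list and that keys are ranked per group by dot product with that prototype. The names `dynamic_group_attention`, `prototypes`, and `top_k` are illustrative and do not come from the source.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_group_attention(queries, keys, values, prototypes, top_k):
    """Toy single-head sketch: group queries, keep top_k keys per group, attend."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    value_dim = len(values[0])

    # 1) Assumption: each query joins the group of its most similar prototype.
    groups = {}
    for qi, q in enumerate(queries):
        g = max(range(len(prototypes)), key=lambda j: dot(q, prototypes[j]))
        groups.setdefault(g, []).append(qi)

    outputs = [None] * len(queries)
    selected = {}
    for g, members in groups.items():
        # 2) Assumption: keys are ranked by similarity to the group's prototype,
        #    and only the top_k highest-scoring keys are kept for this group.
        ranked = sorted(range(len(keys)),
                        key=lambda j: dot(prototypes[g], keys[j]),
                        reverse=True)
        chosen = ranked[:top_k]
        selected[g] = chosen

        # 3) Scaled dot-product attention restricted to the chosen keys/values.
        for qi in members:
            weights = softmax([dot(queries[qi], keys[j]) * scale for j in chosen])
            outputs[qi] = [sum(w * values[j][d] for w, j in zip(weights, chosen))
                           for d in range(value_dim)]
    return outputs, groups, selected


# Usage on random data: 8 tokens of width 4, 2 hypothetical prototypes.
random.seed(0)
n, dim = 8, 4
tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
prototypes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(2)]
out, groups, selected = dynamic_group_attention(
    tokens, tokens, tokens, prototypes, top_k=3)
```

Under these assumptions, each query scores only `top_k` keys instead of all of them, and the selected keys can come from anywhere in the image rather than from a fixed spatial window, which is the contrast with window-based attention that the abstract draws.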

