GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation

We present the Group Propagation Vision Transformer (GPViT): a novel non-hierarchical (i.e., non-pyramidal) transformer model designed for general visual recognition with high-resolution features. High-resolution features (or tokens) are a natural fit for tasks that involve perceiving fine-grained details, such as detection and segmentation, but exchanging global information between these features is expensive in memory and computation because the cost of self-attention grows quadratically with the number of tokens. As a highly efficient alternative, we propose the Group Propagation Block (GP Block) to exchange global information. In each GP Block, features are first grouped by a fixed number of learnable group tokens; we then perform Group Propagation, in which global information is exchanged between the grouped features; finally, the global information in the updated grouped features is returned to the image features through a transformer decoder. We evaluate GPViT on a variety of visual recognition tasks, including image classification, semantic segmentation, object detection, and instance segmentation. Our method achieves significant performance gains over previous works across all tasks, especially on tasks that require high-resolution outputs. For example, our GPViT-L3 outperforms Swin Transformer-B by 2.0 mIoU on ADE20K semantic segmentation with only half as many parameters. Code and pre-trained models are available at https://github.com/ChenhongyiYang/GPViT .
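The three steps of a GP Block (group, propagate, return) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: it assumes single-head attention without learned query/key/value projections, and it stands in a plain linear mixing across groups (the hypothetical `mix` matrix) for the propagation step, whose exact operator the abstract does not specify. The names `cross_attention` and `gp_block` are illustrative only.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(queries, keys_values):
    """Single-head attention with no learned projections (a simplification)."""
    d = len(queries[0])
    keys_t = [list(col) for col in zip(*keys_values)]
    scores = [[v / math.sqrt(d) for v in row] for row in matmul(queries, keys_t)]
    return matmul(softmax_rows(scores), keys_values)


def gp_block(image_feats, group_tokens, mix):
    """image_feats: N x C, group_tokens: M x C, mix: M x M (assumed operator)."""
    # 1. Grouping: M group tokens attend to the N image features (N*M scores).
    grouped = cross_attention(group_tokens, image_feats)
    # 2. Propagation: exchange information across the M groups (assumed linear mix),
    #    added to the grouped features as a residual.
    mixed = matmul(mix, grouped)
    propagated = [[g + m for g, m in zip(gr, mr)] for gr, mr in zip(grouped, mixed)]
    # 3. Return: image features query the updated groups (again N*M scores),
    #    and the result is added back as a residual.
    returned = cross_attention(image_feats, propagated)
    return [[x + r for x, r in zip(xr, rr)] for xr, rr in zip(image_feats, returned)]


random.seed(0)
N, M, C = 64, 4, 8  # many image tokens, few group tokens
image_feats = [[random.gauss(0, 1) for _ in range(C)] for _ in range(N)]
group_tokens = [[random.gauss(0, 1) for _ in range(C)] for _ in range(M)]
mix = [[random.gauss(0, 0.1) for _ in range(M)] for _ in range(M)]
out = gp_block(image_feats, group_tokens, mix)
print(len(out), len(out[0]))
```

In this sketch both attention steps build an N x M score matrix rather than the N x N matrix of full self-attention, which is where the saving comes from when the number of group tokens M is fixed and much smaller than N.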

