可缩放的视觉变形器,配有等级式集成器 (Scalable Visual Transformers with Hierarchical Pooling) - 专知论文

会员服务 ·

0

可约的 · 图片分类 · 变换 · 词元分析器 · CC ·

2021 年 3 月 19 日

Scalable Visual Transformers with Hierarchical Pooling

翻译：可缩放的视觉变形器,配有等级式集成器

Zizheng Pan,Bohan Zhuang,Jing Liu,Haoyu He,Jianfei Cai

from arxiv, 10 pages

The recently proposed Visual image Transformers (ViT) with pure attention have achieved promising performance on image recognition tasks, such as image classification. However, the routine of the current ViT model is to maintain a full-length patch sequence during inference, which is redundant and lacks hierarchical representation. To this end, we propose a Hierarchical Visual Transformer (HVT) which progressively pools visual tokens to shrink the sequence length and hence reduces the computational cost, analogous to the feature maps downsampling in Convolutional Neural Networks (CNNs). It brings a great benefit that we can increase the model capacity by scaling dimensions of depth/width/resolution/patch size without introducing extra computational complexity due to the reduced sequence length. Moreover, we empirically find that the average pooled visual tokens contain more discriminative information than the single class token. To demonstrate the improved scalability of our HVT, we conduct extensive experiments on the image classification task. With comparable FLOPs, our HVT outperforms the competitive baselines on ImageNet and CIFAR-100 datasets.

翻译：最近提出的视觉图像变形器(VYT)得到纯净关注,在图像分类等图像识别任务方面取得了有希望的成绩。然而,目前的ViT模型的例行工作是在推断期间保持一个完全长的宽度序列,这是多余的,缺乏等级代表。为此,我们建议使用一个等级式视觉变形器(HVT),逐步将视觉符号集合在一起,以缩短序列长度,从而降低计算成本,这类似于Convolutional Neal网络(CNNs)下映的地貌地图。它带来巨大的好处,即我们可以通过放大深度/宽度/分辨率/分解度/批量尺寸的尺寸来增加模型能力,而不会由于序列长度的缩短而引入额外的计算复杂性。此外,我们从经验上发现,平均集合的视觉符号含有比单级符号更具有歧视性的信息。为了证明我们的HVT的可缩放性,我们在图像分类任务上进行了广泛的实验。通过类似的 FLOPs,我们的HVT超越了图像网和CIFAR-100数据集的竞争性基线。

0

相关内容

可约的

Google-EfficientNet v2来了！更快，更小，更强！

Google-EfficientNet v2来了！更快，更小，更强！

专知会员服务

19+阅读 · 2021年4月4日

【2020 最新论文】节点邻近的图池化的层次表示学习 Graph Pooling with Node Proximity for Hierarchical Representation Learning

【2020 最新论文】节点邻近的图池化的层次表示学习 Graph Pooling with Node Proximity for Hierarchical Representation Learning

专知会员服务

43+阅读 · 2020年7月19日

【Google】大迁移：通用视觉表示学习，General Visual Representation Learning

【Google】大迁移：通用视觉表示学习，General Visual Representation Learning

专知会员服务

37+阅读 · 2020年5月9日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【ICLR2020论文】自我注意力与卷积层的关系，On the Relationship between Self-Attention and Convolutional Layers

【ICLR2020论文】自我注意力与卷积层的关系，On the Relationship between Self-Attention and Convolutional Layers

专知会员服务

37+阅读 · 2020年1月12日

【新书】高级应用深度学习，卷积神经网络和目标检测（Advanced Applied Deep Learning ，Convolutional Neural Networks and Object Detection），附294页pdf

【新书】高级应用深度学习，卷积神经网络和目标检测（Advanced Applied Deep Learning ，Convolutional Neural Networks and Object Detection），附294页pdf

专知会员服务

95+阅读 · 2020年1月9日

【AAAI2020-Oral】自监督时空学习的视频完形程序，Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning

【AAAI2020-Oral】自监督时空学习的视频完形程序，Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning

专知会员服务

30+阅读 · 2020年1月2日

【Google论文】ALBERT:自我监督学习语言表达的精简BERT

【Google论文】ALBERT:自我监督学习语言表达的精简BERT

专知会员服务

24+阅读 · 2019年11月4日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

Attention最新进展

Attention最新进展

极市平台

5+阅读 · 2020年5月30日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

条件GAN重大改进！cGANs with Projection Discriminator

条件GAN重大改进！cGANs with Projection Discriminator

CreateAMind

8+阅读 · 2018年2月7日

已删除

将门创投

3+阅读 · 2017年9月12日

最佳实践：深度学习用于自然语言处理（三）

最佳实践：深度学习用于自然语言处理（三）

待字闺中

3+阅读 · 2017年8月20日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Lipschitz Normalization for Self-Attention Layers with Application to Graph Neural Networks

Arxiv

0+阅读 · 2021年5月12日

Probabilistic Visual Place Recognition for Hierarchical Localization

Arxiv

0+阅读 · 2021年5月7日

Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks

Arxiv

0+阅读 · 2021年5月5日

Progressive Sparse Local Attention for Video object detection

Arxiv

4+阅读 · 2019年3月21日

Semi-supervised Node Classification via Hierarchical Graph Convolutional Networks

Arxiv

14+阅读 · 2019年3月5日

Neural Architecture Optimization

Neural Architecture Optimization

Arxiv

8+阅读 · 2018年9月5日

A dataset and architecture for visual reasoning with a working memory

Arxiv

3+阅读 · 2018年3月16日

Learnable pooling with Context Gating for video classification

Arxiv

3+阅读 · 2018年3月5日

Order-Free RNN with Visual Attention for Multi-Label Classification

Arxiv

16+阅读 · 2017年12月20日

Hierarchical Question-Image Co-Attention for Visual Question Answering

Arxiv

3+阅读 · 2017年1月19日

VIP会员

文章信息

相关主题

词元分析器

相关VIP内容

Google-EfficientNet v2来了！更快，更小，更强！

Google-EfficientNet v2来了！更快，更小，更强！

专知会员服务

19+阅读 · 2021年4月4日

【2020 最新论文】节点邻近的图池化的层次表示学习 Graph Pooling with Node Proximity for Hierarchical Representation Learning

【2020 最新论文】节点邻近的图池化的层次表示学习 Graph Pooling with Node Proximity for Hierarchical Representation Learning

专知会员服务

43+阅读 · 2020年7月19日

【Google】大迁移：通用视觉表示学习，General Visual Representation Learning

【Google】大迁移：通用视觉表示学习，General Visual Representation Learning

专知会员服务

37+阅读 · 2020年5月9日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【ICLR2020论文】自我注意力与卷积层的关系，On the Relationship between Self-Attention and Convolutional Layers

【ICLR2020论文】自我注意力与卷积层的关系，On the Relationship between Self-Attention and Convolutional Layers

专知会员服务

37+阅读 · 2020年1月12日

【新书】高级应用深度学习，卷积神经网络和目标检测（Advanced Applied Deep Learning ，Convolutional Neural Networks and Object Detection），附294页pdf

【新书】高级应用深度学习，卷积神经网络和目标检测（Advanced Applied Deep Learning ，Convolutional Neural Networks and Object Detection），附294页pdf

专知会员服务

95+阅读 · 2020年1月9日

【AAAI2020-Oral】自监督时空学习的视频完形程序，Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning

【AAAI2020-Oral】自监督时空学习的视频完形程序，Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning

专知会员服务

30+阅读 · 2020年1月2日

【Google论文】ALBERT:自我监督学习语言表达的精简BERT

【Google论文】ALBERT:自我监督学习语言表达的精简BERT

专知会员服务

24+阅读 · 2019年11月4日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【伯克利博士论文】通过真实世界实践赋能机器人自主性

军用无人机集群技术尚未成熟——但潜力可期

人工智能安全治理白皮书（2025）

AgentOps综述：分类、挑战与未来方向

相关资讯

Attention最新进展

Attention最新进展

极市平台

5+阅读 · 2020年5月30日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

条件GAN重大改进！cGANs with Projection Discriminator

条件GAN重大改进！cGANs with Projection Discriminator

CreateAMind

8+阅读 · 2018年2月7日

已删除

将门创投

3+阅读 · 2017年9月12日

最佳实践：深度学习用于自然语言处理（三）

最佳实践：深度学习用于自然语言处理（三）

待字闺中

3+阅读 · 2017年8月20日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

相关论文

Lipschitz Normalization for Self-Attention Layers with Application to Graph Neural Networks

Arxiv

0+阅读 · 2021年5月12日

Probabilistic Visual Place Recognition for Hierarchical Localization

Arxiv

0+阅读 · 2021年5月7日

Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks

Arxiv

0+阅读 · 2021年5月5日

Progressive Sparse Local Attention for Video object detection

Arxiv

4+阅读 · 2019年3月21日

Semi-supervised Node Classification via Hierarchical Graph Convolutional Networks

Arxiv

14+阅读 · 2019年3月5日

Neural Architecture Optimization

Neural Architecture Optimization

Arxiv

8+阅读 · 2018年9月5日

A dataset and architecture for visual reasoning with a working memory

Arxiv

3+阅读 · 2018年3月16日

Learnable pooling with Context Gating for video classification

Arxiv

3+阅读 · 2018年3月5日

Order-Free RNN with Visual Attention for Multi-Label Classification

Arxiv

16+阅读 · 2017年12月20日

Hierarchical Question-Image Co-Attention for Visual Question Answering

Arxiv

3+阅读 · 2017年1月19日

微信扫码咨询专知VIP会员