Transformers have shown great potential in various computer vision tasks owing to their strong capability in modeling long-range dependencies using the self-attention mechanism. Nevertheless, vision transformers treat an image as a 1D sequence of visual tokens, lacking an intrinsic inductive bias (IB) in modeling local visual structures and dealing with scale variance. Instead, they require large-scale training data and longer training schedules to learn the IB implicitly. In this paper, we propose a novel Vision Transformer Advanced by Exploring intrinsic IB from convolutions, \ie, ViTAE. Technically, ViTAE has several spatial pyramid reduction modules that downsample and embed the input image into tokens with rich multi-scale context by using multiple convolutions with different dilation rates. In this way, it acquires an intrinsic scale-invariance IB and is able to learn robust feature representations for objects at various scales. Moreover, in each transformer layer, ViTAE has a convolution block in parallel to the multi-head self-attention module, whose features are fused and fed into the feed-forward network. Consequently, it has the intrinsic locality IB and is able to learn local features and global dependencies collaboratively. Experiments on ImageNet as well as downstream tasks demonstrate the superiority of ViTAE over the baseline transformer and concurrent works. Source code and pretrained models will be available on GitHub.
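The two architectural ideas above can be made concrete with a minimal sketch. The class names (ReductionCell, ParallelConvAttnLayer), the specific dilation rates, the depthwise-conv branch, and the fusion-by-addition scheme are illustrative assumptions rather than the released implementation; the sketch only shows how parallel dilated convolutions yield multi-scale tokens and how a convolution branch runs alongside multi-head self-attention before the feed-forward network.

```python
import torch
import torch.nn as nn

class ReductionCell(nn.Module):
    """Sketch of a spatial pyramid reduction module: parallel dilated
    convolutions downsample and embed the input into tokens with
    multi-scale context (scale-invariance IB)."""
    def __init__(self, in_ch=3, embed_dim=64, dilations=(1, 2, 3, 4), stride=4):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, embed_dim // len(dilations), kernel_size=3,
                      stride=stride, padding=d, dilation=d)
            for d in dilations
        ])

    def forward(self, x):                       # x: (B, C, H, W)
        feats = [b(x) for b in self.branches]   # each: (B, D/k, H/s, W/s)
        x = torch.cat(feats, dim=1)             # fuse multi-scale context
        return x.flatten(2).transpose(1, 2)     # tokens: (B, N, D)

class ParallelConvAttnLayer(nn.Module):
    """Sketch of a transformer layer with a convolution block in parallel
    to multi-head self-attention; the fused features feed the FFN."""
    def __init__(self, dim=64, num_heads=4, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.conv = nn.Sequential(               # local branch (locality IB)
            nn.Conv2d(dim, dim, 3, padding=1, groups=dim),
            nn.Conv2d(dim, dim, 1),
        )
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio), nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x, hw):                    # x: (B, N, D), hw: (H, W)
        B, N, D = x.shape
        h, w = hw
        y = self.norm1(x)
        attn_out, _ = self.attn(y, y, y)         # global dependencies
        conv_in = y.transpose(1, 2).reshape(B, D, h, w)
        conv_out = self.conv(conv_in).flatten(2).transpose(1, 2)  # local features
        x = x + attn_out + conv_out              # fuse the two parallel branches
        return x + self.mlp(self.norm2(x))       # feed-forward network

# Usage sketch: a 224x224 image is reduced to 56x56 tokens, then processed
# by one layer that learns local and global features collaboratively.
img = torch.randn(1, 3, 224, 224)
tokens = ReductionCell()(img)                    # (1, 3136, 64)
out = ParallelConvAttnLayer()(tokens, (56, 56))  # (1, 3136, 64)
```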