Scaled ReLU Matters for Training Vision Transformers - Zhuanzhi Papers



January 12, 2022

Scaled ReLU Matters for Training Vision Transformers


Pichao Wang, Xue Wang, Hao Luo, Jingkai Zhou, Zhipeng Zhou, Fan Wang, Hao Li, Rong Jin

From arXiv. Accepted by AAAI 2022.

Vision transformers (ViTs) have emerged as an alternative design paradigm to convolutional neural networks (CNNs). However, ViTs are much harder to train than CNNs because they are sensitive to training hyperparameters such as the learning rate, the optimizer, and the number of warmup epochs. The causes of this difficulty were analysed empirically by Xiao et al. (2021), who conjectured that the problem lies in the patchify stem of ViT models and proposed that early convolutions help transformers see better. In this paper, we investigate the problem further and extend that conclusion: early convolutions alone do not ensure stable training; what matters is the scaled ReLU operation in the convolutional stem (conv-stem). We verify, both theoretically and empirically, that the scaled ReLU in the conv-stem not only stabilizes training but also increases the diversity of patch tokens, boosting peak performance by a large margin while adding only a few parameters and FLOPs. In addition, extensive experiments demonstrate that previous ViTs are far from being well trained, further showing that ViTs have great potential to become a better substitute for CNNs.
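To make the two stems in the abstract concrete, here is a minimal NumPy sketch, not the authors' code. It assumes that "scaled ReLU" can be read as a ReLU applied to a batch-normalized (per-channel standardized, then scaled by gamma and shifted by beta) input, as in a Conv-BN-ReLU block; the paper's exact definition may differ. The conv-stem layout (three stride-2 3x3 convolutions followed by a 1x1 projection) follows the general pattern of convolutional stems, but the channel widths, image size, and all helper names (`conv2d`, `scaled_relu`, `patchify_stem`, `conv_stem`, `mean_pairwise_cosine`) are hypothetical. Token diversity is measured here as mean pairwise cosine similarity, which is one common proxy and not necessarily the metric used in the paper. Weights are random, so the printed similarities say nothing about the paper's results.

```python
import numpy as np

def conv2d(x, w, stride=1, pad=0):
    """Naive 2-D convolution (cross-correlation). x: (N, C, H, W), w: (K, C, kh, kw)."""
    n, _, h, wd = x.shape
    k, _, kh, kw = w.shape
    x = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
    ho = (h + 2 * pad - kh) // stride + 1
    wo = (wd + 2 * pad - kw) // stride + 1
    out = np.zeros((n, k, ho, wo))
    for i in range(ho):
        for j in range(wo):
            window = x[:, :, i * stride:i * stride + kh, j * stride:j * stride + kw]
            # Contract over (C, kh, kw); the result has shape (N, K).
            out[:, :, i, j] = np.tensordot(window, w, axes=([1, 2, 3], [1, 2, 3]))
    return out

def scaled_relu(x, gamma, beta, eps=1e-5):
    """Assumed reading of 'scaled ReLU': ReLU on a batch-normalized input.
    Each channel is standardized over (N, H, W), scaled by gamma and shifted by beta."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return np.maximum(gamma[None, :, None, None] * x_hat + beta[None, :, None, None], 0.0)

def to_tokens(fmap):
    """(N, D, H, W) feature map -> (N, H*W, D) sequence of patch tokens."""
    n, d, h, w = fmap.shape
    return fmap.reshape(n, d, h * w).transpose(0, 2, 1)

def patchify_stem(x, w_patch):
    """Patchify stem: a single p x p convolution with stride p (kernel size == stride)."""
    p = w_patch.shape[-1]
    return to_tokens(conv2d(x, w_patch, stride=p))

def conv_stem(x, conv_weights, w_proj):
    """Conv-stem: stride-2 3x3 convs, each followed by scaled ReLU, then a 1x1 projection."""
    for w in conv_weights:
        k = w.shape[0]
        x = scaled_relu(conv2d(x, w, stride=2, pad=1), np.ones(k), np.zeros(k))
    return to_tokens(conv2d(x, w_proj))

def mean_pairwise_cosine(tokens):
    """Mean cosine similarity between distinct tokens of one image (lower = more diverse)."""
    t = tokens / (np.linalg.norm(tokens, axis=-1, keepdims=True) + 1e-12)
    sim = t @ t.T
    k = t.shape[0]
    return (sim.sum() - np.trace(sim)) / (k * (k - 1))

# Usage on random data: both stems turn a 32x32 image into a 4x4 grid of 64-d tokens.
rng = np.random.default_rng(0)
img = rng.standard_normal((2, 3, 32, 32))
dim = 64
tok_patch = patchify_stem(img, rng.standard_normal((dim, 3, 8, 8)) * 0.05)
chans = [3, 16, 32, 64]
convs = [rng.standard_normal((chans[i + 1], chans[i], 3, 3)) * 0.1 for i in range(3)]
tok_conv = conv_stem(img, convs, rng.standard_normal((dim, chans[-1], 1, 1)) * 0.1)
print(tok_patch.shape, tok_conv.shape)  # (2, 16, 64) (2, 16, 64)
print(mean_pairwise_cosine(tok_patch[0]), mean_pairwise_cosine(tok_conv[0]))
```

The design choice worth noting is that both stems produce the same token count (16 tokens here), so any difference a ViT sees downstream comes from how the tokens are computed: one large linear patch projection versus a stack of small convolutions with normalization and ReLU in between.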

