In contrast to SGD, adaptive gradient methods like Adam allow robust training of modern deep networks, especially large language models. However, the use of adaptivity not only comes at the cost of extra memory but also raises a fundamental question: can non-adaptive methods like SGD enjoy similar benefits? In this paper, we provide an affirmative answer to this question by proposing to achieve both robust and memory-efficient training via the following general recipe: (1) modify the architecture to make it scale invariant, i.e., the scale of the parameters does not affect the output of the network, (2) train with SGD and weight decay, and optionally (3) clip the global gradient norm at a threshold proportional to the weight norm, namely the weight norm multiplied by $\sqrt{\tfrac{2\lambda}{\eta}}$, where $\eta$ is the learning rate and $\lambda$ is the weight decay coefficient. We show that this general approach is robust to rescaling of parameters and loss by proving that its convergence depends only logarithmically on the scale of initialization and loss, whereas standard SGD might not even converge for many initializations. Following our recipe, we design a scale invariant version of BERT, called SIBERT, which, when trained with vanilla SGD alone, achieves performance on downstream tasks comparable to BERT trained with adaptive methods like Adam.
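To make the optimizer side of the recipe concrete, here is a minimal sketch, assuming PyTorch, of steps (2) and (3): an SGD update with weight decay in which the global gradient norm is clipped at the weight norm multiplied by $\sqrt{2\lambda/\eta}$, as stated in the abstract. The function name `sgd_step_with_relative_clipping` and the variable names `model`, `eta`, and `lam` are illustrative, not from the paper's code.

```python
# Illustrative sketch (not the official implementation) of recipe steps (2)-(3):
# SGD with weight decay, clipping the global gradient norm at
# ||w|| * sqrt(2 * lambda / eta).
import math
import torch

def sgd_step_with_relative_clipping(model, eta=0.1, lam=1e-4):
    params = [p for p in model.parameters() if p.grad is not None]
    # Global weight norm and global gradient norm over all parameters.
    weight_norm = math.sqrt(sum(p.detach().pow(2).sum().item() for p in params))
    grad_norm = math.sqrt(sum(p.grad.detach().pow(2).sum().item() for p in params))
    # Clipping threshold proportional to the weight norm: ||w|| * sqrt(2*lam/eta).
    threshold = weight_norm * math.sqrt(2.0 * lam / eta)
    scale = min(1.0, threshold / (grad_norm + 1e-12))
    with torch.no_grad():
        for p in params:
            # SGD update with weight decay: p <- p - eta * (clipped_grad + lam * p)
            p.add_(scale * p.grad + lam * p, alpha=-eta)
```

Step (1), making the architecture scale invariant, is an architectural change (yielding SIBERT in the paper) and is not shown here.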