On Exploiting Layerwise Gradient Statistics for Effective Training of Deep Neural Networks - Zhuanzhi (专知) Papers

AdaBelief · Statistics · Adam · Performer · Neural Networks

April 6, 2022

On Exploiting Layerwise Gradient Statistics for Effective Training of Deep Neural Networks

Guoqiang Zhang, Kenta Niwa, W. Bastiaan Kleijn

From arXiv, 9 pages

Adam and AdaBelief compute and make use of elementwise adaptive stepsizes in training deep neural networks (DNNs) by tracking the exponential moving average (EMA) of the squared gradient g_t^2 and the squared prediction error (m_t-g_t)^2, respectively, where m_t is the first momentum at iteration t and can be viewed as a prediction of g_t. In this work, we attempt to find out whether layerwise gradient statistics can be exploited in Adam and AdaBelief to allow for more effective training of DNNs. We address this research question in two steps. First, we slightly modify Adam and AdaBelief by introducing layerwise adaptive stepsizes in their update procedures via either pre- or post-processing. An empirical study indicates that the slight modification produces comparable performance for training VGG and ResNet models on CIFAR10, suggesting that layerwise gradient statistics play an important role in the success of Adam and AdaBelief for at least certain DNN tasks. In the second step, instead of setting up layerwise stepsizes manually, we propose Aida, a new optimisation method, with the objective that the elementwise stepsizes within each layer have significantly small statistical variances. Motivated by the fact that (m_t-g_t)^2 in AdaBelief is conservative in comparison to g_t^2 in Adam in terms of layerwise statistical averages and variances, Aida is designed to track a more conservative function of m_t and g_t than (m_t-g_t)^2 in AdaBelief via layerwise orthogonal vector projections. Experimental results show that Aida produces either competitive or better performance with respect to a number of existing methods, including Adam and AdaBelief, for a set of challenging DNN tasks.
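To make the two tracked quantities concrete, the following is a minimal NumPy sketch of the EMA of g_t^2 (Adam) and the EMA of (m_t-g_t)^2 (AdaBelief) for a single toy "layer", followed by the layerwise average and variance of each. It is an illustration under stated assumptions, not the authors' code: the helper names (`ema_update`, `second_moment_step`, `layer_stats`), the decay rates 0.9 and 0.999, and the synthetic gradients are all hypothetical, and bias correction and the epsilon terms are omitted for brevity. Aida itself is not sketched, since the abstract does not specify its projection step.

```python
import numpy as np

def ema_update(prev, value, beta):
    """Exponential moving average: beta * prev + (1 - beta) * value."""
    return beta * prev + (1.0 - beta) * value

def second_moment_step(m, v_adam, s_belief, g, beta1=0.9, beta2=0.999):
    """One update of the first momentum and of both second-moment trackers."""
    m_new = ema_update(m, g, beta1)                         # m_t, a prediction of g_t
    v_new = ema_update(v_adam, g ** 2, beta2)               # Adam: EMA of g_t^2
    s_new = ema_update(s_belief, (m_new - g) ** 2, beta2)   # AdaBelief: EMA of (m_t - g_t)^2
    return m_new, v_new, s_new

def layer_stats(x):
    """Average and variance of an elementwise statistic over one layer."""
    return float(np.mean(x)), float(np.var(x))

# Toy layer with 1000 parameters and synthetic noisy gradients (made-up data).
rng = np.random.default_rng(0)
m = np.zeros(1000)
v = np.zeros(1000)
s = np.zeros(1000)
for t in range(200):
    g = 1.0 + 0.1 * rng.standard_normal(1000)
    m, v, s = second_moment_step(m, v, s, g)

print("Adam      layer mean/var:", layer_stats(v))
print("AdaBelief layer mean/var:", layer_stats(s))
```

Printing the two pairs side by side gives a rough feel for the layerwise averages and variances that the abstract compares; the toy data here says nothing about real DNN gradients.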
