Adam and AdaBelief compute and make use of elementwise adaptive stepsizes when training deep neural networks (DNNs) by tracking the exponential moving average (EMA) of the squared gradient g_t^2 and the squared prediction error (m_t-g_t)^2, respectively, where m_t is the first momentum at iteration t and can be viewed as a prediction of g_t. In this work, we investigate whether layerwise gradient statistics can be exploited in Adam and AdaBelief to allow for more effective training of DNNs. We address this research question in two steps. First, we slightly modify Adam and AdaBelief by introducing layerwise adaptive stepsizes into their update procedures via either pre- or post-processing. Our empirical results indicate that this slight modification yields comparable performance when training VGG and ResNet models on CIFAR10 and CIFAR100, suggesting that layerwise gradient statistics play an important role in the success of Adam and AdaBelief, at least for certain DNN tasks. In the second step, we propose Aida, a new optimisation method, designed so that the elementwise stepsizes within each layer have significantly smaller statistical variances and the layerwise average stepsizes are much more compact across all the layers. Motivated by the observation that (m_t-g_t)^2 in AdaBelief is conservative compared with g_t^2 in Adam in terms of layerwise statistical averages and variances, Aida tracks an even more conservative function of m_t and g_t than (m_t-g_t)^2, obtained via layerwise vector projections. Experimental results show that Aida performs competitively with, or better than, a number of existing methods including Adam and AdaBelief on a set of challenging DNN tasks.
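As a rough illustration (not the authors' reference implementation), the sketch below contrasts the second-moment EMAs that drive the elementwise stepsizes of Adam and AdaBelief, and shows one plausible reading of Aida's layerwise mutual vector projections based solely on the description above; the function names, the projection count K, and the eps constant are assumptions for this sketch.

import numpy as np

def ema_second_moments(m, g, v_adam, s_belief, beta2=0.999):
    # m: first momentum (prediction of g_t); g: current gradient,
    # both 1-D arrays holding the parameters of one layer.
    # Adam tracks an EMA of the squared gradient g_t^2.
    v_adam = beta2 * v_adam + (1.0 - beta2) * g**2
    # AdaBelief tracks an EMA of the squared prediction error (m_t - g_t)^2.
    s_belief = beta2 * s_belief + (1.0 - beta2) * (m - g)**2
    return v_adam, s_belief

def aida_projected_residual(m, g, K=2, eps=1e-20):
    # Hypothetical sketch of the layerwise vector projections: mutually
    # project m and g onto each other K times, which shrinks the residual
    # (m - g), giving a more conservative quantity than (m_t - g_t)^2
    # for the EMA to track. K and eps are illustrative, not from the text.
    for _ in range(K):
        m_on_g = (np.dot(m, g) / (np.dot(g, g) + eps)) * g
        g_on_m = (np.dot(g, m) / (np.dot(m, m) + eps)) * m
        m, g = m_on_g, g_on_m
    return (m - g)**2

Because each projection keeps only the component of one vector along the other, the projected residual is never larger than the raw residual for that layer, which is consistent with the stated goal of smaller within-layer stepsize variances.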