Momentum is a widely used technique in gradient-based optimizers for deep learning. In this paper, we propose a decaying momentum (\textsc{Demon}) rule. We conduct the first large-scale empirical analysis of momentum decay methods for modern neural network optimization, alongside the most popular learning rate decay schedules. Across 28 relevant combinations of models, epochs, datasets, and optimizers, \textsc{Demon} achieves the highest number of Top-1 and Top-3 finishes, at 39\% and 85\% respectively, almost doubling the second-placed learning rate cosine schedule, which attains 17\% and 60\%. \textsc{Demon} also outperforms other widely used schedules, including the learning rate step, linear, OneCycle, and exponential schedules. Compared with the widely used learning rate step schedule, \textsc{Demon} is observed to be less sensitive to parameter tuning, which is critical when training neural networks in practice. Results are demonstrated across a variety of settings and architectures, including image classification, generative models, and language models. \textsc{Demon} is easy to implement, requires no additional tuning, and incurs almost no extra computational overhead compared to its vanilla counterparts. Code is readily available.
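For concreteness, the following is a sketch of a decaying-momentum rule in the spirit of \textsc{Demon}; the exact schedule is defined in the body of the paper, and the notation here is our own assumption: $\beta_{\mathrm{init}}$ denotes the initial momentum parameter, $t$ the current iteration, and $T$ the total number of iterations. The idea is to decay the quantity $\beta_t/(1-\beta_t)$ linearly to zero over training:
\[
\frac{\beta_t}{1-\beta_t} \;=\; \Big(1 - \frac{t}{T}\Big)\,\frac{\beta_{\mathrm{init}}}{1-\beta_{\mathrm{init}}}
\quad\Longleftrightarrow\quad
\beta_t \;=\; \frac{\beta_{\mathrm{init}}\,\big(1 - t/T\big)}{\big(1-\beta_{\mathrm{init}}\big) \;+\; \beta_{\mathrm{init}}\,\big(1 - t/T\big)}.
\]
Because $\beta_t$ is a single scalar recomputed once per iteration and then used in place of the fixed momentum parameter of the underlying optimizer, such a rule adds essentially no computational overhead and introduces no additional hyperparameters beyond those of the base optimizer.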