Adaptive optimization algorithms such as Adam (Kingma & Ba, 2014) are widely used in deep learning. The stability of such algorithms is often improved with a warmup schedule for the learning rate. Motivated by the difficulty of choosing and tuning warmup schedules, Liu et al. (2020) propose automatic variance rectification of Adam's adaptive learning rate, claiming that this rectified approach ("RAdam") surpasses the vanilla Adam algorithm and reduces the need for expensive tuning of Adam with warmup. In this work, we refute this analysis and provide an alternative explanation for the necessity of warmup based on the magnitude of the update term, which is of greater relevance to training stability. We then provide some "rule-of-thumb" warmup schedules, and we demonstrate that a simple, untuned warmup of Adam performs more or less identically to RAdam in typical practical settings. We conclude by suggesting that practitioners stick to linear warmup with Adam, with a sensible default being linear warmup over $2 / (1 - \beta_2)$ training iterations.
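To make the suggested default concrete: a linear warmup over $2 / (1 - \beta_2)$ iterations corresponds to a learning-rate multiplier of $\min\!\left(1, \tfrac{(1 - \beta_2)\, t}{2}\right)$ at step $t$. The following is a minimal sketch of this schedule in PyTorch, assuming Adam's default $\beta_2 = 0.999$ (a 2,000-step ramp); the helper name `untuned_linear_warmup` and the wiring through `torch.optim.lr_scheduler.LambdaLR` are illustrative choices, not prescribed here.

```python
# A minimal sketch of untuned linear warmup for Adam, assuming PyTorch.
# The helper name `untuned_linear_warmup` is hypothetical.
import torch


def untuned_linear_warmup(step: int, beta2: float = 0.999) -> float:
    """Learning-rate multiplier that ramps linearly from ~0 to 1 over
    2 / (1 - beta2) steps (2,000 steps for the default beta2 = 0.999)."""
    warmup_steps = 2.0 / (1.0 - beta2)
    return min(1.0, float(step + 1) / warmup_steps)


model = torch.nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: untuned_linear_warmup(step, beta2=0.999)
)

for step in range(10):
    # loss.backward() would normally precede optimizer.step()
    optimizer.step()
    scheduler.step()  # rescales lr by the warmup multiplier for the next step
```

The design point is that the warmup horizon is tied to $1 / (1 - \beta_2)$, the effective averaging timescale of Adam's second-moment estimate, rather than being tuned per task.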