Adam is an adaptive gradient method that has experienced widespread adoption due to its fast and reliable training performance. Recent approaches have not offered significant improvement over Adam, often because they do not innovate upon one of its core features: normalization by the root mean square (RMS) of recent gradients. However, as noted by Kingma and Ba (2015), any number of $L^p$ normalizations are possible, with the RMS corresponding to the specific case of $p=2$. In our work, we theoretically and empirically characterize the influence of different $L^p$ norms on adaptive gradient methods for the first time. We show mathematically how the choice of $p$ influences the size of the steps taken, while leaving other desirable properties unaffected. We evaluate Adam with various $L^p$ norms on a suite of deep learning benchmarks, and find that $p > 2$ consistently leads to improved learning speed and final performance. The choices of $p=3$ or $p=6$ also match or outperform state-of-the-art methods in all of our experiments.
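As a rough illustration of the $L^p$ generalization described above, the following is a minimal NumPy sketch of a single update step. The function name `lp_adam_step`, the hyperparameter defaults, and the exact placement of the bias correction are our own assumptions for exposition and are not taken from the paper; only the replacement of the squared-gradient average by an average of $|g|^p$, with the denominator taken to the power $1/p$, reflects the idea discussed here.

```python
import numpy as np

def lp_adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                 eps=1e-8, p=3):
    """One Adam-style step with the RMS (p=2) denominator generalized to an L^p norm.

    Note: hyperparameter defaults and bias-correction details are illustrative
    assumptions, not the paper's specification.
    """
    # First moment: exponential moving average of gradients, as in Adam.
    m = beta1 * m + (1 - beta1) * grad
    # p-th moment: exponential moving average of |g|^p; p = 2 recovers Adam's RMS.
    v = beta2 * v + (1 - beta2) * np.abs(grad) ** p
    # Bias correction, mirroring Adam's (assumed form).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Normalize by the L^p root of the averaged p-th powers of recent gradients.
    param = param - lr * m_hat / (v_hat ** (1.0 / p) + eps)
    return param, m, v
```

Setting $p=2$ in this sketch recovers the standard Adam update; larger $p$ changes only the normalizing denominator, which is the quantity whose effect on step size the abstract refers to.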