Recently, convergence and convergence rate analyses of deep learning optimizers for nonconvex optimization have been widely studied. Meanwhile, numerical evaluations of these optimizers have clarified the relationship between batch size and the number of steps needed to train deep neural networks. The main contribution of this paper is to show theoretically that the number of steps needed by each of these optimizers for nonconvex optimization can be expressed as a rational function of batch size. Having these rational functions leads to two particularly important facts, which were validated numerically in previous studies. The first fact is that there exists an optimal batch size such that the number of steps needed for nonconvex optimization is minimized. This implies that using batch sizes larger than the optimal one does not decrease the number of steps needed for nonconvex optimization. The second fact is that the optimal batch size depends on the optimizer. In particular, it is shown theoretically that momentum and Adam-type optimizers can exploit larger optimal batch sizes than the stochastic gradient descent optimizer and thereby further reduce the minimum number of steps needed for nonconvex optimization.
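To make the two facts concrete, the following is a minimal illustrative sketch; the specific rational form and the constants $C_1$, $C_2$, $C_3$ are assumptions chosen for exposition, not the expressions derived in the paper. A step count of the rational form below has a unique minimizer in the batch size $b$:

\[
K(b) \;=\; \frac{C_1}{b} + C_2\, b + C_3
\;=\; \frac{C_2\, b^{2} + C_3\, b + C_1}{b},
\qquad
b^{\star} \;=\; \sqrt{\frac{C_1}{C_2}},
\qquad
K(b^{\star}) \;=\; 2\sqrt{C_1 C_2} + C_3 .
\]

For $b > b^{\star}$ the step count does not decrease, which mirrors the first fact; and since the constants would depend on the optimizer (for example, on momentum or Adam-type hyperparameters), the minimizer $b^{\star}$ and the minimum value $K(b^{\star})$ differ across optimizers, which mirrors the second fact.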