具有动力的斯托克优化:趋同、波动和避免陷阱 (Stochastic optimization with momentum: convergence, fluctuations, and traps avoidance) - 专知论文

会员服务 ·

0

优化器 · 几乎必然收敛 · Nesterov加速梯度 · 局部极大值 · 评论员 ·

2021 年 4 月 16 日

Stochastic optimization with momentum: convergence, fluctuations, and traps avoidance

翻译：具有动力的斯托克优化:趋同、波动和避免陷阱

A. Barakat,P. Bianchi,W. Hachem,Sh. Schechtman

from arxiv, 46 pages

In this paper, a general stochastic optimization procedure is studied, unifying several variants of the stochastic gradient descent such as, among others, the stochastic heavy ball method, the Stochastic Nesterov Accelerated Gradient algorithm (S-NAG), and the widely used Adam algorithm. The algorithm is seen as a noisy Euler discretization of a non-autonomous ordinary differential equation, recently introduced by Belotto da Silva and Gazeau, which is analyzed in depth. Assuming that the objective function is non-convex and differentiable, the stability and the almost sure convergence of the iterates to the set of critical points are established. A noteworthy special case is the convergence proof of S-NAG in a non-convex setting. Under some assumptions, the convergence rate is provided under the form of a Central Limit Theorem. Finally, the non-convergence of the algorithm to undesired critical points, such as local maxima or saddle points, is established. Here, the main ingredient is a new avoidance of traps result for non-autonomous settings, which is of independent interest.

翻译：本文研究了一般随机优化程序,统一了随机梯度下降的若干变种,例如重球法、Stochacistic Nesterov加速梯度算法(S-NAG)和广泛使用的亚当算法。这一算法被视为非自主普通差分方的杂音分解,最近由Belotto da Silva和Gazeau采用,对此进行了深入分析。假设客观功能是不可凝固和可变的,迭代国与一组临界点的稳定性和几乎肯定的汇合已经确立。一个值得注意的特殊案例是,S-NAG在非convex环境下的趋同证据。根据一些假设,这种趋同率是以中央限制理论的形式提供的。最后,算法与不受欢迎的临界点(如本地峰值或马鞍点)不相容。在这里,主要成份是避免非自治环境的陷阱结果,这是独立的兴趣。

0

相关内容

优化器

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

专知会员服务

69+阅读 · 2021年3月27日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

【Google】梯度下降，48页ppt

【Google】梯度下降，48页ppt

专知会员服务

81+阅读 · 2020年12月5日

【Google】平滑对抗训练，Smooth Adversarial Training

【Google】平滑对抗训练，Smooth Adversarial Training

专知会员服务

49+阅读 · 2020年7月4日

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

专知会员服务

19+阅读 · 2020年6月29日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【NeurlPS2019论文总结】一致收敛可能无法解释深度学习中的泛化现象，Uniform convergence may be unable to explain generalization in deep learning

【NeurlPS2019论文总结】一致收敛可能无法解释深度学习中的泛化现象，Uniform convergence may be unable to explain generalization in deep learning

专知会员服务

15+阅读 · 2019年12月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

灾难性遗忘问题新视角：迁移-干扰平衡

灾难性遗忘问题新视角：迁移-干扰平衡

CreateAMind

17+阅读 · 2019年7月6日

19篇ICML2019论文摘录选读！

19篇ICML2019论文摘录选读！

专知

28+阅读 · 2019年4月28日

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

Call4Papers

5+阅读 · 2018年12月7日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【NIPS2018】接收论文列表

【NIPS2018】接收论文列表

专知

5+阅读 · 2018年9月10日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【推荐】用Python/OpenCV实现增强现实

【推荐】用Python/OpenCV实现增强现实

机器学习研究会

15+阅读 · 2017年11月16日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Near-Optimal High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise

Arxiv

0+阅读 · 2021年6月10日

LSOS: Line-search Second-Order Stochastic optimization methods for nonconvex finite sums

Arxiv

0+阅读 · 2021年6月10日

The Heavy-Tail Phenomenon in SGD

The Heavy-Tail Phenomenon in SGD

Arxiv

0+阅读 · 2021年6月8日

Dynamics of Stochastic Momentum Methods on Large-scale, Quadratic Models

Arxiv

0+阅读 · 2021年6月7日

Decentralized Optimization with Heterogeneous Delays: a Continuous-Time Approach

Arxiv

0+阅读 · 2021年6月7日

Optimization Induced Equilibrium Networks

Arxiv

0+阅读 · 2021年6月7日

Difference methods for time discretization of stochastic wave equation

Arxiv

0+阅读 · 2021年6月7日

Federated Accelerated Stochastic Gradient Descent

Arxiv

0+阅读 · 2021年6月5日

BiFair: Training Fair Models with Bilevel Optimization

Arxiv

0+阅读 · 2021年6月3日

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

Arxiv

8+阅读 · 2018年11月21日

VIP会员

文章信息

相关主题

几乎必然收敛

Nesterov加速梯度

局部极大值

相关VIP内容

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

专知会员服务

69+阅读 · 2021年3月27日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

【Google】梯度下降，48页ppt

【Google】梯度下降，48页ppt

专知会员服务

81+阅读 · 2020年12月5日

【Google】平滑对抗训练，Smooth Adversarial Training

【Google】平滑对抗训练，Smooth Adversarial Training

专知会员服务

49+阅读 · 2020年7月4日

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

专知会员服务

19+阅读 · 2020年6月29日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【NeurlPS2019论文总结】一致收敛可能无法解释深度学习中的泛化现象，Uniform convergence may be unable to explain generalization in deep learning

【NeurlPS2019论文总结】一致收敛可能无法解释深度学习中的泛化现象，Uniform convergence may be unable to explain generalization in deep learning

专知会员服务

15+阅读 · 2019年12月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

【伯克利博士论文】通过真实世界实践赋能机器人自主性

军用无人机集群技术尚未成熟——但潜力可期

人工智能安全治理白皮书（2025）

AgentOps综述：分类、挑战与未来方向

相关资讯

灾难性遗忘问题新视角：迁移-干扰平衡

灾难性遗忘问题新视角：迁移-干扰平衡

CreateAMind

17+阅读 · 2019年7月6日

19篇ICML2019论文摘录选读！

19篇ICML2019论文摘录选读！

专知

28+阅读 · 2019年4月28日

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

Call4Papers

5+阅读 · 2018年12月7日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【NIPS2018】接收论文列表

【NIPS2018】接收论文列表

专知

5+阅读 · 2018年9月10日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【推荐】用Python/OpenCV实现增强现实

【推荐】用Python/OpenCV实现增强现实

机器学习研究会

15+阅读 · 2017年11月16日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Near-Optimal High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise

Arxiv

0+阅读 · 2021年6月10日

LSOS: Line-search Second-Order Stochastic optimization methods for nonconvex finite sums

Arxiv

0+阅读 · 2021年6月10日

The Heavy-Tail Phenomenon in SGD

The Heavy-Tail Phenomenon in SGD

Arxiv

0+阅读 · 2021年6月8日

Dynamics of Stochastic Momentum Methods on Large-scale, Quadratic Models

Arxiv

0+阅读 · 2021年6月7日

Decentralized Optimization with Heterogeneous Delays: a Continuous-Time Approach

Arxiv

0+阅读 · 2021年6月7日

Optimization Induced Equilibrium Networks

Arxiv

0+阅读 · 2021年6月7日

Difference methods for time discretization of stochastic wave equation

Arxiv

0+阅读 · 2021年6月7日

Federated Accelerated Stochastic Gradient Descent

Arxiv

0+阅读 · 2021年6月5日

BiFair: Training Fair Models with Bilevel Optimization

Arxiv

0+阅读 · 2021年6月3日

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

Arxiv

8+阅读 · 2018年11月21日

微信扫码咨询专知VIP会员