Adaptive gradient methods such as AdaGrad and its variants update the stepsize in stochastic gradient descent on the fly, according to the gradients received along the way; such methods have gained widespread use in large-scale optimization for their ability to converge robustly, without the need to fine-tune the stepsize schedule. Yet the theoretical guarantees for AdaGrad to date cover only online and convex optimization. We bridge this gap by providing theoretical guarantees for the convergence of AdaGrad on smooth, nonconvex functions. We show that the norm version of AdaGrad (AdaGrad-Norm) converges to a stationary point at the $\mathcal{O}(\log(N)/\sqrt{N})$ rate in the stochastic setting, and at the optimal $\mathcal{O}(1/N)$ rate in the batch (non-stochastic) setting -- in this sense, our convergence guarantees are 'sharp'. In particular, the convergence of AdaGrad-Norm is robust to the choice of all hyperparameters of the algorithm, in contrast to stochastic gradient descent, whose convergence depends crucially on tuning the stepsize to the (generally unknown) Lipschitz smoothness constant and the level of stochastic noise on the gradient. Extensive numerical experiments are provided to corroborate our theory; moreover, the experiments suggest that the robustness of AdaGrad-Norm extends to state-of-the-art models in deep learning, without sacrificing generalization.
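To make the method concrete, here is a minimal sketch of the AdaGrad-Norm update in the batch setting: a single adaptive stepsize $\eta/b_k$ shared across all coordinates, where $b_k^2$ accumulates the squared norms of past gradients. The function names, test problem, and hyperparameter values (`eta`, `b0`) are illustrative choices, not taken from the paper's experiments.

```python
import numpy as np

def adagrad_norm(grad, x0, eta=1.0, b0=0.1, n_steps=200):
    """Sketch of AdaGrad-Norm: scalar adaptive stepsize eta / b_k,
    with b_k^2 accumulating the squared gradient norms seen so far."""
    x = np.asarray(x0, dtype=float)
    b2 = b0 ** 2
    for _ in range(n_steps):
        g = grad(x)
        b2 += np.dot(g, g)                  # b_{k+1}^2 = b_k^2 + ||g_k||^2
        x = x - (eta / np.sqrt(b2)) * g     # x_{k+1} = x_k - (eta / b_{k+1}) g_k
    return x

# Illustrative smooth test problem: f(x) = 0.5 * ||A x||^2 (batch gradients).
A = np.diag([1.0, 3.0])
grad_f = lambda x: A.T @ (A @ x)
x_star = adagrad_norm(grad_f, x0=[1.0, 1.0])
```

Note that the stepsize decays automatically as gradients accumulate, so the same `eta` and `b0` work without knowing the Lipschitz constant of `grad_f` in advance, which is the robustness property the abstract emphasizes.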