In this paper, we propose a novel stochastic gradient estimator---ProbAbilistic Gradient Estimator (PAGE)---for nonconvex optimization. PAGE is easy to implement as it is designed via a small adjustment to vanilla SGD: in each iteration, PAGE uses the vanilla minibatch SGD update with probability $p$, or reuses the previous gradient with a small adjustment, at a much lower computational cost, with probability $1-p$. We give a simple formula for the optimal choice of $p$. We prove tight lower bounds for nonconvex problems, which are of independent interest. Moreover, we prove matching upper bounds in both the finite-sum and online regimes, which establish that PAGE is an optimal method. In addition, we show that for nonconvex functions satisfying the Polyak-\L{}ojasiewicz (PL) condition, PAGE automatically switches to a faster linear convergence rate. Finally, we conduct several deep learning experiments (e.g., LeNet, VGG, ResNet) on real datasets in PyTorch, and the results demonstrate that PAGE not only converges much faster than SGD in training but also achieves higher test accuracy, validating our theoretical results and confirming the practical superiority of PAGE.
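To make the estimator concrete, below is a minimal NumPy sketch of a PAGE-style loop on a toy least-squares finite sum. The problem, the \texttt{grad\_batch} oracle, the step size, the batch sizes, and the displayed choice of $p$ are illustrative assumptions chosen in the spirit of the paper's simple formula; this is not the paper's exact pseudocode or tuning.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

# Toy finite-sum problem: f(x) = (1/n) * sum_i 0.5 * ||A_i x - b_i||^2
n, d = 100, 10
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def grad_batch(x, idx):
    """Average gradient of the sampled component functions at x."""
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ x - bi) / len(idx)

def page_sgd(steps=500, eta=0.05, batch=20, small_batch=4):
    """PAGE-style loop (sketch): with probability p take a fresh minibatch
    gradient; otherwise reuse the previous estimator and correct it with a
    cheap gradient difference on a small minibatch."""
    # Assumed choice of p, in the spirit of the paper's simple formula.
    p = small_batch / (batch + small_batch)
    x = np.zeros(d)
    g = grad_batch(x, np.arange(n))  # initial full-batch gradient
    for _ in range(steps):
        x_new = x - eta * g
        if rng.random() < p:
            idx = rng.choice(n, size=batch, replace=False)
            g = grad_batch(x_new, idx)
        else:
            idx = rng.choice(n, size=small_batch, replace=False)
            g = g + grad_batch(x_new, idx) - grad_batch(x, idx)
        x = x_new
    return x

if __name__ == "__main__":
    x_hat = page_sgd()
    print("final full-gradient norm:",
          np.linalg.norm(grad_batch(x_hat, np.arange(n))))
\end{verbatim}

The low-probability branch is where the computational saving comes from: it touches only a small minibatch yet keeps the estimator anchored to the previous gradient, which is what allows the matching upper bounds described above.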