In this paper, we propose a novel stochastic gradient estimator -- ProbAbilistic Gradient Estimator (PAGE) -- for nonconvex optimization. PAGE is easy to implement as it is designed via a small adjustment to vanilla SGD: in each iteration, PAGE uses the vanilla minibatch SGD update with probability $p_t$, or reuses the previous gradient with a small adjustment, at a much lower computational cost, with probability $1-p_t$. We give a simple formula for the optimal choice of $p_t$. Moreover, we prove the first tight lower bound $\Omega(n+\frac{\sqrt{n}}{\epsilon^2})$ for nonconvex finite-sum problems, which also leads to a tight lower bound $\Omega(b+\frac{\sqrt{b}}{\epsilon^2})$ for nonconvex online problems, where $b:= \min\{\frac{\sigma^2}{\epsilon^2}, n\}$. Then, we show that PAGE obtains the optimal convergence results $O(n+\frac{\sqrt{n}}{\epsilon^2})$ (finite-sum) and $O(b+\frac{\sqrt{b}}{\epsilon^2})$ (online), matching our lower bounds for both nonconvex finite-sum and online problems. In addition, we show that for nonconvex functions satisfying the Polyak-\L{}ojasiewicz (PL) condition, PAGE can automatically switch to a faster linear convergence rate $O(\cdot\log \frac{1}{\epsilon})$. Finally, we conduct several deep learning experiments (e.g., LeNet, VGG, ResNet) on real datasets in PyTorch, showing that PAGE not only converges much faster than SGD in training but also achieves higher test accuracy, validating the optimal theoretical results and confirming the practical superiority of PAGE.
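To make the update rule concrete, the sketch below illustrates one way the PAGE estimator described above could be implemented on a toy least-squares finite sum. It is a minimal illustration, not the paper's implementation: the problem instance, batch sizes, step size, and the helper `grad_minibatch` are illustrative assumptions, and the probability is set by the simple rule $p = b'/(b+b')$ with large batch $b$ and small batch $b'$.

```python
import numpy as np

# Toy finite-sum problem: f(x) = (1/n) * sum_i 0.5 * (a_i^T x - y_i)^2.
rng = np.random.default_rng(0)
n, d = 1000, 20
A = rng.standard_normal((n, d))
y = rng.standard_normal(n)

def grad_minibatch(x, idx):
    """Average gradient of the sampled component functions at x (illustrative helper)."""
    Ai = A[idx]
    return Ai.T @ (Ai @ x - y[idx]) / len(idx)

def page(T=500, b=64, b_prime=8, eta=0.05):
    """One possible PAGE loop: illustrative batch sizes and step size, not tuned values."""
    p = b_prime / (b + b_prime)  # simple choice of the switching probability
    x = np.zeros(d)
    g = grad_minibatch(x, rng.choice(n, size=b, replace=False))  # initial estimate on a large batch
    for _ in range(T):
        x_new = x - eta * g
        if rng.random() < p:
            # With probability p: vanilla minibatch SGD-style gradient on the larger batch.
            g = grad_minibatch(x_new, rng.choice(n, size=b, replace=False))
        else:
            # With probability 1 - p: reuse the previous gradient with a cheap
            # correction computed on a small batch (much lower cost per step).
            idx = rng.choice(n, size=b_prime, replace=False)
            g = g + grad_minibatch(x_new, idx) - grad_minibatch(x, idx)
        x = x_new
    return x

x_hat = page()
print("final full-gradient norm:", np.linalg.norm(A.T @ (A @ x_hat - y) / n))
```

The key design point visible in the sketch is that the expensive large-batch gradient is recomputed only occasionally, while most iterations pay only for a small-batch difference term, which is what drives the low per-iteration cost of the estimator.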