用于最小问题的最佳多阶段多阶段碎片梯度方法 (An Optimal Multistage Stochastic Gradient Method for Minimax Problems) - 专知论文

会员服务 ·

0

优化器 · 噪声 · 学习率衰减 · 线性的 · 随机梯度下降 ·

2020 年 2 月 13 日

An Optimal Multistage Stochastic Gradient Method for Minimax Problems

翻译：用于最小问题的最佳多阶段多阶段碎片梯度方法

Alireza Fallah,Asuman Ozdaglar,Sarath Pattathil

In this paper, we study the minimax optimization problem in the smooth and strongly convex-strongly concave setting when we have access to noisy estimates of gradients. In particular, we first analyze the stochastic Gradient Descent Ascent (GDA) method with constant stepsize, and show that it converges to a neighborhood of the solution of the minimax problem. We further provide tight bounds on the convergence rate and the size of this neighborhood. Next, we propose a multistage variant of stochastic GDA (M-GDA) that runs in multiple stages with a particular learning rate decay schedule and converges to the exact solution of the minimax problem. We show M-GDA achieves the lower bounds in terms of noise dependence without any assumptions on the knowledge of noise characteristics. We also show that M-GDA obtains a linear decay rate with respect to the error's dependence on the initial error, although the dependence on condition number is suboptimal. In order to improve this dependence, we apply the multistage machinery to the stochastic Optimistic Gradient Descent Ascent (OGDA) algorithm and propose the M-OGDA algorithm which also achieves the optimal linear decay rate with respect to the initial error. To the best of our knowledge, this method is the first to simultaneously achieve the best dependence on noise characteristic as well as the initial error and condition number.

翻译：在本文中,当我们有机会获得对梯度的杂音估计时,我们研究在平滑和强烈混凝固的细微最大优化环境中的细微最大优化问题。特别是,我们首先以不断的步步化方式分析微散渐渐渐下降源加速度(GDA)方法(GDA),并显示该方法与迷微问题的解决办法相趋近。我们进一步提供了对趋同率和这一附近地区大小的严格界限。接着,我们提议了一个多阶段的Stochectic GDA(M-GDA)变式,该变式在多个阶段,有一个特定的学习速率衰减时间表,与微缩问题的确切解决办法趋同。我们同时显示M-GDA在噪音依赖度方法方面达到了较低的界限,而没有假定对噪音特性特性的了解。我们还表明,M-GDA在错误对初始误差率方面获得了线性衰减率,尽管对条件号的依赖性不尽。为了改进这种依赖性,我们把多阶段机械应用到先诊断式的多位机制,将精准性偏向性差衰变后,并接近于最接近于最精确的ASTentlegradicregradugradegradegradegrademaximal 也提出了最佳的模型,这是最佳的对最佳的测法。

0

相关内容

优化器

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

专知会员服务

19+阅读 · 2020年6月29日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【AdaMod】一个新的深度学习优化与记忆（Meet AdaMod: a new deep learning optimizer with memory）

【AdaMod】一个新的深度学习优化与记忆（Meet AdaMod: a new deep learning optimizer with memory）

专知会员服务

15+阅读 · 2020年1月13日

【NeurIPS 2019|经典论文奖】正则随机学习和在线优化的双重平均法（Dual Averaging Method for Regularized Stochastic Learning and Online Optimization），微软研究院Lin Xiao

【NeurIPS 2019|经典论文奖】正则随机学习和在线优化的双重平均法（Dual Averaging Method for Regularized Stochastic Learning and Online Optimization），微软研究院Lin Xiao

专知会员服务

17+阅读 · 2019年12月9日

【ICCV 2019 Toturial】Global Optimization for Geometric Understanding with Provable Guarantees（具有可证明保证的几何理解的全局优化）

【ICCV 2019 Toturial】Global Optimization for Geometric Understanding with Provable Guarantees（具有可证明保证的几何理解的全局优化）

专知会员服务

18+阅读 · 2019年11月1日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新六篇主题模型相关论文—领域特定知识库、神经变分推断、动态和静态主题模型

【论文推荐】最新六篇主题模型相关论文—领域特定知识库、神经变分推断、动态和静态主题模型

专知

19+阅读 · 2018年6月26日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文推荐】最新六篇主题模型相关论文—收敛率、大规模、深度主题建模、优化、情绪强度、广义动态主题模型

【论文推荐】最新六篇主题模型相关论文—收敛率、大规模、深度主题建模、优化、情绪强度、广义动态主题模型

专知

11+阅读 · 2018年3月29日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Manifold Approximation by Moving Least-Squares Projection (MMLS)

Manifold Approximation by Moving Least-Squares Projection (MMLS)

Arxiv

4+阅读 · 2019年3月7日

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

Arxiv

8+阅读 · 2018年11月21日

Test-time augmentation with uncertainty estimation for deep learning-based medical image segmentation

Test-time augmentation with uncertainty estimation for deep learning-based medical image segmentation

Arxiv

4+阅读 · 2018年7月19日

Asynchronous Byzantine Machine Learning (the case of SGD)

Arxiv

3+阅读 · 2018年7月9日

Large-Scale Stochastic Sampling from the Probability Simplex

Arxiv

3+阅读 · 2018年6月19日

A fast algorithm with minimax optimal guarantees for topic models with an unknown number of topics

Arxiv

7+阅读 · 2018年6月12日

Optimal Algorithms for Non-Smooth Distributed Optimization in Networks

Arxiv

7+阅读 · 2018年6月1日

Variance Reduction Methods for Sublinear Reinforcement Learning

Arxiv

4+阅读 · 2018年4月25日

Improved Image Captioning via Policy Gradient optimization of SPIDEr

Arxiv

6+阅读 · 2018年3月12日

Variance-based regularization with convex objectives

Arxiv

5+阅读 · 2017年12月14日

VIP会员

文章信息

相关主题

学习率衰减

随机梯度下降

相关VIP内容

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

专知会员服务

19+阅读 · 2020年6月29日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【AdaMod】一个新的深度学习优化与记忆（Meet AdaMod: a new deep learning optimizer with memory）

【AdaMod】一个新的深度学习优化与记忆（Meet AdaMod: a new deep learning optimizer with memory）

专知会员服务

15+阅读 · 2020年1月13日

【NeurIPS 2019|经典论文奖】正则随机学习和在线优化的双重平均法（Dual Averaging Method for Regularized Stochastic Learning and Online Optimization），微软研究院Lin Xiao

【NeurIPS 2019|经典论文奖】正则随机学习和在线优化的双重平均法（Dual Averaging Method for Regularized Stochastic Learning and Online Optimization），微软研究院Lin Xiao

专知会员服务

17+阅读 · 2019年12月9日

【ICCV 2019 Toturial】Global Optimization for Geometric Understanding with Provable Guarantees（具有可证明保证的几何理解的全局优化）

【ICCV 2019 Toturial】Global Optimization for Geometric Understanding with Provable Guarantees（具有可证明保证的几何理解的全局优化）

专知会员服务

18+阅读 · 2019年11月1日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

从社会学实验到行为仿真：理解基于Agent的观点动力学建模思维

中英文版《GPT-5 System Card速览》报告

ACL 2025 | 大模型结构化知识提示的泛化能力研究

【普林斯顿博士论文】大型模型的高效推理

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新六篇主题模型相关论文—领域特定知识库、神经变分推断、动态和静态主题模型

【论文推荐】最新六篇主题模型相关论文—领域特定知识库、神经变分推断、动态和静态主题模型

专知

19+阅读 · 2018年6月26日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文推荐】最新六篇主题模型相关论文—收敛率、大规模、深度主题建模、优化、情绪强度、广义动态主题模型

【论文推荐】最新六篇主题模型相关论文—收敛率、大规模、深度主题建模、优化、情绪强度、广义动态主题模型

专知

11+阅读 · 2018年3月29日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Manifold Approximation by Moving Least-Squares Projection (MMLS)

Manifold Approximation by Moving Least-Squares Projection (MMLS)

Arxiv

4+阅读 · 2019年3月7日

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

Arxiv

8+阅读 · 2018年11月21日

Test-time augmentation with uncertainty estimation for deep learning-based medical image segmentation

Test-time augmentation with uncertainty estimation for deep learning-based medical image segmentation

Arxiv

4+阅读 · 2018年7月19日

Asynchronous Byzantine Machine Learning (the case of SGD)

Arxiv

3+阅读 · 2018年7月9日

Large-Scale Stochastic Sampling from the Probability Simplex

Arxiv

3+阅读 · 2018年6月19日

A fast algorithm with minimax optimal guarantees for topic models with an unknown number of topics

Arxiv

7+阅读 · 2018年6月12日

Optimal Algorithms for Non-Smooth Distributed Optimization in Networks

Arxiv

7+阅读 · 2018年6月1日

Variance Reduction Methods for Sublinear Reinforcement Learning

Arxiv

4+阅读 · 2018年4月25日

Improved Image Captioning via Policy Gradient optimization of SPIDEr

Arxiv

6+阅读 · 2018年3月12日

Variance-based regularization with convex objectives

Arxiv

5+阅读 · 2017年12月14日

微信扫码咨询专知VIP会员