Smooth minimax games often proceed by simultaneous or alternating gradient updates. Although algorithms with alternating updates are commonly used in practice for many applications (e.g., GAN training), the majority of existing theoretical analyses focus on simultaneous algorithms for ease of analysis. In this paper, we study alternating gradient descent-ascent (Alt-GDA) in minimax games and show that Alt-GDA is superior to its simultaneous counterpart (Sim-GDA) in many settings. In particular, we prove that Alt-GDA achieves a near-optimal local convergence rate for strongly convex-strongly concave (SCSC) problems, whereas Sim-GDA converges at a much slower rate. To our knowledge, this is the \emph{first} result in any setting showing that Alt-GDA converges faster than Sim-GDA by more than a constant factor. We further prove that the acceleration effect of alternating updates persists when the minimax problem is only strongly concave in the dual variables. Lastly, we adapt the theory of integral quadratic constraints and show that Alt-GDA attains the same rate \emph{globally} for a class of SCSC minimax problems. Numerical experiments on quadratic minimax games validate our claims. Empirically, we demonstrate that alternating updates speed up GAN training significantly and that optimism helps only for simultaneous algorithms.
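To make the comparison concrete, the two algorithms differ only in which iterate the dual player's gradient is evaluated at. For $\min_x \max_y f(x,y)$ with a common step size $\eta$ (the paper's analysis may allow separate step sizes for the two players), the updates are
\begin{align*}
\text{Sim-GDA:}\quad x_{t+1} &= x_t - \eta \nabla_x f(x_t, y_t), & y_{t+1} &= y_t + \eta \nabla_y f(x_t, y_t),\\
\text{Alt-GDA:}\quad x_{t+1} &= x_t - \eta \nabla_x f(x_t, y_t), & y_{t+1} &= y_t + \eta \nabla_y f(x_{t+1}, y_t).
\end{align*}
The following is a minimal numerical sketch of this contrast on a two-dimensional SCSC quadratic game. It is our own illustration, not the paper's experimental code; the curvature $\mu$, coupling $b$, and step size $\eta$ below are hypothetical choices.
\begin{verbatim}
# Sim-GDA vs. Alt-GDA on f(x, y) = (mu/2)*x^2 + b*x*y - (mu/2)*y^2,
# a strongly convex-strongly concave quadratic game whose unique
# saddle point is the origin.
mu, b, eta = 1.0, 4.0, 0.1  # hypothetical curvature, coupling, step size

def grads(x, y):
    """Return (df/dx, df/dy) at the point (x, y)."""
    return mu * x + b * y, b * x - mu * y

def sim_gda(x, y):
    gx, gy = grads(x, y)               # both gradients use the old iterate
    return x - eta * gx, y + eta * gy

def alt_gda(x, y):
    gx, _ = grads(x, y)
    x_new = x - eta * gx               # descent step for the primal player
    _, gy = grads(x_new, y)            # ascent step sees the updated x
    return x_new, y + eta * gy

sim, alt = (1.0, 1.0), (1.0, 1.0)
for _ in range(200):
    sim, alt = sim_gda(*sim), alt_gda(*alt)

# Distance to the saddle point; Alt-GDA contracts markedly faster here.
print("Sim-GDA:", (sim[0]**2 + sim[1]**2) ** 0.5)
print("Alt-GDA:", (alt[0]**2 + alt[1]**2) ** 0.5)
\end{verbatim}
With these parameters the spectral radius of the Sim-GDA iteration matrix is about $0.985$ while that of Alt-GDA is $0.9$, so after 200 iterations the gap between the two distances spans several orders of magnitude, consistent with the rate separation claimed above.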