We study smooth minimax optimization problems of the form $\min_{\bf x}\max_{\bf y} f({\bf x},{\bf y})$, where the objective function is strongly concave in ${\bf y}$ but possibly nonconvex in ${\bf x}$. This problem covers many applications in machine learning, such as regularized GANs, reinforcement learning and adversarial training. Most existing theory for gradient descent ascent focuses on establishing convergence to a first-order stationary point of $f({\bf x},{\bf y})$ or of the primal function $P({\bf x})\triangleq \max_{\bf y} f({\bf x},{\bf y})$. In this paper, we design a new optimization method based on cubic Newton iterations, which finds an ${\mathcal O}\left(\varepsilon,\kappa^{1.5}\sqrt{\rho\varepsilon}\right)$-second-order stationary point of $P({\bf x})$ with ${\mathcal O}\left(\kappa^{1.5}\sqrt{\rho}\varepsilon^{-1.5}\right)$ second-order oracle calls and $\tilde{\mathcal O}\left(\kappa^{2}\sqrt{\rho}\varepsilon^{-1.5}\right)$ first-order oracle calls, where $\kappa$ is the condition number and $\rho$ is the Hessian smoothness coefficient of $f({\bf x},{\bf y})$. For high-dimensional problems, we propose a variant of the algorithm that avoids the expensive second-order oracle by solving the cubic sub-problem inexactly via gradient descent and matrix Chebyshev expansion. This strategy still obtains the desired approximate second-order stationary point with high probability, but requires only $\tilde{\mathcal O}\left(\kappa^{1.5}\ell\varepsilon^{-2}\right)$ Hessian-vector oracle calls and $\tilde{\mathcal O}\left(\kappa^{2}\sqrt{\rho}\varepsilon^{-1.5}\right)$ first-order oracle calls. To the best of our knowledge, this is the first work that considers the non-asymptotic convergence behavior of finding second-order stationary points for minimax problems without the convex-concave assumption.
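To make the idea of a cubic Newton iteration on the primal function concrete, here is a minimal one-dimensional sketch. It is not the algorithm from the paper: the toy objective $f(x,y)=x^4/4-x^2+xy-y^2/2$, the function names, the step sizes and the constant `M` are all assumptions chosen for illustration, and the cubic sub-problem is solved in closed form (possible only in one dimension) rather than by gradient descent with a matrix Chebyshev expansion. For this toy, $y^*(x)=x$ and $P(x)=x^4/4-x^2/2$, so $x=0$ is a first-order stationary point with negative curvature, while $x=\pm 1$ are second-order stationary points.

```python
import math

# Toy objective (an assumption, not from the paper):
#   f(x, y) = x**4/4 - x**2 + x*y - y**2/2
# It is 1-strongly concave in y and nonconvex in x.

def grad_x(x, y):
    return x**3 - 2.0 * x + y

def grad_y(x, y):
    return x - y

def inner_max(x, y0=0.0, step=0.5, iters=60):
    """Approximate y*(x) = argmax_y f(x, y) by gradient ascent."""
    y = y0
    for _ in range(iters):
        y += step * grad_y(x, y)
    return y

def primal_grad_hess(x):
    """Gradient and Hessian of P(x) = max_y f(x, y).

    grad P = grad_x f(x, y*(x)), and the Hessian of P is the Schur
    complement f_xx - f_xy * f_yy^{-1} * f_yx evaluated at y*(x).
    """
    y = inner_max(x)
    g = grad_x(x, y)
    f_xx = 3.0 * x**2 - 2.0
    f_xy = 1.0
    f_yy = -1.0
    h = f_xx - f_xy * f_xy / f_yy
    return g, h

def cubic_step(g, h, M):
    """Global minimizer of the 1-D model g*s + h*s**2/2 + M*|s|**3/6."""
    r = (-h + math.sqrt(h * h + 2.0 * M * abs(g))) / M
    return r if g <= 0 else -r

def cubic_newton(x0, M=6.0, iters=50):
    # M = 6 bounds |P'''(x)| = |6x| on [-1, 1], the region visited
    # when starting from x0 = 0 (an assumption specific to this toy).
    x = x0
    for _ in range(iters):
        g, h = primal_grad_hess(x)
        x += cubic_step(g, h, M)
    return x

if __name__ == "__main__":
    x = cubic_newton(0.0)
    print(x, primal_grad_hess(x))
```

Starting at $x_0=0$, the gradient of $P$ vanishes, so a purely first-order method would not move. The cubic model uses the negative curvature there to take a nonzero step, and the iterates then approach a point where the gradient is small and the Hessian of $P$ is positive.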


