Unlike nonconvex optimization, where gradient descent is guaranteed to converge to a local minimizer, algorithms for nonconvex-nonconcave minimax optimization can have topologically different solution paths: sometimes converging to a solution, sometimes never converging and instead following a limit cycle, and sometimes diverging. In this paper, we study the limiting behaviors of three classic minimax algorithms: gradient descent ascent (GDA), alternating gradient descent ascent (AGDA), and the extragradient method (EGM). Numerically, we observe that all of these limiting behaviors can arise in Generative Adversarial Network (GAN) training and are easily demonstrated across a range of GAN problems. To explain these different behaviors, we study the high-resolution continuous-time dynamics that correspond to each algorithm, which yields sufficient (and almost necessary) conditions for local convergence of each method. Moreover, this ODE perspective allows us to characterize the phase transitions between these different limiting behaviors, induced by introducing regularization, as Hopf bifurcations.
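To make the contrast between the three update rules concrete, the following minimal sketch (not taken from the paper; the bilinear toy objective f(x, y) = x*y and the step size eta are illustrative assumptions) implements GDA, AGDA, and EGM side by side. On this toy problem, simultaneous GDA spirals away from the saddle point at the origin, AGDA follows a bounded cycle around it, and EGM converges to it.

```python
# Illustrative sketch (assumptions, not the paper's setup): the three update rules
# on the bilinear toy problem f(x, y) = x * y, whose unique saddle point is (0, 0).
import numpy as np

def grad(x, y):
    # f(x, y) = x * y  =>  df/dx = y,  df/dy = x
    return y, x

def gda(x, y, eta):
    # Simultaneous GDA: both players update using the current iterate.
    gx, gy = grad(x, y)
    return x - eta * gx, y + eta * gy

def agda(x, y, eta):
    # Alternating GDA: the ascent player reacts to the already-updated x.
    gx, _ = grad(x, y)
    x_new = x - eta * gx
    _, gy = grad(x_new, y)
    return x_new, y + eta * gy

def egm(x, y, eta):
    # Extragradient: take a trial step, then update with the gradient at the trial point.
    gx, gy = grad(x, y)
    x_mid, y_mid = x - eta * gx, y + eta * gy
    gx_mid, gy_mid = grad(x_mid, y_mid)
    return x - eta * gx_mid, y + eta * gy_mid

if __name__ == "__main__":
    eta = 0.1  # hypothetical step size chosen for demonstration
    for name, step in [("GDA", gda), ("AGDA", agda), ("EGM", egm)]:
        x, y = 1.0, 1.0
        for _ in range(200):
            x, y = step(x, y, eta)
        # GDA grows, AGDA stays on a bounded orbit, EGM shrinks toward the origin.
        print(f"{name}: |(x, y)| after 200 steps = {np.hypot(x, y):.4f}")
```

The behavior follows from the linear update maps on this toy problem: the GDA iteration matrix has eigenvalues of modulus greater than one, the AGDA matrix has modulus exactly one (hence cycling), and the EGM matrix has modulus below one for small eta (hence convergence).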