Gradient descent ascent (GDA), the simplest single-loop algorithm for nonconvex minimax optimization, is widely used in practical applications such as generative adversarial networks (GANs) and adversarial training. Despite its desirable simplicity, recent work has shown that GDA suffers from inferior convergence rates in theory, even when the objective is assumed strongly concave on one side. This paper establishes new convergence results for two alternative single-loop algorithms -- alternating GDA and smoothed GDA -- under the mild assumption that the objective satisfies the Polyak-Lojasiewicz (PL) condition with respect to one variable. We prove that, to find an $\epsilon$-stationary point, (i) alternating GDA and its stochastic variant (without mini-batching) respectively require $O(\kappa^{2} \epsilon^{-2})$ and $O(\kappa^{4} \epsilon^{-4})$ iterations, while (ii) smoothed GDA and its stochastic variant (without mini-batching) respectively require $O(\kappa \epsilon^{-2})$ and $O(\kappa^{2} \epsilon^{-4})$ iterations. The latter greatly improves over vanilla GDA and gives the best known complexity results to date among single-loop algorithms under comparable settings. We further showcase the empirical efficiency of these algorithms in training GANs and robust nonlinear regression.
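For readers unfamiliar with the alternating update order, the following minimal Python sketch illustrates alternating GDA on a toy quadratic minimax objective that is strongly concave (hence PL) in the maximization variable. The objective, step sizes, and iteration count are illustrative placeholders and are not the settings analyzed in the paper; smoothed GDA additionally maintains an auxiliary averaged iterate via a proximal term, which is omitted here.

```python
# Toy objective f(x, y) = 0.5*x^2 + x*y - 0.5*y^2,
# strongly concave in y, so the PL condition in y holds.
def grad_x(x, y):
    return x + y


def grad_y(x, y):
    return x - y


def alternating_gda(x0, y0, eta_x=0.1, eta_y=0.1, num_iters=1000):
    """Alternating GDA: the ascent step on y uses the freshly updated x,
    unlike vanilla (simultaneous) GDA, which evaluates both gradients
    at the same iterate."""
    x, y = x0, y0
    for _ in range(num_iters):
        x = x - eta_x * grad_x(x, y)   # descent step on x
        y = y + eta_y * grad_y(x, y)   # ascent step on y (uses new x)
    return x, y


x_final, y_final = alternating_gda(x0=1.0, y0=1.0)
print(x_final, y_final)  # both approach the stationary point (0, 0)
```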