Minimax optimization plays an important role in many machine learning tasks such as generative adversarial networks (GANs) and adversarial training. Although a wide variety of optimization methods have recently been proposed to solve minimax problems, most of them ignore the distributed setting in which the data are spread across multiple workers. Meanwhile, the existing decentralized minimax optimization methods rely on strict assumptions such as (strong) concavity or variational inequality conditions. Thus, in this paper, we propose an efficient decentralized momentum-based gradient descent ascent (DM-GDA) method for distributed nonconvex-PL minimax optimization, where the objective is nonconvex in the primal variable and, in the dual variable, is nonconcave but satisfies the Polyak-Lojasiewicz (PL) condition. In particular, our DM-GDA method simultaneously uses momentum-based techniques to update the variables and to estimate the stochastic gradients. Moreover, we provide a solid convergence analysis for our DM-GDA method and prove that it attains a near-optimal gradient complexity of $O(\epsilon^{-3})$ for finding an $\epsilon$-stationary solution of nonconvex-PL stochastic minimax problems, which matches the lower bound of nonconvex stochastic optimization. To the best of our knowledge, this is the first work to study decentralized algorithms for nonconvex-PL stochastic minimax optimization over a network.
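For concreteness, the standard formulation behind these terms (the paper's exact definitions and constants may differ slightly) is the following: the objective $f(x,\cdot)$ satisfies the PL condition in the dual variable $y$ with parameter $\mu > 0$ if $\frac{1}{2}\|\nabla_y f(x,y)\|^2 \geq \mu\big(\max_{y'} f(x,y') - f(x,y)\big)$ for all $(x,y)$, and, writing $\Phi(x) := \max_{y} f(x,y)$ for the primal function, a point $x$ is an $\epsilon$-stationary solution if $\|\nabla \Phi(x)\| \leq \epsilon$.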
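To make the phrase "momentum-based techniques to update the variables and to estimate the stochastic gradients" concrete, below is a minimal single-worker sketch of one momentum gradient descent ascent step with STORM-style momentum gradient estimators. This is an illustration under our own assumptions, not the paper's DM-GDA algorithm: the decentralized consensus step across workers, the exact update order, and all parameter values are simplified or omitted, and the names momentum_gda_step, grad_x, and grad_y are hypothetical.

def momentum_gda_step(grad_x, grad_y, batch, state,
                      gamma=0.01, lam=0.05, alpha=0.1, eta=0.5):
    """One illustrative momentum GDA step on a single worker (placeholder values).

    grad_x(x, y, batch) and grad_y(x, y, batch) are stochastic gradient oracles,
    evaluated on the same minibatch at both the current and previous iterates.
    gamma, lam: primal/dual step sizes; alpha: gradient-estimator momentum;
    eta: momentum of the variable update.
    """
    x, y = state["x"], state["y"]

    # STORM-style momentum gradient estimators:
    #   v_t = g(x_t, y_t; B_t) + (1 - alpha) * (v_{t-1} - g(x_{t-1}, y_{t-1}; B_t))
    v = grad_x(x, y, batch) + (1.0 - alpha) * (
        state["v"] - grad_x(state["x_prev"], state["y_prev"], batch))
    w = grad_y(x, y, batch) + (1.0 - alpha) * (
        state["w"] - grad_y(state["x_prev"], state["y_prev"], batch))

    # Momentum-based variable updates: gradient descent in x, gradient ascent in y.
    x_new = (1.0 - eta) * x + eta * (x - gamma * v)
    y_new = (1.0 - eta) * y + eta * (y + lam * w)

    state.update(x_prev=x, y_prev=y, x=x_new, y=y_new, v=v, w=w)
    return state

In a decentralized implementation, each worker would additionally hold its own copy of $(x, y, v, w)$ and average it with its neighbors' copies through a doubly stochastic mixing matrix around each local step.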