Stochastic Gradient Descent-Ascent (SGDA) is one of the most prominent algorithms for solving min-max optimization and variational inequality problems (VIPs) appearing in various machine learning tasks. The success of the method has led to several advanced extensions of the classical SGDA, including variants with arbitrary sampling, variance reduction, coordinate randomization, and distributed variants with compression, which have been extensively studied in the literature, especially over the last few years. In this paper, we propose a unified convergence analysis that covers a large variety of stochastic gradient descent-ascent methods, which so far have required different intuitions, served different applications, and been developed separately in various communities. A key to our unified framework is a parametric assumption on the stochastic estimates. Via our general theoretical framework, we either recover the sharpest known rates for known special cases or tighten them. Moreover, to illustrate the flexibility of our approach, we develop several new variants of SGDA, such as a new variance-reduced method (L-SVRGDA), new distributed methods with compression (QSGDA, DIANA-SGDA, VR-DIANA-SGDA), and a new method with coordinate randomization (SEGA-SGDA). Although variants of the new methods are known for solving minimization problems, they have never been considered or analyzed for solving min-max problems and VIPs. We also demonstrate the most important properties of the new methods through extensive numerical experiments.
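For concreteness, the classical SGDA update for a min-max problem $\min_x \max_y f(x, y)$ can be sketched as follows; the notation here ($\gamma$ for the step size, $g_x^k$ and $g_y^k$ for the stochastic estimates of the partial gradients) is illustrative and not taken from the paper.

% Illustrative sketch of the classical SGDA step for min_x max_y f(x, y);
% gamma is a step size, g_x^k and g_y^k are stochastic estimates of the
% partial gradients (notation chosen here for illustration only).
\begin{align*}
  x^{k+1} &= x^{k} - \gamma \, g_x^{k}, \qquad g_x^{k} \approx \nabla_x f(x^{k}, y^{k}),\\
  y^{k+1} &= y^{k} + \gamma \, g_y^{k}, \qquad g_y^{k} \approx \nabla_y f(x^{k}, y^{k}).
\end{align*}

The extensions mentioned above differ mainly in how the estimates $g_x^k$ and $g_y^k$ are constructed (e.g., via variance reduction, coordinate sampling, or compressed communication), which is what the parametric assumption on the stochastic estimates is intended to capture in a unified way.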