An increasing number of machine learning problems, such as robust or adversarial variants of existing algorithms, require minimizing a loss function that is itself defined as a maximum. Carrying out a loop of stochastic gradient ascent (SGA) steps on the (inner) maximization problem, followed by an SGD step on the (outer) minimization, is known as Epoch Stochastic Gradient \textit{Descent Ascent} (ESGDA). While successful in practice, the theoretical analysis of ESGDA remains challenging, with no clear guidance on the choice of inner loop size nor on the interplay between inner and outer step sizes. We propose RSGDA (Randomized SGDA), a variant of ESGDA with a stochastic loop size that admits a simpler theoretical analysis. RSGDA comes with the first almost-sure convergence rates among SGDA algorithms in the nonconvex-min/strongly-concave-max setting. RSGDA can be parameterized using optimal loop sizes that guarantee the best convergence rates known to hold for SGDA. We test RSGDA on toy and larger-scale problems, using distributionally robust optimization and single-cell data matching via optimal transport as testbeds.
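To make the descent/ascent structure concrete, the sketch below illustrates one natural instantiation of a stochastic loop size: at each iteration, take a descent step on the minimization variable with some probability $p$ and an ascent step on the maximization variable otherwise (so inner loop lengths are random rather than fixed as in ESGDA). The toy quadratic objective, Gaussian gradient noise, and all step-size values are illustrative assumptions, not the paper's experimental setup.

\begin{verbatim}
import numpy as np

# Toy min-max problem (illustrative assumption, not from the paper):
#   min_x max_y  f(x, y) = 0.5*||x||^2 + x^T y - 0.5*||y||^2,
# strongly concave in y, with saddle point at x = y = 0.
# Stochastic gradients are simulated by adding Gaussian noise.

rng = np.random.default_rng(0)
d = 5
x, y = rng.normal(size=d), rng.normal(size=d)

eta_x, eta_y = 0.05, 0.1   # outer (descent) / inner (ascent) step sizes
p = 0.2                    # probability of taking a descent step
noise = 0.01               # std of the simulated gradient noise

for _ in range(5000):
    if rng.random() < p:
        # Descent step on the minimization variable x.
        grad_x = x + y + noise * rng.normal(size=d)
        x = x - eta_x * grad_x
    else:
        # Ascent step on the maximization variable y.
        grad_y = x - y + noise * rng.normal(size=d)
        y = y + eta_y * grad_y

print("||x|| =", np.linalg.norm(x), " ||y - x|| =", np.linalg.norm(y - x))
\end{verbatim}

With this randomized rule, the expected number of ascent steps between two descent steps is $(1-p)/p$, which plays the role of ESGDA's fixed inner loop size; tuning $p$ alongside the two step sizes corresponds to the loop-size/step-size interplay discussed above.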