Many machine learning problems can be formulated as minimax problems, including Generative Adversarial Networks (GANs), AUC maximization, and robust estimation, to mention but a few. A substantial body of work is devoted to studying the convergence behavior of the associated stochastic gradient-type algorithms. In contrast, there is relatively little work on their generalization, i.e., how learning models built from training examples behave on test examples. In this paper, we provide a comprehensive generalization analysis of stochastic gradient methods for minimax problems in both the convex-concave and nonconvex-nonconcave settings through the lens of algorithmic stability. We establish a quantitative connection between stability and several generalization measures, both in expectation and with high probability. For the convex-concave setting, our stability analysis shows that stochastic gradient descent ascent attains optimal generalization bounds for both smooth and nonsmooth minimax problems. We also establish generalization bounds for both weakly-convex-weakly-concave and gradient-dominated problems.
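For concreteness, the stochastic gradient descent ascent update underlying this analysis can be sketched as follows; this is a generic sketch, assuming a per-example objective $f(w, v; z)$, step sizes $\eta_t$, and an index $i_t$ sampled uniformly from the training set, and it may differ from the exact variant analyzed in the paper:
\[
w_{t+1} = w_t - \eta_t \nabla_w f(w_t, v_t; z_{i_t}), \qquad v_{t+1} = v_t + \eta_t \nabla_v f(w_t, v_t; z_{i_t}),
\]
where $w$ denotes the minimization variable and $v$ the maximization variable.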