Minimax optimization has served as the backbone of many machine learning (ML) problems. Although the convergence behavior of optimization algorithms has been extensively studied in minimax settings, their generalization guarantees in the stochastic setting, i.e., how the solution trained on empirical data performs on unseen testing data, have been relatively underexplored. A fundamental question remains elusive: What is a good metric to study the generalization of minimax learners? In this paper, we aim to answer this question by first showing that the primal risk, a universal metric for studying generalization in minimization problems, fails in simple examples of minimax problems. Furthermore, another popular metric, the primal-dual risk, also fails to characterize the generalization behavior of minimax problems with nonconvexity, due to the non-existence of saddle points. We thus propose a new metric for studying the generalization of minimax learners, the primal gap, which circumvents these issues. Next, we derive generalization bounds for the primal gap in nonconvex-concave settings. As byproducts of our analysis, we also settle two open questions: establishing generalization bounds for the primal risk and the primal-dual risk in the strong sense, i.e., without strong concavity and without assuming that the maximization and expectation can be interchanged, whereas at least one of these assumptions was required in the prior literature. Finally, we leverage this new metric to compare the generalization behavior of two popular algorithms, gradient descent-ascent (GDA) and gradient descent-max (GDMax), in stochastic minimax optimization.
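To make the algorithmic comparison at the end of the abstract concrete, the following is a minimal sketch (not taken from the paper) of the standard GDA and GDMax update rules on a toy smooth, strongly-convex-strongly-concave objective; the objective, step sizes, and iteration counts are illustrative assumptions chosen only so the example runs and converges.

```python
# Illustrative sketch of GDA vs. GDMax updates on f(x, y) = 0.5||x||^2 + x^T A y - 0.5||y||^2,
# whose unique saddle point is (0, 0). All hyperparameters here are assumptions for the demo.
import numpy as np

rng = np.random.default_rng(0)
d = 5
A = rng.normal(size=(d, d)) / np.sqrt(d)

def grad_x(x, y):
    return x + A @ y          # gradient of f with respect to x

def grad_y(x, y):
    return A.T @ x - y        # gradient of f with respect to y

def gda(x, y, lr_x=0.05, lr_y=0.05, steps=1000):
    """Gradient descent-ascent: simultaneous descent step in x and ascent step in y."""
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x, y = x - lr_x * gx, y + lr_y * gy
    return x, y

def gdmax(x, y, lr_x=0.05, lr_y=0.2, inner_steps=50, steps=200):
    """Gradient descent-max: (approximately) maximize over y before each descent step in x."""
    for _ in range(steps):
        for _ in range(inner_steps):      # inner loop: run ascent on y to near-optimality
            y = y + lr_y * grad_y(x, y)
        x = x - lr_x * grad_x(x, y)       # outer loop: descend on x with y close to the argmax
    return x, y

x0, y0 = rng.normal(size=d), rng.normal(size=d)
print("GDA   distance to saddle point:", np.linalg.norm(np.concatenate(gda(x0.copy(), y0.copy()))))
print("GDMax distance to saddle point:", np.linalg.norm(np.concatenate(gdmax(x0.copy(), y0.copy()))))
```

The structural difference the paper studies is visible in the sketch: GDA interleaves single descent and ascent steps, whereas GDMax solves (approximately) the inner maximization before each update of the minimizing variable.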