Minimax optimization serves as the backbone of many machine learning (ML) problems. Although the convergence behavior of optimization algorithms has been extensively studied in minimax settings, their generalization guarantees in stochastic minimax optimization, i.e., how a solution trained on empirical data performs on unseen test data, remain relatively underexplored. A fundamental question remains elusive: what is a good metric for studying the generalization of minimax learners? In this paper, we address this question by first showing that primal risk, a universal metric for studying generalization in minimization problems that has also been adopted recently for minimax ones, fails in simple examples. To circumvent these issues, we propose a new metric for studying the generalization of minimax learners: the primal gap, defined as the difference between the primal risk and its minimum over all models. Next, we derive generalization error bounds for the primal gap in nonconvex-concave settings. As byproducts of our analysis, we also resolve two open questions: establishing generalization error bounds, in the strong sense, for the primal risk and for the primal-dual risk, an existing metric that is well-defined only when a global saddle point exists. Here, "in the strong sense" means without assuming strong concavity or the interchangeability of maximization and expectation, whereas one of these assumptions was required in the existing literature. Finally, we leverage the new metric to compare the generalization behavior of two popular algorithms, gradient descent-ascent (GDA) and gradient descent-max (GDMax), in stochastic minimax optimization.
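To make the two algorithms concrete, the following is a minimal sketch, on a toy strongly-convex-strongly-concave problem of my own choosing (not an example from the paper): GDA takes simultaneous gradient steps (descent in the minimization variable, ascent in the maximization variable), while GDMax solves the inner maximization exactly at each step and then descends the resulting primal function.

```python
# Toy illustration (assumed example, not from the paper): GDA vs. GDMax on
#   min_x max_y f(x, y) = x**2/2 + x*y - y**2/2,
# whose unique saddle point is (0, 0).

def grad(x, y):
    """Gradients of f with respect to x and y."""
    return x + y, x - y

def gda(x, y, lr=0.1, steps=500):
    """Gradient descent-ascent: simultaneous descent in x, ascent in y."""
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y + lr * gy
    return x, y

def gdmax(x, lr=0.1, steps=500):
    """Gradient descent-max: the inner max is solved exactly (here y*(x) = x),
    giving the primal function Phi(x) = max_y f(x, y) = x**2, then we descend Phi."""
    for _ in range(steps):
        x = x - lr * 2 * x  # gradient of Phi(x) = x**2
    return x

x_gda, y_gda = gda(1.0, -0.5)
x_max = gdmax(1.0)
print(x_gda, y_gda, x_max)  # all iterates approach the saddle point at 0
```

On this well-behaved problem both methods converge to the saddle point; the paper's point is that their generalization behavior can nevertheless differ in stochastic settings, which the primal gap is designed to measure.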