Adaptive gradient methods have shown their ability to adjust the stepsizes on the fly in a parameter-agnostic manner, and empirically achieve faster convergence for solving minimization problems. When it comes to nonconvex minimax optimization, however, current convergence analyses of gradient descent ascent (GDA) combined with adaptive stepsizes require careful tuning of hyper-parameters and the knowledge of problem-dependent parameters. Such a discrepancy arises from the primal-dual nature of minimax problems and the necessity of delicate time-scale separation between the primal and dual updates in attaining convergence. In this work, we propose a single-loop adaptive GDA algorithm called TiAda for nonconvex minimax optimization that automatically adapts to the time-scale separation. Our algorithm is fully parameter-agnostic and can achieve near-optimal complexities simultaneously in deterministic and stochastic settings of nonconvex-strongly-concave minimax problems. The effectiveness of the proposed method is further justified numerically for a number of machine learning applications.
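To make the time-scale issue concrete, below is a minimal NumPy sketch of a single-loop adaptive GDA update in the spirit described above: both players keep AdaGrad-style accumulators of squared gradient norms, and the descent stepsize is additionally damped by the larger of the two accumulators so that the primal update automatically slows down relative to the dual one. The toy objective, the base stepsizes eta_x and eta_y, and the exponents alpha > beta are illustrative assumptions on our part; the precise TiAda update rule and its guarantees are given in the paper body, not in this abstract.

```python
import numpy as np

# Toy nonconvex-strongly-concave objective (illustrative choice, not from the paper):
#   f(x, y) = cos(2x) + x*y - 0.5*y^2
# f is nonconvex in x and 1-strongly concave in y.
def grad_x(x, y):
    return -2.0 * np.sin(2.0 * x) + y

def grad_y(x, y):
    return x - y

def timescale_adaptive_gda(x, y, steps=2000, eta_x=0.1, eta_y=0.1,
                           alpha=0.6, beta=0.4):
    """Sketch of a time-scale-adaptive GDA loop (assumed form, hedged)."""
    vx, vy = 0.0, 0.0  # AdaGrad-style accumulators for each player
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)
        vx += gx ** 2
        vy += gy ** 2
        # Primal (descent) update: damped by the max of both accumulators,
        # which enforces the time-scale separation without tuning.
        x -= eta_x / (max(vx, vy) ** alpha) * gx
        # Dual (ascent) update: plain adaptive stepsize.
        y += eta_y / (vy ** beta) * gy
    return x, y

x_out, y_out = timescale_adaptive_gda(x=1.5, y=-1.0)
print(f"x = {x_out:.4f}, y = {y_out:.4f}, "
      f"|grad_x| = {abs(grad_x(x_out, y_out)):.2e}")
```

With alpha > beta, the ratio of the two effective stepsizes shrinks as the accumulators grow, mimicking the delicate primal-dual time-scale separation that hand-tuned GDA analyses require; the specific exponent choices here are for illustration only.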