Regret matching (RM) -- and its modern variants -- is a foundational online algorithm that has been at the heart of many AI breakthrough results in solving benchmark zero-sum games, such as poker. Yet, surprisingly little is known in theory about its convergence beyond two-player zero-sum games. For example, whether regret matching converges to Nash equilibria in potential games has been an open problem for two decades. Even beyond games, one could also use RM variants for general constrained optimization problems. Recent empirical evidence suggests that they -- particularly regret matching$^+$ (RM$^+$) -- attain strong performance on benchmark constrained optimization problems, outperforming traditional gradient descent-type algorithms. We show that RM$^+$ converges to an $ε$-KKT point after $O_ε(1/ε^4)$ iterations, establishing for the first time that it is a sound and fast first-order optimizer. Our argument relates the KKT gap to the accumulated regret, two quantities that are entirely disparate in general but interact in an intriguing way in our setting, so much so that when the regrets are bounded, our complexity bound improves all the way to $O_ε(1/ε^2)$. From a technical standpoint, while RM$^+$ does not have the usual one-step improvement property in general, we show that it does within a certain region that the algorithm quickly reaches and remains in thereafter. In sharp contrast, our second main result establishes a lower bound: RM, with or without alternation, can take an exponential number of iterations to reach a crude approximate solution even in two-player potential games. This is the first worst-case separation between RM and RM$^+$. Our lower bound also shows that, in potential games, convergence to coarse correlated equilibria can be exponentially faster than convergence to Nash equilibria.
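For readers unfamiliar with how RM$^+$ is used as a first-order optimizer, the following is a minimal illustrative sketch (not the paper's exact procedure) of the standard RM$^+$ update applied to minimizing a smooth objective over the probability simplex, where the current gradient is fed to the online algorithm as a linear loss at each step; all function names and the quadratic test objective are assumptions for illustration only.

```python
import numpy as np

def rm_plus_simplex(grad, dim, iters=10_000):
    """Sketch of Regret Matching+ (RM+) for minimizing a smooth function
    over the probability simplex, using the usual reduction that treats
    the current gradient as a linear loss at each iteration.
    `grad(x)` returns the gradient of the objective at x."""
    x = np.full(dim, 1.0 / dim)        # start at the uniform distribution
    Q = np.zeros(dim)                  # thresholded cumulative regrets
    avg = np.zeros(dim)                # running average of the iterates
    for t in range(1, iters + 1):
        g = grad(x)
        # Instantaneous regret of coordinate i under the linear loss <g, .>:
        # loss of the current point minus loss of putting all mass on i.
        r = x @ g - g
        Q = np.maximum(Q + r, 0.0)     # RM+ truncation at zero
        s = Q.sum()
        x = Q / s if s > 0 else np.full(dim, 1.0 / dim)
        avg += (x - avg) / t
    return x, avg

# Hypothetical usage: f(x) = 0.5 * ||A x - b||^2 restricted to the simplex.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 3))
    b = rng.standard_normal(5)
    grad = lambda x: A.T @ (A @ x - b)
    x_last, x_avg = rm_plus_simplex(grad, dim=3)
    print(x_last, x_avg)
```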