By incorporating regret minimization, double oracle methods have demonstrated rapid convergence to Nash Equilibrium (NE) in normal-form games and extensive-form games, through algorithms such as online double oracle (ODO) and extensive-form double oracle (XDO), respectively. In this study, we further examine the theoretical convergence rate and sample complexity of such regret minimization-based double oracle methods, utilizing a unified framework called Regret-Minimizing Double Oracle. Based on this framework, we extend ODO to extensive-form games and determine its sample complexity. Moreover, we demonstrate that the sample complexity of XDO can be exponential in the number of information sets $|S|$, owing to the exponentially decaying stopping threshold of restricted games. To solve this problem, we propose the Periodic Double Oracle (PDO) method, which has the lowest sample complexity among all existing double oracle methods, being only polynomial in $|S|$. Empirical evaluations on multiple poker and board games show that PDO achieves significantly faster convergence than previous double oracle algorithms and reaches a competitive level with state-of-the-art regret minimization methods.
翻译:通过引入遗憾最小化的思想,双正则元方法已经被证明在标准型博弈和扩展型博弈中以非常快的速度(利用算法比如在线双正则元和扩展型双正则元)收敛于纳什均衡点。在本论文中,我们进一步考察这种基于遗憾最小化的双正则元方法的理论收敛速度和样本复杂度,使用一个被称为“遗憾最小化的双正则元法”的统一框架。在此框架下,我们将在线双正则元扩展到了扩展型博弈,并研究了它的样本复杂度。此外,我们发现,由于受到限制博弈指数级递减停止阈值的影响,扩展型双正则元的样本复杂度可能是指数级的。为解决这一问题,我们提出了周期双正则元方法(PDO),它在所有现有的双正则元方法中具有最低的样本复杂度,仅仅是$|S|$多项式级别。在多个扑克和棋类游戏上的实证评估表明,PDO比以前的双正则元算法收敛得更快,并且达到了现有基于遗憾最小化的最新算法的竞争水平。