In this paper, we investigate the power of regularization, a common technique in reinforcement learning and optimization, in solving extensive-form games (EFGs). We propose a series of new algorithms based on regularizing the payoff functions of the game, and establish a set of convergence results that strictly improve over the existing ones, with either weaker assumptions or stronger convergence guarantees. In particular, we first show that dilated optimistic mirror descent (DOMD), an efficient variant of OMD for solving EFGs, with adaptive regularization can achieve a fast $\tilde O(1/T)$ last-iterate convergence rate in terms of duality gap, without the uniqueness assumption on the Nash equilibrium (NE). Moreover, regularized dilated optimistic multiplicative weights update (Reg-DOMWU), an instance of Reg-DOMD, further enjoys an $\tilde O(1/T)$ last-iterate convergence rate in terms of the distance to the set of NE. This addresses an open question, in both the EFG and normal-form game literature, on whether iterate convergence can be obtained for OMWU algorithms without the uniqueness assumption. Second, we show that regularized counterfactual regret minimization (Reg-CFR), with a variant of the optimistic mirror descent algorithm as its regret minimizer, can achieve $O(1/T^{1/4})$ best-iterate and $O(1/T^{3/4})$ average-iterate convergence rates for finding NE in EFGs. Finally, we show that Reg-CFR achieves asymptotic last-iterate convergence, and an optimal $O(1/T)$ average-iterate convergence rate, for finding the NE of perturbed EFGs, which is useful for finding approximate extensive-form perfect equilibria (EFPE). To the best of our knowledge, these constitute the first last-iterate convergence results for CFR-type algorithms, while matching the state-of-the-art average-iterate convergence rate for finding NE in non-perturbed EFGs. We also provide numerical results that corroborate the advantages of our algorithms.
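To make the payoff-regularization idea concrete, below is a minimal sketch of optimistic multiplicative weights update with entropy-regularized payoffs on a two-player zero-sum matrix game. This is only a toy illustration under simplifying assumptions: the function name `reg_omwu_matrix_game`, the fixed regularization weight `tau`, and the step size `eta` are illustrative choices, and the sketch omits the dilated regularizers, treeplex strategy spaces, and adaptive regularization schedule that the paper's Reg-DOMWU and Reg-CFR algorithms for EFGs rely on.

```python
import numpy as np

def reg_omwu_matrix_game(A, T=2000, eta=0.1, tau=0.05):
    """Toy optimistic MWU with entropy-regularized payoffs on a
    zero-sum matrix game x^T A y (x maximizes, y minimizes).

    NOT the paper's Reg-DOMWU for extensive-form games; this is an
    illustrative matrix-game sketch of 'optimism + payoff regularization'.
    """
    m, n = A.shape
    x = np.full(m, 1.0 / m)  # max player's mixed strategy
    y = np.full(n, 1.0 / n)  # min player's mixed strategy
    # previous regularized gradients, used for the optimistic prediction
    gx_prev = A @ y - tau * (np.log(x) + 1.0)
    gy_prev = A.T @ x + tau * (np.log(y) + 1.0)
    gaps = []
    for _ in range(T):
        # gradients of the entropy-regularized payoff
        gx = A @ y - tau * (np.log(x) + 1.0)    # x ascends
        gy = A.T @ x + tau * (np.log(y) + 1.0)  # y descends
        # optimistic MWU step: extrapolate with 2*g_t - g_{t-1}
        x = x * np.exp(eta * (2.0 * gx - gx_prev))
        x /= x.sum()
        y = y * np.exp(-eta * (2.0 * gy - gy_prev))
        y /= y.sum()
        gx_prev, gy_prev = gx, gy
        # duality gap measured on the original, unregularized game
        gaps.append((A @ y).max() - (A.T @ x).min())
    return x, y, gaps

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))
    x, y, gaps = reg_omwu_matrix_game(A)
    print("final duality gap:", gaps[-1])
```

The regularized gradients carry an extra $-\tau(\log x + 1)$ (resp. $+\tau(\log y + 1)$) term, which is what makes the perturbed problem better conditioned for last-iterate convergence, while the duality gap is still evaluated on the original game.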