Counterfactual Regret Minimization (CFR) is a family of regret minimization algorithms that minimize the total regret by minimizing local counterfactual regrets. CFRs converge fast in practice and have been widely used for solving large-scale imperfect-information extensive-form games (EFGs). However, due to their locality, CFRs are difficult to analyze and extend. Follow-the-Regularized-Leader (FTRL) and Online Mirror Descent (OMD) algorithms are regret minimization algorithms in Online Convex Optimization. They are mathematically elegant but less practical for solving EFGs. In this paper, we provide a new way to analyze and extend CFRs, by proving that CFR with Regret Matching and CFR with Regret Matching+ are special forms of FTRL and OMD, respectively. With these equivalences, two new algorithms, which can be considered extensions of vanilla CFR and CFR+, are derived from the perspective of FTRL and OMD. In these two variants, maintaining the local counterfactual regrets is no longer necessary. The experiments show that the two variants converge faster than vanilla CFR and CFR+ in some EFGs.
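As a minimal sketch of the local update that CFR performs, the following hypothetical snippet implements the standard Regret Matching rule (play each action in proportion to its positive cumulative counterfactual regret); the function name and the example regret values are illustrative, not from the paper:

```python
import numpy as np

def regret_matching(cum_regret):
    """Regret Matching: the next strategy puts probability on each
    action proportional to its positive cumulative counterfactual
    regret; if no regret is positive, play uniformly at random."""
    pos = np.maximum(cum_regret, 0.0)
    total = pos.sum()
    if total > 0.0:
        return pos / total
    return np.full(len(cum_regret), 1.0 / len(cum_regret))

# Hypothetical two-action infoset with cumulative regrets [3, 1]:
# the resulting strategy is [0.75, 0.25].
strategy = regret_matching(np.array([3.0, 1.0]))
```

Regret Matching+ differs only in that the cumulative regrets themselves are clipped at zero after each update; the equivalences proved in the paper show that these local rules correspond to FTRL and OMD steps, respectively.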