In general, two-agent decision-making problems can be modeled as a two-player game, and a typical solution is to find a Nash equilibrium in such game. Counterfactual regret minimization (CFR) is a well-known method to find a Nash equilibrium strategy in a two-player zero-sum game with imperfect information. The CFR method adopts a regret matching algorithm iteratively to reduce regret values progressively, enabling the average strategy to approach a Nash equilibrium. Although CFR-based methods have achieved significant success in the field of imperfect information games, there is still scope for improvement in the efficiency of convergence. To address this challenge, we propose a novel CFR-based method named exponential counterfactual regret minimization (ECFR). With ECFR, an exponential weighting technique is used to reweight the instantaneous regret value during the process of iteration. A theoretical proof is provided to guarantees convergence of the ECFR algorithm. The result of an extensive set of experimental tests demostrate that the ECFR algorithm converges faster than the current state-of-the-art CFR-based methods.
翻译:一般而言,双试剂决策问题可以模拟成双玩游戏,典型的解决办法是在这种游戏中找到纳什平衡。反事实遗憾最小化(CFR)是在信息不完善的双玩零和游戏中找到纳什平衡战略的著名方法。CFR方法采用遗憾匹配算法,迭代减少遗憾值,使平均战略能够接近纳什平衡。虽然基于CFR的方法在不完善的信息游戏领域取得了巨大成功,但在趋同效率方面仍有改进的余地。为了应对这一挑战,我们提出了一种基于CFR的新方法,名为指数反事实遗憾最小化(ECFR)。与ECFR一起,使用指数加权技术在循环过程中对瞬时的遗憾值进行重新加权。提供了理论证据,以保证ECFR算法的趋同。一系列广泛的实验的结果是,ECFR算法比目前最先进的CFR法方法趋同得快。