Counterfactual Regret Minimization (CFR) is an effective algorithm for solving imperfect-information extensive-form games (IIEGs). However, CFR can only be applied in known environments, where the transition function of the chance player and the reward functions at the terminal nodes of the IIEG are known in advance. For uncertain scenarios such as those arising under Reinforcement Learning (RL), Variational Information Maximizing Exploration (VIME) provides a useful framework for exploring the environment using information gain. In this paper, we propose a method named VCFR that combines CFR with information gain to compute a Nash Equilibrium (NE) for IIEGs in the RL setting. By adding information gain to the reward, the average strategy computed by CFR can be used directly as the interaction strategy, and the algorithm's exploration efficiency in uncertain environments is significantly improved. Experimental results demonstrate that this approach not only effectively reduces the number of interactions with the environment, but also finds an approximate NE.
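As a rough illustration of the reward augmentation described above, a VIME-style information-gain bonus can be written as the KL divergence between the posterior and prior over the dynamics-model parameters; the bonus weight \(\eta\), model parameters \(\theta\), and history \(\xi_t\) follow the notation of the original VIME formulation and are used here only as an assumed sketch, not as the paper's exact definition:
\[
\tilde{r}(s_t, a_t) \;=\; r(s_t, a_t) \;+\; \eta \, D_{\mathrm{KL}}\!\left( p(\theta \mid \xi_t, a_t, s_{t+1}) \,\big\|\, p(\theta \mid \xi_t) \right)
\]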