Optimizing deep learning algorithms to approach a Nash equilibrium remains a significant problem in imperfect-information games such as StarCraft and poker. Neural Fictitious Self-Play (NFSP) provides an effective way to learn an approximate Nash equilibrium in imperfect-information games without prior domain knowledge. However, the optimality gap of NFSP remains an open optimization problem, and narrowing it can improve NFSP's performance. In this study, focusing on the optimality gap of NFSP, we propose a new method that replaces NFSP's best-response computation with regret matching. The new algorithm drives the optimality gap to zero as it iterates, and thus converges faster than the original NFSP. We conducted experiments on three typical perfect-information and imperfect-information game environments in OpenSpiel, all of which showed that our new algorithm performs better than the original NFSP.
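The regret-matching rule mentioned above can be illustrated with a minimal self-contained sketch on rock-paper-scissors: each player accumulates counterfactual regrets per action and plays in proportion to the positive regrets, so that the time-averaged strategies approach the uniform Nash equilibrium. This is only an illustrative example of plain regret matching under assumed function names (`get_strategy`, `train`), not the paper's NFSP-based algorithm.

```python
# Row player's payoff for action a vs action b in rock-paper-scissors:
# 0 = tie, 1 = win, -1 = loss. The game is symmetric, so the same
# matrix gives either player's payoff against the opponent's action.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def get_strategy(cum_regret):
    """Regret matching: play actions in proportion to positive cumulative regret."""
    pos = [max(r, 0.0) for r in cum_regret]
    norm = sum(pos)
    n = len(cum_regret)
    return [p / norm for p in pos] if norm > 0 else [1.0 / n] * n

def train(iters=50000):
    # Asymmetric initial regrets so the self-play dynamics are non-trivial.
    regrets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    strat_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iters):
        strats = [get_strategy(r) for r in regrets]
        for p in (0, 1):
            # Accumulate strategies; the average converges, not the iterates.
            strat_sums[p] = [s + x for s, x in zip(strat_sums[p], strats[p])]
            opp = strats[1 - p]
            # Expected utility of each pure action vs the opponent's strategy.
            util = [sum(PAYOFF[a][b] * opp[b] for b in range(3)) for a in range(3)]
            ev = sum(u * s for u, s in zip(util, strats[p]))
            # Regret of each action = its utility minus the realized expected value.
            regrets[p] = [r + (u - ev) for r, u in zip(regrets[p], util)]
    # Time-averaged strategies approximate the Nash equilibrium (uniform in RPS).
    return [[s / sum(ss) for s in ss] for ss in strat_sums]
```

Running `train()` yields average strategies close to (1/3, 1/3, 1/3) for both players, the unique Nash equilibrium of rock-paper-scissors; the average regret shrinks at a rate of O(1/sqrt(T)), which is the convergence property the abstract's optimality-gap argument relies on.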