Limited look-ahead game solving for imperfect-information games is the breakthrough that made it possible to defeat expert humans in large poker games. Existing algorithms of this type assume that all players are perfectly rational and do not allow explicit modeling and exploitation of the opponent's flaws. As a result, even very weak opponents can tie against these powerful methods or lose only very slowly. We present the first algorithm that allows incorporating opponent models into limited look-ahead game solving. Using only an approximation of a single (optimal) value function, the algorithm efficiently exploits an arbitrary estimate of the opponent's strategy while guaranteeing a bounded worst-case loss for the player. We also show that using existing resolving gadgets is problematic and explain why the previously solved parts of the game need to be kept. Experiments on three different games show that our algorithm achieves over half of the maximum possible exploitation while risking almost no loss.