When learning to play an imperfect information game, it is often easier to first start with the basic mechanics of the game rules. For example, one can play several example rounds with private cards revealed to all players to better understand the basic actions and their effects. Building on this intuition, this paper introduces {\it progressive hiding}, an algorithm that learns to play imperfect information games by first learning the basic mechanics and then progressively adding information constraints over time. Progressive hiding is inspired by methods from stochastic multistage optimization such as scenario decomposition and progressive hedging. We prove that it enables the adaptation of counterfactual regret minimization to games where perfect recall is not satisfied. Numerical experiments illustrate that progressive hiding can achieve optimal payoff in a benchmark of emergent communication trading game.
翻译:暂无翻译