Imperfect information games (IIGs) are games in which each player only partially observes the current game state. We study how to learn $\epsilon$-optimal strategies in a zero-sum IIG through self-play with trajectory feedback. We give a problem-independent lower bound $\widetilde{\mathcal{O}}(H(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ on the number of realizations required to learn these strategies with high probability, where $H$ is the length of the game and $A_{\mathcal{X}}$ and $B_{\mathcal{Y}}$ are the total numbers of actions for the two players. We also propose two Follow the Regularized Leader (FTRL) algorithms for this setting: Balanced FTRL, which matches this lower bound but requires knowledge of the information set structure beforehand to define the regularization; and Adaptive FTRL, which does not require this knowledge and instead needs $\widetilde{\mathcal{O}}(H^2(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ realizations, progressively adapting the regularization to the observations.
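To make the FTRL principle concrete, here is a minimal self-play sketch on a toy zero-sum matrix game with full-information feedback. It is not the Balanced or Adaptive FTRL of the paper (which operate on the sequence form of an IIG with trajectory feedback and tree-adapted regularizers); it only illustrates the generic FTRL update with entropy regularization, whose closed form is exponential weights over cumulative losses. All function names and the learning rate are illustrative assumptions.

```python
import numpy as np

def ftrl_policy(cum_loss, eta):
    """FTRL step with entropy regularization on the simplex:
    argmin_p <p, cum_loss> + (1/eta) * sum_a p_a log p_a = softmax(-eta * cum_loss)."""
    logits = -eta * cum_loss
    w = np.exp(logits - logits.max())
    return w / w.sum()

def selfplay_ftrl(payoff, n_rounds=10000, eta=0.05):
    """Both players run FTRL against each other on a zero-sum matrix game
    (toy full-information version of self-play)."""
    n_x, n_y = payoff.shape
    cum_x, cum_y = np.zeros(n_x), np.zeros(n_y)
    avg_x, avg_y = np.zeros(n_x), np.zeros(n_y)
    for _ in range(n_rounds):
        x = ftrl_policy(cum_x, eta)   # row player's current strategy
        y = ftrl_policy(cum_y, eta)   # column player's current strategy
        cum_x += payoff @ y           # row player minimizes x^T A y
        cum_y += -payoff.T @ x        # column player minimizes -x^T A y
        avg_x += x
        avg_y += y
    # The averaged strategies approximate a Nash equilibrium of the matrix game.
    return avg_x / n_rounds, avg_y / n_rounds

# Example: rock-paper-scissors; both averaged strategies approach (1/3, 1/3, 1/3).
A = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
x_bar, y_bar = selfplay_ftrl(A)
```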