Deep Reinforcement Learning combined with Fictitious Play has shown impressive results on many benchmark games, most of which are, however, single-stage. In contrast, real-world decision-making problems may consist of multiple stages, where the observation spaces and the action spaces can differ completely across stages. We study the two-stage strategy card game Legends of Code and Magic and propose an end-to-end policy to address the difficulties that arise in multi-stage games. We also propose an optimistic smooth fictitious play algorithm to find a Nash Equilibrium of the two-player game. Our approach won double championships at the COG2022 competition, and extensive studies verify its effectiveness and demonstrate its advantages.
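To make the equilibrium-finding idea concrete, below is a minimal sketch of smooth fictitious play with an optimistic payoff prediction on a zero-sum matrix game. The payoff matrix A, the temperature tau, and the specific optimistic term (counting the most recent payoff once more) are illustrative assumptions for exposition, not the paper's exact formulation, which operates over deep RL policies rather than explicit matrices.

```python
import numpy as np

def logit_response(u, tau):
    """Smooth (entropy-regularized) best response: softmax(u / tau)."""
    z = u / tau
    z -= z.max()  # numerical stability
    p = np.exp(z)
    return p / p.sum()

def optimistic_smooth_fp(A, steps=20000, tau=0.05):
    """Optimistic smooth fictitious play on a zero-sum matrix game where the
    row player maximizes x^T A y. Each player plays the smooth best response
    to its average payoff, with the most recent payoff counted once more as
    an optimistic prediction of the next round (illustrative choice)."""
    m, n = A.shape
    x, y = np.ones(m) / m, np.ones(n) / n
    Ux, Uy = np.zeros(m), np.zeros(n)      # cumulative payoff vectors
    gx, gy = A @ y, -A.T @ x               # most recent payoff vectors
    x_avg, y_avg = np.zeros(m), np.zeros(n)
    for t in range(1, steps + 1):
        # Optimistic estimate: cumulative payoffs plus the last payoff again.
        x = logit_response((Ux + gx) / t, tau)
        y = logit_response((Uy + gy) / t, tau)
        gx, gy = A @ y, -A.T @ x
        Ux += gx
        Uy += gy
        # Running averages of play approximate the equilibrium strategies.
        x_avg += (x - x_avg) / t
        y_avg += (y - y_avg) / t
    return x_avg, y_avg

# Example: Rock-Paper-Scissors, whose unique Nash equilibrium is uniform.
A = np.array([[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]])
x, y = optimistic_smooth_fp(A)
print(np.round(x, 3), np.round(y, 3))  # both close to [0.333, 0.333, 0.333]
```

On this toy game the averaged strategies converge toward the logit equilibrium, which approaches the Nash Equilibrium as tau shrinks; the optimistic term is the standard device for accelerating convergence of fictitious-play-style dynamics.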