Certain but important classes of strategic-form games, including zero-sum and identical-interest games, have the fictitious-play property (FPP): beliefs formed under fictitious-play dynamics always converge to a Nash equilibrium (NE) in the repeated play of these games. Such convergence results are seen as a (behavioral) justification for game-theoretic equilibrium analysis. Markov games (MGs), also known as stochastic games, generalize the repeated play of strategic-form games to dynamic multi-state settings with Markovian state transitions. In particular, MGs are standard models for multi-agent reinforcement learning -- a resurgent research area in learning and games -- and their game-theoretic equilibrium analyses have also been conducted extensively. However, whether certain classes of MGs have the FPP (i.e., whether there is a behavioral justification for equilibrium analysis) remains largely elusive. In this paper, we study a new variant of fictitious-play dynamics for MGs and show its convergence to an NE in n-player identical-interest MGs in which a single player controls the state transitions. Such games are of interest in communications, control, and economics applications. Our result, together with the recent results in [Sayin et al. 2020], establishes the FPP of two-player zero-sum MGs and n-player identical-interest MGs with a single controller (two classes standing at opposite ends of the MG spectrum, from fully competitive to fully cooperative).
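To make the fictitious-play property concrete, the following is a minimal sketch of classical discrete-time fictitious play in a two-player identical-interest strategic-form game (the repeated, single-state special case; this is not the paper's MG variant). The payoff matrix, horizon, and uniform prior counts are illustrative assumptions: each player tracks the empirical frequency of the other's actions and best-responds to that belief, and in this coordination game the beliefs converge to a pure NE.

```python
import numpy as np

# Common payoff matrix (identical interest): entry A[i, j] is the payoff
# both players receive when row player picks i and column player picks j.
# Illustrative 2x2 coordination game, not taken from the paper.
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])

# Action counts for each player, initialized as a uniform "prior".
counts = [np.ones(2), np.ones(2)]

for t in range(2000):
    # Beliefs = empirical frequencies of the opponent's past actions.
    beliefs = [c / c.sum() for c in counts]
    # Each player best-responds to its belief about the other.
    a0 = int(np.argmax(A @ beliefs[1]))    # row player's best response
    a1 = int(np.argmax(A.T @ beliefs[0]))  # column player's best response
    counts[0][a0] += 1
    counts[1][a1] += 1

# Empirical frequencies concentrate on the pure NE (action 0, action 0).
freqs = [c / c.sum() for c in counts]
print([np.round(f, 3) for f in freqs])
```

The paper's setting replaces this single matrix with a multi-state MG in which one player controls the transitions, but the belief-formation and best-response structure above is the common core of fictitious-play dynamics.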