This paper studies finite-time-horizon Markov games in which the agents' dynamics are decoupled but the rewards can be coupled across agents. The policy class is restricted to local policies, where each agent makes decisions based on its own local state. We first introduce the notion of smooth Markov games, which extends the smoothness argument for normal-form games to our setting, and leverage the smoothness property to bound the price of anarchy of the Markov game. For a specific class of Markov games, namely Markov potential games, we also develop a distributed learning algorithm, multi-agent soft policy iteration (MA-SPI), which provably converges to a Nash equilibrium. We also provide the sample complexity of the algorithm. Finally, our results are validated using a dynamic covering game.