Potential games are arguably one of the most important and widely studied classes of normal-form games. They define the archetypal setting of multi-agent coordination, since all agents' utilities are perfectly aligned with each other via a common potential function. Can this intuitive framework be transplanted to the setting of Markov games? What are the similarities and differences between multi-agent coordination with and without state dependence? We present a novel definition of Markov Potential Games (MPGs) that generalizes prior attempts at capturing complex stateful multi-agent coordination. Counter-intuitively, insights from normal-form potential games do not carry over directly: an MPG can contain states whose state-games are zero-sum. In the opposite direction, a Markov game in which every state-game is a potential game is not necessarily an MPG. Nevertheless, MPGs showcase standard desirable properties such as the existence of deterministic Nash policies. In our main technical result, we prove fast convergence of independent policy gradient to Nash policies by adapting recent gradient dominance arguments developed for single-agent MDPs to multi-agent learning settings.
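To make the notion of a "common potential function" concrete, the following is a minimal Python sketch for the normal-form case only, not for MPGs. It assumes the standard exact-potential condition: for every unilateral deviation, the change in the deviating player's utility equals the change in the potential. The function name `is_exact_potential`, the array layout, and the Prisoner's Dilemma payoffs are illustrative assumptions and do not come from the paper.

```python
import itertools

import numpy as np


def is_exact_potential(utilities, potential, tol=1e-9):
    """Check the exact-potential condition for a finite normal-form game.

    utilities: list with one array per player, each of shape (A_1, ..., A_n),
               indexed by the joint action profile (hypothetical layout).
    potential: candidate potential function, an array of the same shape.
    """
    shape = potential.shape
    n_players = len(utilities)
    # Enumerate every joint action profile.
    for profile in itertools.product(*(range(k) for k in shape)):
        for i in range(n_players):
            # Enumerate every unilateral deviation of player i.
            for dev in range(shape[i]):
                deviated = profile[:i] + (dev,) + profile[i + 1:]
                du = utilities[i][deviated] - utilities[i][profile]
                dphi = potential[deviated] - potential[profile]
                if abs(du - dphi) > tol:
                    return False
    return True


# Illustrative Prisoner's Dilemma (action 0 = cooperate, 1 = defect).
# Axis 0 is player 1's action, axis 1 is player 2's action.
u1 = np.array([[3.0, 0.0], [5.0, 1.0]])
u2 = np.array([[3.0, 5.0], [0.0, 1.0]])
phi = np.array([[0.0, 2.0], [2.0, 3.0]])  # hand-derived candidate potential

pd_ok = is_exact_potential([u1, u2], phi)

# Matching pennies with the all-zero candidate: the condition fails.
m1 = np.array([[1.0, -1.0], [-1.0, 1.0]])
mp_ok = is_exact_potential([m1, -m1], np.zeros((2, 2)))

print(pd_ok, mp_ok)
```

The check enumerates every joint action profile, so it is only practical for small games. It verifies one candidate potential rather than searching for one, so a `False` result rules out that candidate only.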