Multi-agent Markov Decision Processes (MMDPs) arise in a variety of applications including target tracking, control of multi-robot swarms, and multiplayer games. A key challenge in MMDPs is that the state and action spaces grow exponentially in the number of agents, making computation of an optimal policy intractable for medium- to large-scale problems. One property that has been exploited to mitigate this complexity is transition independence, in which each agent's transition probabilities are independent of the states and actions of the other agents. Transition independence enables factorization of the MMDP and computation of local agent policies, but it does not hold for arbitrary MMDPs. In this paper, we propose an approximate transition dependence property, called $\delta$-transition dependence, and develop a metric for quantifying how far an MMDP deviates from transition independence. Our definition of $\delta$-transition dependence recovers transition independence as a special case when $\delta$ is zero. We develop an algorithm that runs in time polynomial in the number of agents and achieves a provable approximation bound with respect to the global optimum when the reward functions are monotone increasing and submodular in the agent actions. We evaluate our approach on two case studies, namely, a multi-robot control problem and a multi-agent patrolling example.
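As background, and using illustrative notation that may differ from the paper's own symbols, transition independence means that the joint transition kernel of the MMDP factorizes across the $n$ agents,
\[
  P\bigl(s' \mid s, a\bigr) \;=\; \prod_{i=1}^{n} P_i\bigl(s'_i \mid s_i, a_i\bigr),
  \qquad s = (s_1,\dots,s_n), \quad a = (a_1,\dots,a_n),
\]
so that each agent's next state depends only on its own state and action. Informally, $\delta$-transition dependence quantifies how far the true joint kernel may deviate from such a factorization, with $\delta = 0$ recovering exact transition independence.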