Recently, mean field control (MFC) has provided a tractable and theoretically well-founded approach to otherwise difficult cooperative multi-agent control. However, its assumption of many independent, homogeneous agents may be too restrictive in practice. In this work, we propose a novel discrete-time generalization of Markov decision processes and MFC to both many minor agents and potentially complex major agents -- major-minor mean field control (M3FC). In contrast to deterministic MFC, M3FC allows for stochastic minor agent distributions with strong correlation between minor agents through the major agent state, which can model arbitrary problem details not bound to any agent. Theoretically, we give rigorous approximation properties with novel proofs for both M3FC and existing MFC models in the finite multi-agent problem, together with a dynamic programming principle for solving such problems. In the infinite-horizon discounted case, existence of an optimal stationary policy follows. Algorithmically, we propose the major-minor mean field proximal policy optimization algorithm (M3FPPO) as a novel multi-agent reinforcement learning algorithm and demonstrate its success in illustrative M3FC-type problems.