Multi-agent reinforcement learning methods have shown remarkable potential for solving complex multi-agent problems, but mostly lack theoretical guarantees. Recently, mean field control and mean field games have been established as a tractable solution for large-scale multi-agent problems. In this work, driven by a motivating scheduling problem, we consider a discrete-time mean field control model with common environment states. We rigorously establish approximate optimality in the finite-agent case as the number of agents grows, and show that a dynamic programming principle holds, implying the existence of an optimal stationary policy. Since exact solutions are difficult to obtain in general due to the continuous action space of the limiting mean field Markov decision process, we apply established deep reinforcement learning methods to solve the associated mean field control problem. The performance of the learned mean field control policy is compared to typical multi-agent reinforcement learning approaches and is found to converge to the mean field performance for sufficiently many agents, verifying the obtained theoretical results and reaching competitive solutions.
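To make the limiting object concrete, the following is a minimal, hedged sketch (not the paper's exact model) of one transition of a mean field Markov decision process with a common environment state: the state is a distribution over a finite individual state space together with a shared environment state, and the action is a decision rule, i.e. a row-stochastic matrix, which is what produces the continuous action space mentioned above. The state and action space sizes, the transition kernel, and the common-state dynamics below are illustrative assumptions.

```python
# Hedged sketch of a limiting mean field MDP step with a common environment
# state; all concrete dynamics here are assumptions for illustration only.
import numpy as np

n_states = 3     # size of the finite individual agent state space (assumed)
n_actions = 2    # size of the finite individual agent action space (assumed)

rng = np.random.default_rng(0)

# Individual transition kernel indexed as P[x, u, env, y]: probability of an
# agent moving from state x to y under action u and common state env.
# (Random kernel for illustration; the real one would come from the model.)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions, 2))

def mean_field_step(mu, h, env):
    """One step of the limiting mean field dynamics.

    mu  : current mean field, a probability vector over the agent states
    h   : decision rule, an (n_states x n_actions) row-stochastic matrix --
          this is the continuous action of the mean field MDP
    env : common environment state shared by all agents (0 or 1 here)
    """
    # Next mean field: average the individual kernel over states and actions.
    mu_next = np.einsum('x,xu,xuy->y', mu, h, P[:, :, env])
    # Illustrative common-state dynamics: the shared state flips with a
    # probability depending on the mean field (pure assumption for the sketch).
    env_next = env if rng.random() > mu_next[0] else 1 - env
    return mu_next, env_next

mu = np.full(n_states, 1.0 / n_states)     # uniform initial mean field
h = np.full((n_states, n_actions), 0.5)    # uniform decision rule (action)
print(mean_field_step(mu, h, env=0))
```

A deep reinforcement learning method for continuous actions would then treat the pair (mean field, common state) as the observation and the decision rule h as the action, as opposed to learning per-agent policies in the finite-agent system.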