Mean field games (MFG) make otherwise intractable reinforcement learning (RL) feasible in large-scale multi-agent systems (MAS) by reducing the interactions among agents to those between a representative individual agent and the aggregate mass of the population. However, RL agents are notoriously prone to unexpected behaviours caused by reward mis-specification, a problem that is exacerbated as the scale of the MAS grows. Inverse reinforcement learning (IRL) provides a framework for automatically acquiring suitable reward functions from expert demonstrations. Extending IRL to MFG, however, is challenging due to the complex notion of mean-field-type equilibria and the coupling between agent-level and population-level dynamics. To this end, we propose mean field inverse reinforcement learning (MFIRL), a novel model-free IRL framework for MFG. We derive the algorithm from a new equilibrium concept that incorporates entropy regularization, combined with the maximum entropy IRL framework. Experimental results on simulated environments demonstrate that MFIRL is sample efficient and recovers the ground-truth reward functions more accurately than the state-of-the-art method.
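For intuition, the mean-field reduction and the entropy-regularized equilibrium mentioned above can be sketched as follows; the notation (reward $r$, policy $\pi$, population distribution $\mu_t$, temperature $\alpha$) is illustrative and is not taken from the paper body. A representative agent receives a reward $r(s_t, a_t, \mu_t)$ that depends on its own state and action and on the population distribution, and under entropy regularization its objective can be written as

\[
\max_{\pi} \; \mathbb{E}\!\left[\sum_{t=0}^{T} r(s_t, a_t, \mu_t) \;+\; \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t, \mu_t)\big)\right],
\]

where $\mathcal{H}$ denotes the policy entropy. At an entropy-regularized mean-field equilibrium, $\pi$ is optimal against the population flow $\{\mu_t\}$, while $\{\mu_t\}$ is the flow induced when every agent follows $\pi$; an IRL method in this setting seeks a reward $r$ under which the demonstrated behaviour satisfies this fixed-point condition.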