Mean field games (MFG) facilitate the application of reinforcement learning (RL) to large-scale multi-agent systems by reducing the interactions among agents to those between an individual agent and the average effect of the population. However, RL agents are notoriously prone to unexpected behaviours due to reward mis-specification. Although inverse RL (IRL) holds promise for automatically acquiring suitable rewards from demonstrations, extending it to MFG is challenging due to the complicated notion of mean-field-type equilibria and the coupling between agent-level and population-level dynamics. To this end, we propose a novel IRL framework for MFG, called Mean Field IRL (MFIRL), which builds on a new equilibrium concept and the maximum entropy IRL framework. Crucially, MFIRL is the first IRL method that can recover the agent-level (ground-truth) reward functions for MFG. Experiments show that MFIRL outperforms the state-of-the-art method in sample efficiency, reward recovery and robustness to varying environment dynamics.
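To make the maximum entropy IRL ingredient concrete, the following is a minimal, illustrative sketch (not the paper's MFIRL algorithm): a linearly parameterised reward is fitted by matching demonstrated feature expectations against those of the entropy-regularised (soft) policy, whose induced population distribution plays the role of the mean field. All names and quantities (`n_states`, `n_actions`, `phi`, `P`, `demo_features`) are hypothetical placeholders, and for simplicity the dynamics here do not depend on the mean field, whereas in a true MFG they would.

```python
# Illustrative MaxEnt IRL sketch in a finite-horizon, tabular setting.
import numpy as np

n_states, n_actions, T, n_feat = 5, 3, 10, 4
rng = np.random.default_rng(0)

# Hypothetical environment: transitions P[a, s, s'] and state-action features phi.
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))   # (A, S, S')
phi = rng.normal(size=(n_states, n_actions, n_feat))                # feature map
mu0 = np.full(n_states, 1.0 / n_states)                             # initial population

def soft_policy_and_flow(theta):
    """Soft (entropy-regularised) best response to reward r = phi @ theta,
    plus the population flow it induces over the horizon."""
    r = phi @ theta                                                  # (S, A)
    V, policies = np.zeros(n_states), []
    for _ in range(T):                                               # soft backward pass
        Q = r + np.einsum('asx,x->sa', P, V)
        V = np.log(np.exp(Q).sum(axis=1))
        policies.append(np.exp(Q - V[:, None]))
    policies = policies[::-1]                                        # time order 0..T-1
    mu, flow = mu0.copy(), []
    for pi in policies:                                              # forward pass: population flow
        flow.append((mu, pi))
        mu = np.einsum('s,sa,asx->x', mu, pi, P)
    return flow

def expected_features(flow):
    return sum(np.einsum('s,sa,saf->f', mu, pi, phi) for mu, pi in flow)

# Hypothetical demonstrated feature expectations (in practice, from expert data).
demo_features = expected_features(soft_policy_and_flow(np.array([1.0, -0.5, 0.3, 0.0])))

theta, lr = np.zeros(n_feat), 0.05
for _ in range(200):                                                 # MaxEnt IRL gradient ascent
    grad = demo_features - expected_features(soft_policy_and_flow(theta))
    theta += lr * grad
print("recovered reward weights:", theta)
```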