The recent mean field game (MFG) formalism facilitates otherwise intractable computation of approximate Nash equilibria in many-agent settings. In this paper, we consider discrete-time finite MFGs subject to finite-horizon objectives. We show that all discrete-time finite MFGs with non-constant fixed point operators fail to be contractive as typically assumed in existing MFG literature, barring convergence via fixed point iteration. Instead, we incorporate entropy-regularization and Boltzmann policies into the fixed point iteration. As a result, we obtain provable convergence to approximate fixed points where existing methods fail, and reach the original goal of approximate Nash equilibria. All proposed methods are evaluated with respect to their exploitability, on both instructive examples with tractable exact solutions and high-dimensional problems where exact methods become intractable. In high-dimensional scenarios, we apply established deep reinforcement learning methods and empirically combine fictitious play with our approximations.
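To illustrate the idea (not the paper's actual implementation), the following minimal NumPy sketch iterates an entropy-regularized fixed point operator on a hypothetical finite MFG: backward induction produces soft Q-values, the induced Boltzmann policy is computed at each time step, and the policy is pushed forward to obtain the next mean field. All dimensions, dynamics, rewards, and the temperature are invented for exposition.

```python
import numpy as np

# Illustrative sketch of entropy-regularized fixed point iteration for a toy
# finite MFG; sizes, dynamics and rewards below are hypothetical.
S, A, T = 3, 2, 5                              # states, actions, horizon (assumed)
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))     # P[s, a] = next-state distribution
tau = 0.5                                      # Boltzmann temperature (entropy weight)

def reward(s, a, mu_t):
    # Hypothetical mean-field reward: agents prefer less crowded states.
    return -mu_t[s]

def boltzmann_step(mu):
    """One fixed point update: soft best response to mu, then re-simulate mu."""
    pi = np.zeros((T, S, A))
    V = np.zeros(S)
    for t in reversed(range(T)):               # backward induction with soft-max backup
        Q = np.array([[reward(s, a, mu[t]) + P[s, a] @ V for a in range(A)]
                      for s in range(S)])
        pi[t] = np.exp(Q / tau) / np.exp(Q / tau).sum(axis=1, keepdims=True)
        V = tau * np.log(np.exp(Q / tau).sum(axis=1))   # soft value (log-sum-exp)
    new_mu = np.zeros((T, S))
    new_mu[0] = np.ones(S) / S                 # uniform initial distribution (assumed)
    for t in range(T - 1):                     # forward propagation of the mean field
        for s in range(S):
            for a in range(A):
                new_mu[t + 1] += new_mu[t, s] * pi[t, s, a] * P[s, a]
    return new_mu, pi

mu = np.ones((T, S)) / S
for _ in range(50):                            # plain iteration; higher tau smooths the operator
    mu, pi = boltzmann_step(mu)
```

In the high-dimensional experiments described above, the exact backward induction in this sketch would be replaced by a deep reinforcement learning approximation of the soft Q-values, while the forward propagation of the mean field follows the same pattern.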