This work studies non-cooperative Multi-Agent Reinforcement Learning (MARL), in which multiple agents interact in the same environment and each aims to maximize its individual return. Scaling up the number of agents is challenging because of the non-stationarity that a large population introduces. To address this issue, Mean Field Games (MFG) rely on symmetry and homogeneity assumptions to approximate games with very large populations. Recently, deep Reinforcement Learning has been used to scale MFG to games with larger state spaces. Current methods rely on smoothing techniques such as averaging the Q-values or the updates of the mean-field distribution. This work presents a different approach to stabilizing learning, based on proximal updates of the mean-field policy. We name our algorithm \textit{Mean Field Proximal Policy Optimization (MF-PPO)}, and we empirically show the effectiveness of our method in the OpenSpiel framework.
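For intuition, the proximal update can be sketched with the standard PPO clipped surrogate objective, here written with the policy additionally conditioned on the current mean-field distribution $\mu$. This is a minimal sketch under that assumption, not the paper's exact objective; the symbols $r_t(\theta)$, $\hat{A}_t$ (advantage estimate), and $\epsilon$ (clipping range) follow standard PPO notation and are not taken from this abstract:
\[
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t, \mu)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t, \mu)},
\qquad
L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\!\left[ \min\!\Big( r_t(\theta)\,\hat{A}_t,\; \mathrm{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \Big) \right].
\]
The clipping bounds the size of each policy update, which is the stabilizing mechanism the abstract refers to as proximal updates of the mean-field policy.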