This work studies non-cooperative Multi-Agent Reinforcement Learning (MARL), where multiple agents interact in the same environment and each seeks to maximize its individual return. Scaling to large numbers of agents is challenging because of the non-stationarity that the other learning agents introduce. To address this issue, Mean Field Games (MFG) rely on symmetry and homogeneity assumptions to approximate games with very large populations. Recently, deep Reinforcement Learning has been used to scale MFGs to games with a larger number of states. Current methods rely on smoothing techniques such as averaging the Q-values or the updates of the mean-field distribution. This work presents a different approach that stabilizes learning through proximal updates on the mean-field policy. We name our algorithm Mean Field Proximal Policy Optimization (MF-PPO), and we empirically show the effectiveness of our method in the OpenSpiel framework.
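For context on the proximal update mentioned above, MF-PPO builds on the clipped surrogate objective of PPO. The sketch below shows only the standard PPO objective; treating the mean-field distribution $\mu_t$ as part of the agent's observation is an assumption made here for illustration, and the exact MF-PPO objective is the one defined in the paper:

$$
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t, \mu_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t, \mu_t)},
\qquad
L^{\text{CLIP}}(\theta) = \mathbb{E}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\!\left(r_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t\right)\right],
$$

where $\hat{A}_t$ is an advantage estimate and $\epsilon$ is the clipping parameter; the clipping keeps each policy update close to the previous policy, which is the stabilizing "proximal" effect the abstract refers to.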