Existing multi-agent reinforcement learning methods are typically limited to a small number of agents. As the number of agents grows large, learning becomes intractable due to the curse of dimensionality and the exponential growth of agent interactions. In this paper, we present Mean Field Reinforcement Learning, in which the interactions within the population of agents are approximated by those between a single agent and the average effect of the overall population or of neighboring agents; the interplay between the two entities is mutually reinforcing: the learning of the individual agent's optimal policy depends on the dynamics of the population, while the dynamics of the population change according to the collective patterns of the individual policies. We develop practical mean field Q-learning and mean field Actor-Critic algorithms and analyze the convergence of the solution to a Nash equilibrium. Experiments on Gaussian squeeze, the Ising model, and battle games demonstrate the learning effectiveness of our mean field approaches. In addition, we report the first result of solving the Ising model via model-free reinforcement learning methods.
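To make the approximation concrete, the following is a minimal sketch in notation we assume here rather than quote from the paper: agent $j$'s dependence on the joint action $\mathbf{a}$ is replaced by its own action $a^{j}$ and the mean action $\bar{a}^{j}$ of its neighborhood $\mathcal{N}(j)$,
\[
  Q^{j}(s, \mathbf{a}) \;\approx\; Q^{j}\bigl(s,\, a^{j},\, \bar{a}^{j}\bigr),
  \qquad
  \bar{a}^{j} \;=\; \frac{1}{|\mathcal{N}(j)|}\sum_{k \in \mathcal{N}(j)} a^{k},
\]
and, under this factorization, a mean field Q-learning step would take the familiar temporal-difference form
\[
  Q^{j}_{t+1}\bigl(s, a^{j}, \bar{a}^{j}\bigr)
  \;=\;
  (1-\alpha)\, Q^{j}_{t}\bigl(s, a^{j}, \bar{a}^{j}\bigr)
  \;+\;
  \alpha\,\bigl[r^{j} + \gamma\, v^{j}_{t}(s')\bigr],
\]
where $v^{j}_{t}(s')$ denotes agent $j$'s value of the next state under its current policy given the mean action; the exact parameterization and policy used in the paper's algorithms are not specified in this abstract.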