We present a method that enables a large number of agents to learn how to flock, a natural behavior observed in large populations of animals. This problem has drawn considerable interest but typically requires strong structural assumptions and is tractable only in low dimension. We phrase the problem as a Mean Field Game (MFG), in which each individual chooses its acceleration based on the population's behavior. Combining Deep Reinforcement Learning (RL) and Normalizing Flows (NF), we obtain a tractable solution that requires only very weak assumptions. Our algorithm finds a Nash equilibrium in which the agents adapt their velocity to match the average velocity of the neighboring flock. We use Fictitious Play and alternate between (1) computing an approximate best response with Deep RL and (2) estimating the next population distribution with NF. We show numerically that our algorithm learns multi-group and high-dimensional flocking with obstacles.
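The Fictitious Play alternation described above can be illustrated with a minimal toy sketch. This is not the paper's implementation: the Deep RL best response is replaced by a proportional velocity controller, and the Normalizing-Flow density estimate is replaced by an empirical mean of velocities; the function names and constants are assumptions made for illustration only.

```python
import numpy as np

def best_response(mean_velocity, v0, steps=50, dt=0.1, k=1.0):
    """Toy stand-in for the Deep RL best response: the agent chooses
    accelerations that steer its velocity toward the population mean."""
    v = v0
    for _ in range(steps):
        a = k * (mean_velocity - v)  # acceleration matching the flock
        v = v + dt * a
    return v

def fictitious_play(initial_velocities, iterations=20):
    """Toy Fictitious Play loop: alternate best responses against the
    current population estimate, then update that estimate by averaging
    (stand-in for re-fitting a Normalizing Flow to the new population)."""
    velocities = np.asarray(initial_velocities, dtype=float)
    mean_v = velocities.mean()
    for _ in range(iterations):
        # (1) each agent computes an approximate best response
        velocities = np.array([best_response(mean_v, v) for v in velocities])
        # (2) re-estimate the population distribution, averaged with the
        # previous estimate as Fictitious Play prescribes
        mean_v = 0.5 * mean_v + 0.5 * velocities.mean()
    return velocities, mean_v

vels, mu = fictitious_play([-2.0, 0.0, 3.0])
print(vels, mu)  # velocities cluster around the population mean
```

In this simplified setting the agents' velocities contract toward a common value, mirroring the flocking equilibrium the abstract describes; the full method replaces both stand-ins with learned components.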