In this paper, we consider the problem of path finding for a set of homogeneous, autonomous agents navigating a previously unknown stochastic environment. In our problem setting, each agent attempts to maximize a given utility function while respecting safety properties. Our solution draws on ideas from evolutionary game theory: policies that perform well are replicated, and policies that perform poorly are diminished. We conduct a comprehensive comparison with related multiagent planning methods and show that our technique outperforms state-of-the-art RL algorithms, reducing path length by nearly 30% in large spaces. We show that our algorithm is at least an order of magnitude faster than deep RL methods. We also show that it scales better with an increasing number of agents than competing methods, path planning methods in particular. Lastly, we show empirically that the policies we learn are evolutionarily stable and thus impervious to invasion by any other policy.
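To make the evolutionary mechanism concrete, the following is a minimal sketch of discrete-time replicator dynamics, the standard formalization of "replicate policies that perform well and diminish ones that do not." It is an illustrative example, not the paper's implementation; the policy set, fitness values, and step size `dt` are hypothetical.

```python
import numpy as np

def replicator_update(probs, fitness, dt=0.1):
    """One discrete-time replicator-dynamics step.

    probs:   current distribution over candidate policies (sums to 1)
    fitness: estimated payoff of each policy (hypothetical values here)
    """
    avg_fitness = probs @ fitness
    # Policies with above-average payoff gain probability mass;
    # below-average policies lose mass.
    probs = probs + dt * probs * (fitness - avg_fitness)
    return probs / probs.sum()  # renormalize for numerical stability

# Illustration: three candidate policies with assumed payoffs.
p = np.array([1 / 3, 1 / 3, 1 / 3])
f = np.array([1.0, 0.5, 0.2])
for _ in range(100):
    p = replicator_update(p, f)
print(p)  # mass concentrates on the highest-payoff policy
```

Under these dynamics, a fixed point that cannot be displaced by a small share of any alternative policy is an evolutionarily stable strategy, which is the sense in which the learned policies are claimed to resist invasion.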