Multi-agent path finding (MAPF) has been widely used to solve large-scale real-world problems, e.g., automated warehouses. Learning-based fully decentralized frameworks have been introduced to alleviate the real-time problem while simultaneously pursuing an optimal planning policy. However, existing methods may generate significantly more vertex conflicts (also called collisions), which lead to a low success rate or a larger makespan. In this paper, we propose a PrIoritized COmmunication learning method (PICO), which incorporates implicit planning priorities into the communication topology within a decentralized multi-agent reinforcement learning framework. Combined with classic coupled planners, the implicit priority learning module can be used to form a dynamic communication topology, which also builds an effective collision-avoiding mechanism. PICO performs significantly better than state-of-the-art learning-based planners on large-scale multi-agent path finding tasks, in terms of both success rate and collision rate.
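To make the core idea concrete, below is a minimal, hypothetical PyTorch sketch of how learned per-agent priority scores could induce a dynamic communication topology. The class and attribute names (PriorityCommLayer, priority_head, msg_encoder), the "listen only to higher-priority neighbors" masking rule, and the mean aggregation are illustrative assumptions, not PICO's actual architecture.

```python
import torch
import torch.nn as nn

class PriorityCommLayer(nn.Module):
    """Illustrative sketch: each agent emits an implicit priority score,
    and messages flow only along neighborhood edges from higher- to
    lower-priority agents, forming a dynamic communication topology."""

    def __init__(self, obs_dim: int, msg_dim: int):
        super().__init__()
        self.priority_head = nn.Linear(obs_dim, 1)     # implicit priority per agent
        self.msg_encoder = nn.Linear(obs_dim, msg_dim)  # message content per agent

    def forward(self, obs: torch.Tensor, adjacency: torch.Tensor):
        # obs: (n_agents, obs_dim); adjacency: (n_agents, n_agents) 0/1 neighborhood mask
        priority = self.priority_head(obs).squeeze(-1)   # (n_agents,)
        messages = self.msg_encoder(obs)                 # (n_agents, msg_dim)
        # Agent i listens to neighbor j only if p_j > p_i, so lower-priority
        # agents yield to higher-priority ones (an assumed collision-avoiding
        # convention, not taken verbatim from the paper).
        higher = (priority.unsqueeze(0) > priority.unsqueeze(1)).float()
        topology = adjacency * higher                    # dynamic communication topology
        # Average incoming messages over the selected edges.
        degree = topology.sum(dim=1, keepdim=True).clamp(min=1.0)
        incoming = topology @ messages / degree          # (n_agents, msg_dim)
        return incoming, priority
```

A quick usage example under the same assumptions: the aggregated incoming messages would then be fed, together with each agent's own observation, into its decentralized policy network.

```python
layer = PriorityCommLayer(obs_dim=16, msg_dim=8)
obs = torch.randn(5, 16)                        # 5 agents, toy observations
adjacency = (torch.rand(5, 5) > 0.5).float()    # hypothetical neighborhood graph
incoming, priority = layer(obs, adjacency)
```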