Multi-agent pathfinding (MAPF) has been widely used to solve large-scale real-world problems, e.g., automated warehouses. Learning-based, fully decentralized frameworks have been introduced to alleviate real-time constraints while simultaneously pursuing an optimal planning policy. However, existing methods may generate significantly more vertex conflicts (i.e., collisions), which lead to a lower success rate or a longer makespan. In this paper, we propose a PrIoritized COmmunication learning method (PICO), which incorporates \textit{implicit} planning priorities into the communication topology within a decentralized multi-agent reinforcement learning framework. Combined with classic coupled planners, the implicit priority learning module can be used to form a dynamic communication topology, which also provides an effective collision-avoidance mechanism. PICO performs significantly better than state-of-the-art learning-based planners on large-scale MAPF tasks in terms of both success rate and collision rate.