Multi-Agent Path Finding (MAPF) is the problem of finding conflict-free paths for multiple agents from their respective start to goal locations. MAPF is challenging because the joint configuration space grows exponentially with the number of agents. Among MAPF planners, search-based methods such as CBS and M* effectively bypass the curse of dimensionality by employing a dynamically-coupled strategy: agents are first planned in a fully decoupled manner, ignoring potential conflicts between them; agents then either follow their individual plans or are coupled together for planning in order to resolve the conflicts between them. In general, the number of conflicts to be resolved determines the run time of these planners, and most existing work focuses on how to resolve these conflicts efficiently. In this work, we take a different view and aim to reduce the number of conflicts (and thus improve the overall search efficiency) by improving each agent's individual plan. By leveraging a Visual Transformer, we develop a learning-based single-agent planner, which plans for a single agent while paying attention to both the structure of the map and the other agents with which conflicts may arise. We then develop a novel multi-agent planner called LM* by integrating this learning-based single-agent planner with M*. Our results show that, on both "seen" and "unseen" maps, LM* has fewer conflicts to resolve than M* and thus runs faster and enjoys higher success rates. We empirically show that MAPF solutions computed by LM* are near-optimal. Our code is available at https://github.com/lakshayvirmani/learning-assisted-mstar .
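To make the key quantity concrete, here is a minimal sketch (not the paper's code) of how conflicts between decoupled single-agent plans can be counted: each path is a list of grid cells per timestep, shorter paths are padded by waiting at the goal, and both vertex conflicts (two agents in the same cell) and edge conflicts (two agents swapping cells) are detected. The function name and path representation are illustrative assumptions.

```python
from itertools import combinations

def count_conflicts(paths):
    """Count vertex and edge (swap) conflicts between single-agent plans.

    `paths` maps agent id -> list of grid cells, one per timestep.
    Agents that reach their goal are assumed to wait there.
    Illustrative sketch only, not the planners' actual implementation.
    """
    horizon = max(len(p) for p in paths.values())
    # Pad: an agent that has reached its goal waits at its last cell.
    padded = {a: p + [p[-1]] * (horizon - len(p)) for a, p in paths.items()}
    conflicts = 0
    for a, b in combinations(padded, 2):
        pa, pb = padded[a], padded[b]
        for t in range(horizon):
            if pa[t] == pb[t]:
                conflicts += 1  # vertex conflict: same cell at time t
            elif t + 1 < horizon and pa[t] == pb[t + 1] and pa[t + 1] == pb[t]:
                conflicts += 1  # edge conflict: agents swap cells t -> t+1
    return conflicts

# Two agents crossing a corridor meet at (0, 1): one vertex conflict.
print(count_conflicts({1: [(0, 0), (0, 1), (0, 2)],
                       2: [(0, 2), (0, 1), (0, 0)]}))  # → 1
```

In a dynamically-coupled planner such as M*, each detected conflict triggers (re)planning over a larger coupled agent set; reducing this count, as LM* does via its learned single-agent planner, directly reduces the search effort.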