We present a novel approach for efficient and reliable goal-directed, long-horizon navigation for a multi-robot team in a structured, unknown environment by predicting statistics of unknown space. Building on recent work in learning-augmented, model-based planning under uncertainty, we introduce a high-level state and action abstraction that lets us approximate the challenging Dec-POMDP as a tractable stochastic MDP. Our Multi-Robot Learning over Subgoals Planner (MR-LSP) guides agents towards coordinated exploration of regions more likely to lead to the unseen goal. We demonstrate improvement in cost over other multi-robot strategies: in simulated office-like environments, our approach reduces average cost by 13.29% (2 robots) and 4.6% (3 robots) versus standard non-learned optimistic planning and a learning-informed baseline.