To explore an unknown environment efficiently, an autonomous robot must account for uncertainty in sensor measurements, hazard assessment, localization, and motion execution. Making decisions for maximal reward in a stochastic setting requires value learning and policy construction over a belief space, i.e., the space of probability distributions over all possible robot-world states. However, belief space planning in large spatial environments over long temporal horizons suffers from severe computational challenges. Moreover, constructed policies must safely adapt to unexpected changes in the belief at runtime. This work proposes a scalable value learning framework, PLGRIM (Probabilistic Local and Global Reasoning on Information roadMaps), that bridges the gap between (i) local, risk-aware resiliency and (ii) global, reward-seeking mission objectives. Leveraging hierarchical belief space planners with information-rich graph structures, PLGRIM addresses large-scale exploration problems while providing locally near-optimal coverage plans. We validate the proposed framework with high-fidelity dynamic simulations in diverse environments and on physical robots in Martian-analog lava tubes.