For a robot to explore an unknown environment autonomously, it must account for uncertainty in sensor measurements, hazard assessment, localization, and motion execution. Making reward-maximizing decisions in a stochastic setting requires learning values and constructing policies over a belief space, i.e., a probability distribution over the robot-world state. Value learning over belief spaces suffers from computational challenges in high-dimensional settings, such as large spatial environments and long temporal horizons for exploration. At the same time, planning should be adaptive and resilient to disturbances at run time to ensure the robot's safety, as many real-world applications require. This work proposes a scalable value learning framework, PLGRIM (Probabilistic Local and Global Reasoning on Information roadMaps), that bridges the gap between (i) local, risk-aware resiliency and (ii) global, reward-seeking mission objectives. By leveraging hierarchical belief space planners over information-rich graph structures, PLGRIM addresses large-scale exploration problems while providing locally near-optimal coverage plans. PLGRIM is a step toward enabling belief space planners on physical robots operating in unknown and complex environments. We validate the proposed framework with high-fidelity dynamic simulations in diverse environments and with physical hardware, Boston Dynamics' Spot robot, in a lava tube.
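To make the notion of a belief space concrete: the belief is a probability distribution over the robot-world state, and it is maintained by fusing a motion (transition) model with incoming observations. The following is a minimal sketch of the standard Bayesian belief update over a discrete state space, not PLGRIM's specific hierarchical planner; the model arrays `T` and `O` are illustrative placeholders.

```python
import numpy as np

def belief_update(belief, T, O, action, observation):
    """One Bayesian belief update over discrete robot-world states.

    belief:      (S,)     prior distribution over states
    T:           (A,S,S)  transition model, T[a, s, s'] = P(s' | s, a)
    O:           (A,S,Z)  observation model, O[a, s', z] = P(z | s', a)
    action:      index of the executed action
    observation: index of the received observation
    """
    # Predict: propagate the belief through the motion model.
    predicted = belief @ T[action]
    # Correct: weight each predicted state by the observation likelihood.
    posterior = predicted * O[action][:, observation]
    # Normalize so the result is again a probability distribution.
    return posterior / posterior.sum()

# Toy two-state example: a slightly noisy transition and a fairly
# informative sensor pull the belief toward state 0.
b = np.array([0.5, 0.5])
T = np.array([[[0.9, 0.1],
               [0.1, 0.9]]])   # one action
O = np.array([[[0.8, 0.2],
               [0.2, 0.8]]])   # two observations
b_new = belief_update(b, T, O, action=0, observation=0)
print(b_new)  # -> [0.8 0.2]
```

Value learning over this belief space means treating distributions like `b_new`, rather than single states, as the planner's input, which is what makes high-dimensional, long-horizon exploration computationally challenging.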