Autonomous exploration has many important applications. However, classic information-gain-based or frontier-based exploration relies only on the robot's current state to determine the immediate exploration goal; it cannot predict the value of future states and thus makes inefficient exploration decisions. This paper presents a method to learn how "good" states are, as measured by the state value function, to guide robot exploration in challenging real-world environments. We formulate our work as an off-policy evaluation (OPE) problem for robot exploration (OPERE). The method consists of offline Monte Carlo training on real-world data followed by online Temporal Difference (TD) adaptation to optimize the trained value estimator. We also design an intrinsic reward function based on sensor information coverage, enabling the robot to gain more information when extrinsic rewards are sparse. Results demonstrate that our method enables the robot to predict the value of future states and thus better guide its exploration. The proposed algorithm achieves better prediction performance than other state-of-the-art OPE methods. To the best of our knowledge, this work is the first to demonstrate value function prediction on a real-world dataset for robot exploration in challenging subterranean and urban environments. More details and demo videos can be found at https://jeffreyyh.github.io/opere/.
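To make the two-stage pipeline in the abstract concrete, below is a minimal sketch (not the authors' implementation) of offline Monte Carlo value regression on logged exploration data followed by online TD(0) adaptation of the same value estimator. The network architecture, state dimensionality, discount factor, and learning rate are illustrative assumptions; the paper's exact objective and reward definition may differ.

```python
# Sketch of OPERE-style value estimation: offline MC training + online TD adaptation.
# All hyperparameters below are assumptions for illustration only.
import torch
import torch.nn as nn

GAMMA = 0.95  # discount factor (assumed)

# Simple state-value network V(s); 64-dim state features are an assumption.
value_net = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))
opt = torch.optim.Adam(value_net.parameters(), lr=1e-3)

def mc_returns(rewards):
    """Discounted Monte Carlo returns G_t for one logged trajectory."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + GAMMA * g
        out.append(g)
    return list(reversed(out))

def offline_mc_step(states, rewards):
    """Offline stage: regress V(s_t) toward the empirical return G_t."""
    targets = torch.tensor(mc_returns(rewards)).unsqueeze(1)
    loss = nn.functional.mse_loss(value_net(states), targets)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def online_td_step(s, r, s_next):
    """Online stage: one TD(0) update toward the target r + gamma * V(s')."""
    with torch.no_grad():
        target = r + GAMMA * value_net(s_next)
    loss = nn.functional.mse_loss(value_net(s), target)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

In this sketch, `offline_mc_step` would be run over batches of real-world trajectories before deployment, while `online_td_step` would be called as new observations arrive, letting the estimator adapt to the current environment; the reward `r` would combine the sparse extrinsic reward with the sensor-coverage-based intrinsic reward described above.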