Adaptive Selection of Informative Path Planning Strategies via Reinforcement Learning

In our previous work, we designed a systematic policy that prioritizes sampling locations to significantly improve the accuracy of spatial interpolation, using the prediction uncertainty of Gaussian Process Regression (GPR) as an "attraction force" on deployed robots during path planning. Although integrating this policy with Traveling Salesman Problem (TSP) solvers was also shown to produce relatively short travel distances, we hypothesize here that several factors could reduce the overall prediction precision, because sub-optimal locations may eventually be included in the resulting paths. To address this issue, we first explore "local planning" approaches, which prioritize the next sampling locations within various spatial ranges, and investigate their effects on prediction performance and on the travel distance incurred. In addition, Reinforcement Learning (RL)-based high-level controllers are trained to adaptively produce blended plans from a particular set of local planners, inheriting the unique strengths of that selection depending on the latest prediction state. Our experiments on use cases of temperature-monitoring robots demonstrate that dynamic mixtures of planners can not only generate sophisticated, informative plans that no single planner could create alone, but also significantly reduce travel distances at no cost to prediction reliability, without the aid of additional modules for shortest-path calculation.

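The "local planning" idea in the abstract, where the next sampling location is the most uncertain candidate within a given spatial range of the robot, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the paper's implementation: the function names (`rbf_kernel`, `gp_posterior_std`, `next_sample_location`), the unit-variance RBF kernel with a fixed length scale, and the fallback to a global search when no candidate lies within the radius. With fixed kernel hyperparameters, the GPR posterior standard deviation depends only on where samples were taken, so no measured values are needed for the sketch.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Unit-variance RBF kernel between point sets a (n, d) and b (m, d)."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)


def gp_posterior_std(x_train, x_query, length_scale=1.0, noise=1e-6):
    """GPR posterior standard deviation at x_query given sampled inputs x_train."""
    k_tt = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train, length_scale)
    solved = np.linalg.solve(k_tt, k_qt.T)        # shape (n_train, n_query)
    var = 1.0 - np.sum(k_qt * solved.T, axis=1)   # prior variance is 1
    return np.sqrt(np.clip(var, 0.0, None))


def next_sample_location(robot_pos, x_train, candidates, radius, length_scale=1.0):
    """Pick the most uncertain candidate within `radius` of the robot.

    Assumption for this sketch: if no candidate is in range, search globally.
    """
    std = gp_posterior_std(x_train, candidates, length_scale)
    dist = np.linalg.norm(candidates - robot_pos, axis=1)
    in_range = dist <= radius
    if not np.any(in_range):
        in_range = np.ones(len(candidates), dtype=bool)
    masked = np.where(in_range, std, -np.inf)
    return candidates[np.argmax(masked)]


if __name__ == "__main__":
    sampled = np.array([[0.0, 0.0]])              # locations already measured
    robot = np.array([0.0, 0.0])
    cands = np.array([[0.1, 0.0], [1.0, 0.0], [5.0, 0.0]])
    print(next_sample_location(robot, sampled, cands, radius=1.5))
    print(next_sample_location(robot, sampled, cands, radius=10.0))
```

A small radius keeps travel short but may miss highly uncertain regions, while a large radius behaves like a global planner. In this framing, a high-level controller of the kind the abstract describes would choose among planners with different ranges at each step; the RL training itself is not sketched here.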