This work focuses on the persistent monitoring problem, in which a set of targets moving according to an unknown model must be monitored by an autonomous mobile robot with a limited sensing range. To keep each target's position estimate as accurate as possible, the robot needs to adaptively plan its path to (re-)visit all the targets and update its belief from measurements collected along the way. The main challenge is to strike a balance between exploitation, i.e., re-visiting previously located targets, and exploration, i.e., finding new targets or re-acquiring lost ones. Encouraged by recent advances in deep reinforcement learning, we introduce an attention-based neural solution to the persistent monitoring problem, in which the agent learns the inter-dependencies between targets, i.e., their spatial and temporal correlations, conditioned on past measurements. This endows the agent with the ability to determine which target, time, and location to attend to across multiple scales, which we show also helps relax the usual limitations of a finite target set. We experimentally demonstrate that our method outperforms the baselines in terms of the number of target visits and the average estimation error in complex environments. Finally, we implement and validate our model in a drone-based experiment that monitors mobile ground targets in a high-fidelity simulator.