This paper studies the problem of autonomous exploration under localization uncertainty for a mobile robot with 3D range sensing. We present a framework for self-learning a high-performance exploration policy in a single simulation environment and transferring it to other environments, whether physical or virtual. Recent work in transfer learning achieves encouraging performance through domain adaptation and domain randomization, which expose an agent to scenarios that fill the inherent gaps in sim2sim and sim2real approaches. However, training an agent under randomized conditions so that it learns the important features of its current state is inefficient. An agent can instead learn efficiently from domain knowledge provided by human experts. We propose a novel approach that combines graph neural networks with deep reinforcement learning, enabling decision-making over graphs containing exploration-relevant information provided by human experts to predict a robot's optimal sensing action in belief space. The policy, trained in only a single simulation environment, offers a real-time, scalable, and transferable decision-making strategy, achieving zero-shot transfer to other simulation environments and even real-world environments.