Extraterrestrial rovers equipped with a general-purpose robotic arm have many potential applications in lunar and planetary exploration. Introducing autonomy into such systems is desirable for increasing the time that rovers can spend gathering scientific data and collecting samples. This work investigates the applicability of deep reinforcement learning for vision-based robotic grasping of objects on the Moon. A novel simulation environment with procedurally generated datasets is created to train agents under challenging conditions in unstructured scenes with uneven terrain and harsh illumination. A model-free off-policy actor-critic algorithm is then employed for end-to-end learning of a policy that directly maps compact octree observations to continuous actions in Cartesian space. Experimental evaluation indicates that 3D data representations enable more effective learning of manipulation skills than traditionally used image-based observations. Domain randomization improves the generalization of learned policies to novel scenes with previously unseen objects and different illumination conditions. Finally, we demonstrate zero-shot sim-to-real transfer by evaluating trained agents on a real robot in a Moon-analogue facility.
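To make the domain randomization idea in the abstract concrete, the following minimal sketch draws a fresh scene configuration for each training episode. Every name and range here (`SceneConfig`, `sample_scene`, the sun angles, the roughness scale, the object count) is a hypothetical placeholder chosen for illustration and is not taken from this work.

```python
import random
from dataclasses import dataclass


@dataclass
class SceneConfig:
    """Hypothetical per-episode scene parameters (illustrative only)."""
    sun_elevation_deg: float   # low angles produce long, harsh shadows
    sun_azimuth_deg: float     # direction of the light source
    terrain_roughness: float   # unitless scale for terrain height noise
    num_objects: int           # number of graspable objects in the scene
    object_seed: int           # seed for procedural object generation


def sample_scene(rng: random.Random) -> SceneConfig:
    """Draw one randomized scene; all ranges are assumptions."""
    return SceneConfig(
        sun_elevation_deg=rng.uniform(5.0, 60.0),
        sun_azimuth_deg=rng.uniform(0.0, 360.0),
        terrain_roughness=rng.uniform(0.0, 1.0),
        num_objects=rng.randint(1, 4),
        object_seed=rng.randrange(2**31),
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are reproducible
    for episode in range(3):
        print(episode, sample_scene(rng))
```

Resampling illumination, terrain, and object parameters at every episode means the agent rarely sees the same scene twice, which is the mechanism usually credited for better generalization to unseen conditions.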