Reinforcement learning can enable robots to navigate to distant goals while optimizing user-specified reward functions, including preferences for following lanes, staying on paved paths, or avoiding freshly mowed grass. However, online learning from trial and error for real-world robots is logistically challenging, and methods that can instead utilize existing datasets of robotic navigation data could be significantly more scalable and enable broader generalization. In this paper, we present ReViND, the first offline RL system for robotic navigation that can leverage previously collected data to optimize user-specified reward functions in the real world. We evaluate our system for off-road navigation without any additional data collection or fine-tuning, and show that it can navigate to distant goals using only offline training from this dataset, exhibiting behaviors that qualitatively differ based on the user-specified reward function.