Reinforcement learning is a powerful framework for robots to acquire skills from experience, but often requires a substantial amount of online data collection. As a result, it is difficult to collect sufficiently diverse experiences that are needed for robots to generalize broadly. Videos of humans, on the other hand, are a readily available source of broad and interesting experiences. In this paper, we consider the question: can we perform reinforcement learning directly on experience collected by humans? This problem is particularly difficult, as such videos are not annotated with actions and exhibit substantial visual domain shift relative to the robot's embodiment. To address these challenges, we propose a framework for reinforcement learning with videos (RLV). RLV learns a policy and value function using experience collected by humans in combination with data collected by robots. In our experiments, we find that RLV is able to leverage such videos to learn challenging vision-based skills with less than half as many samples as RL methods that learn from scratch.
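The abstract does not specify how RLV merges action-free human video with labeled robot experience, but the core data-handling idea can be sketched as a replay buffer that mixes the two sources, filling in the missing action and reward labels for video transitions before they join RL updates. The `infer_action` and `label_reward` callables below (e.g. a learned inverse model and a heuristic reward) are illustrative assumptions, not the paper's stated mechanism:

```python
import random

class MixedReplayBuffer:
    """Mixes robot transitions (with actions and rewards) and action-free
    human-video transitions for joint RL updates. The action inference and
    reward labeling below are illustrative assumptions; the abstract does
    not specify RLV's exact mechanism."""

    def __init__(self, robot_fraction=0.5, seed=0):
        self.robot_data = []   # (obs, action, reward, next_obs)
        self.video_data = []   # (obs, next_obs) only: no actions or rewards
        self.robot_fraction = robot_fraction
        self.rng = random.Random(seed)

    def add_robot(self, obs, action, reward, next_obs):
        self.robot_data.append((obs, action, reward, next_obs))

    def add_video(self, obs, next_obs):
        self.video_data.append((obs, next_obs))

    def sample(self, batch_size, infer_action, label_reward):
        """infer_action(obs, next_obs) stands in for a learned inverse model;
        label_reward(obs, next_obs) stands in for video reward labeling."""
        batch = []
        for _ in range(batch_size):
            use_robot = self.rng.random() < self.robot_fraction
            if use_robot and self.robot_data:
                batch.append(self.rng.choice(self.robot_data))
            elif self.video_data:
                obs, next_obs = self.rng.choice(self.video_data)
                # Fill in the missing labels so video data can be used in
                # the same policy/value updates as robot data.
                batch.append((obs, infer_action(obs, next_obs),
                              label_reward(obs, next_obs), next_obs))
            elif self.robot_data:
                batch.append(self.rng.choice(self.robot_data))
        return batch
```

A usage sketch: add a robot transition and a video transition, then sample a batch in which every element carries an action and a reward, so a standard off-policy learner can consume both sources uniformly.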