Learning to solve complex manipulation tasks from visual observations is a central challenge for real-world robot learning. Deep reinforcement learning algorithms have recently demonstrated impressive results, but they still require an impractical number of time-consuming trial-and-error iterations. In this work, we consider the promising alternative paradigm of interactive learning, in which a human teacher provides feedback to the policy during execution, as opposed to imitation learning, in which a pre-collected dataset of perfect demonstrations is used. Our proposed CEILing (Corrective and Evaluative Interactive Learning) framework combines corrective and evaluative feedback from the teacher to train a stochastic policy asynchronously, and employs a dedicated mechanism to trade off human corrections against the robot's own experience. We present results from extensive simulation and real-world experiments demonstrating that CEILing can effectively solve complex robot manipulation tasks directly from raw images in less than one hour of real-world training.
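The abstract does not spell out how corrective and evaluative feedback enter the training objective. The following is a minimal sketch of one plausible way to combine them, not the authors' implementation. It assumes a fixed-variance Gaussian policy and a binary good/bad evaluative label, and it assumes that teacher-corrected samples are up-weighted by a constant factor. The names `gaussian_log_prob`, `feedback_weighted_loss`, and `correction_weight` are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_log_prob(actions, means, std=0.1):
    """Log-density of `actions` under an isotropic Gaussian with a fixed std (assumed)."""
    var = std ** 2
    per_dim = -0.5 * ((actions - means) ** 2 / var + np.log(2.0 * np.pi * var))
    return per_dim.sum(axis=-1)


def feedback_weighted_loss(policy_means, actions, evaluative, corrected,
                           correction_weight=3.0):
    """Weighted negative log-likelihood over a batch of stored transitions.

    policy_means: (N, D) policy outputs for the stored observations.
    actions:      (N, D) executed actions; the teacher's action where corrected.
    evaluative:   (N,) 1.0 if the teacher rated the step as good, 0.0 if bad.
    corrected:    (N,) True where the teacher overrode the robot's action.
    correction_weight: hypothetical factor trading off corrections against
                       the robot's own positively rated experience.
    """
    log_prob = gaussian_log_prob(actions, policy_means)
    # Corrected samples get the larger weight; the robot's own samples
    # are kept only if rated good (weight 1) and ignored if rated bad (weight 0).
    weights = np.where(corrected, correction_weight, evaluative)
    return -(weights * log_prob).sum() / max(weights.sum(), 1e-8)
```

Under these assumptions, steps rated as bad contribute nothing to the loss, while a mismatch on a corrected step is penalized more strongly than the same mismatch on one of the robot's own well-rated steps.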