Learning to solve complex manipulation tasks from visual observations is a central challenge for real-world robot learning. Although deep reinforcement learning algorithms have recently demonstrated impressive results in this context, they still require an impractical number of time-consuming trial-and-error iterations. In this work, we consider the promising alternative paradigm of interactive learning, in which a human teacher provides feedback to the policy during execution, as opposed to imitation learning, in which a pre-collected dataset of perfect demonstrations is used. Our proposed CEILing (Corrective and Evaluative Interactive Learning) framework combines corrective and evaluative feedback from the teacher to train a stochastic policy asynchronously, and employs a dedicated mechanism to trade off human corrections against the robot's own experience. We present results from extensive simulation and real-world experiments demonstrating that CEILing can effectively solve complex robot manipulation tasks directly from raw images in less than one hour of real-world training.
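The abstract does not specify how corrective and evaluative feedback are combined, nor how corrections are traded off against the robot's own experience. As a purely illustrative sketch (not the CEILing method), one simple way to train a stochastic policy from such labels is a weighted maximum-likelihood objective over a diagonal Gaussian policy: teacher-corrected actions get a tunable weight, actions the teacher evaluated as good get weight 1, and actions evaluated as bad are discarded. The labels, the `Sample` record, the `weighted_nll` function, and the `correction_weight` parameter are all hypothetical names introduced here for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical feedback labels (illustration only, not from the paper).
CORRECTED = "corrected"  # teacher overrode the robot's action
GOOD = "good"            # teacher evaluated the robot's own action as good
BAD = "bad"              # teacher evaluated the robot's own action as bad


@dataclass
class Sample:
    mean: Sequence[float]     # policy mean for this observation
    log_std: Sequence[float]  # policy log standard deviation per dimension
    action: Sequence[float]   # executed action (or the teacher's correction)
    label: str                # one of CORRECTED, GOOD, BAD


def gaussian_log_prob(mean, log_std, action) -> float:
    """Log-density of `action` under a diagonal Gaussian N(mean, exp(log_std)^2)."""
    total = 0.0
    for m, ls, a in zip(mean, log_std, action):
        var = math.exp(2.0 * ls)
        total += -0.5 * ((a - m) ** 2 / var + 2.0 * ls + math.log(2.0 * math.pi))
    return total


def weighted_nll(samples: List[Sample], correction_weight: float) -> float:
    """Weighted mean negative log-likelihood over feedback-labelled samples.

    Corrected actions are weighted by `correction_weight` (hypothetical knob),
    actions evaluated as good get weight 1.0, and actions evaluated as bad are
    skipped. Returns 0.0 if no sample carries positive weight.
    """
    weights = {CORRECTED: correction_weight, GOOD: 1.0, BAD: 0.0}
    num, den = 0.0, 0.0
    for s in samples:
        w = weights[s.label]
        if w == 0.0:
            continue  # discard actions the teacher rejected
        num += -w * gaussian_log_prob(s.mean, s.log_std, s.action)
        den += w
    return num / den if den > 0.0 else 0.0
```

With `correction_weight` above 1, each corrected step pulls the policy toward the teacher's action more strongly than an approved action from the robot's own rollouts; setting it to 1 treats both sources equally. Varying this weight over training is one conceivable way to shift reliance from human corrections to the robot's own experience, but that is an assumption of this sketch rather than a description of CEILing.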