Residual reinforcement learning (RL) has been proposed as a way to solve challenging robotic tasks by adapting control actions from a conventional feedback controller to maximize a reward signal. We extend the residual formulation to learn from visual inputs and sparse rewards using demonstrations. Learning from images, proprioceptive inputs and a sparse task-completion reward relaxes the requirement of accessing full state features, such as object and target positions. In addition, replacing the base controller with a policy learned from demonstrations removes the dependency on a hand-engineered controller in favour of a dataset of demonstrations, which can be provided by non-experts. Our experimental evaluation on simulated manipulation tasks with a 6-DoF UR5 arm and a 28-DoF dexterous hand demonstrates that residual RL from demonstrations generalizes to unseen environment conditions more flexibly than either behavioral cloning or RL fine-tuning, and is capable of solving high-dimensional, sparse-reward tasks that are out of reach for RL from scratch.
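To make the residual formulation concrete, the sketch below illustrates how an executed action can be composed from a demonstration-derived base policy and an RL-trained residual correction. This is a minimal illustration, not the paper's implementation; the class and parameter names (ResidualPolicy, pi_base, pi_res) are assumptions introduced here for clarity.

```python
import numpy as np


class ResidualPolicy:
    """Minimal sketch of residual RL from demonstrations (illustrative names).

    pi_base is assumed to be a policy trained by behavioral cloning on
    demonstrations, mapping observations (e.g. images and proprioception)
    to actions. pi_res is a correction term trained with RL on a sparse
    task-completion reward. The executed action is their sum, clipped to
    the action limits.
    """

    def __init__(self, pi_base, pi_res, action_low, action_high):
        self.pi_base = pi_base          # demonstration-derived base policy
        self.pi_res = pi_res            # RL-trained residual policy
        self.action_low = np.asarray(action_low)
        self.action_high = np.asarray(action_high)

    def act(self, obs):
        a_base = self.pi_base(obs)      # action proposed from demonstrations
        a_res = self.pi_res(obs)        # learned correction on top of it
        return np.clip(a_base + a_res, self.action_low, self.action_high)
```

In this sketch, only the residual term needs to be optimized against the sparse reward, while the base policy supplies reasonable behavior from the start; this is the property that lets the combined policy tackle tasks where RL from scratch receives no learning signal.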