Operating under real-world conditions is challenging due to the possibility of a wide range of failures induced by execution errors and state uncertainty. In relatively benign settings, such failures can be overcome by retrying or by executing one of a small number of hand-engineered recovery strategies. By contrast, contact-rich sequential manipulation tasks, such as opening doors and assembling furniture, are not amenable to exhaustive hand-engineering. To address this issue, we present a general approach for robustifying manipulation strategies in a sample-efficient manner. Our approach incrementally improves robustness by first discovering the failure modes of the current strategy via exploration in simulation and then learning additional recovery skills to handle these failures. To ensure efficient learning, we propose an online algorithm, Meta-Reasoning for Skill Learning (MetaReSkill), that monitors the progress of all recovery policies during training and allocates training resources to the recoveries that are most likely to improve task performance. We use our approach to learn recovery skills for door opening and evaluate them both in simulation and on a real robot with little fine-tuning. Our experiments show that, compared to open-loop execution, even a limited amount of recovery learning substantially improves task success: from 71% to 92.4% in simulation and from 75% to 90% on a real robot.
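To make the resource-allocation idea more concrete, the sketch below shows one plausible, bandit-style training loop: at each round it estimates how much further training of each candidate recovery policy is expected to improve overall task success, and it spends the next training budget on the most promising candidate. This is only a minimal illustration under assumed interfaces; the names RecoveryCandidate, estimated_gain, and train_steps are hypothetical and do not come from the paper's implementation.

```python
import random
from dataclasses import dataclass, field


@dataclass
class RecoveryCandidate:
    """A recovery policy being trained for one discovered failure mode."""
    name: str
    failure_rate: float  # how often this failure mode occurs during task execution
    success_history: list = field(default_factory=list)  # recent evaluation outcomes (0/1)

    def estimated_gain(self) -> float:
        """Hypothetical heuristic: expected improvement in overall task success
        from continuing to train this recovery skill, weighting the failure
        mode's frequency by the recent trend in the policy's success rate."""
        if len(self.success_history) < 2:
            return self.failure_rate  # optimistic prior for barely-trained skills
        recent = sum(self.success_history[-10:]) / min(len(self.success_history), 10)
        earlier = sum(self.success_history[:-10] or [0]) / max(len(self.success_history) - 10, 1)
        trend = max(recent - earlier, 0.0)
        return self.failure_rate * (trend + 1e-3)


def train_steps(candidate: RecoveryCandidate, budget: int) -> None:
    """Placeholder for running `budget` training/evaluation steps of the policy.
    Here we only log random outcomes to keep the sketch self-contained."""
    for _ in range(budget):
        candidate.success_history.append(1 if random.random() < 0.5 else 0)


def allocate_training(candidates, rounds: int = 20, budget_per_round: int = 5) -> None:
    """Each round, give the training budget to the recovery skill whose
    improvement is expected to raise overall task success the most."""
    for _ in range(rounds):
        best = max(candidates, key=lambda c: c.estimated_gain())
        train_steps(best, budget_per_round)


if __name__ == "__main__":
    skills = [
        RecoveryCandidate("slipped_grasp", failure_rate=0.15),
        RecoveryCandidate("handle_missed", failure_rate=0.08),
    ]
    allocate_training(skills)
    for s in skills:
        print(s.name, "training samples:", len(s.success_history))
```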