Operating under real-world conditions is challenging due to the possibility of a wide range of failures induced by partial observability. In relatively benign settings, such failures can be overcome by retrying or executing one of a small number of hand-engineered recovery strategies. By contrast, contact-rich sequential manipulation tasks, like opening doors and assembling furniture, are not amenable to exhaustive hand-engineering. To address this issue, we present a general approach for robustifying manipulation strategies in a sample-efficient manner. Our approach incrementally improves robustness by first discovering the failure modes of the current strategy via exploration in simulation and then learning additional recovery skills to handle these failures. To ensure efficient learning, we propose an online algorithm, Value Upper Confidence Limit (Value-UCL), that selects which failure modes to prioritize and which states to recover to such that the expected performance improves maximally in every training episode. We use our approach to learn recovery skills for door-opening and evaluate them both in simulation and on a real robot with little fine-tuning. Compared to open-loop execution, our experiments show that even a limited amount of recovery learning improves task success substantially, from 71\% to 92.4\% in simulation and from 75\% to 90\% on a real robot.
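The abstract describes Value-UCL as selecting which failure mode to prioritize using an upper confidence limit on expected improvement. The paper's exact formulation is not given here, so the following is only a minimal UCB1-style sketch of that selection step; the function name, the `stats` layout (visit count and mean value gain per failure mode), and the exploration constant `c` are all illustrative assumptions.

```python
import math

def pick_failure_mode(stats, c=1.0):
    """UCB1-style sketch: pick the failure mode with the highest upper
    confidence bound on mean value improvement from training its recovery.

    stats maps failure-mode name -> (n_episodes, mean_value_gain).
    Unvisited modes get an infinite bound so each is tried at least once.
    """
    total = sum(n for n, _ in stats.values())
    best_mode, best_ucb = None, -math.inf
    for mode, (n, mean_gain) in stats.items():
        if n == 0:
            ucb = math.inf  # force initial exploration of this mode
        else:
            ucb = mean_gain + c * math.sqrt(2 * math.log(total) / n)
        if ucb > best_ucb:
            best_mode, best_ucb = mode, ucb
    return best_mode
```

Under this sketch, an unexplored failure mode is always selected first; once all modes have data, selection trades off the observed mean improvement against an exploration bonus that shrinks with the number of training episodes spent on each mode.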