Recent progress in end-to-end Imitation Learning approaches has shown promising results and generalization capabilities on mobile manipulation tasks. Such models are seeing increasing deployment in real-world settings, where scaling up requires robots to operate with high autonomy, i.e., with as little human supervision as possible. To avoid the need for one-on-one human supervision, robots need to be able to detect and prevent policy failures ahead of time and ask for help, allowing a remote operator to supervise multiple robots and help when needed. However, the black-box nature of end-to-end Imitation Learning models such as Behavioral Cloning, together with the lack of an explicit state-value representation, makes it difficult to predict failures. To this end, we introduce Behavioral Cloning Value Approximation (BCVA), an approach to learning a state value function that is based on, and trained jointly with, a Behavioral Cloning policy and that can be used to predict failures. We demonstrate the effectiveness of BCVA by applying it to the challenging mobile manipulation task of latched-door opening, showing that we can identify failure scenarios with 86% precision and 81% recall, evaluated on over 2000 real-world runs, improving upon the baseline of simple failure classification by 10 percentage points.
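As an illustration of how precision and recall for failure prediction can be computed, the following minimal sketch flags a run as a predicted failure when a scalar value estimate falls below a threshold. The thresholding rule, the function names, and the toy numbers are assumptions made for this example only; they are not taken from BCVA or from the reported experiments.

```python
def flag_failures(values, threshold):
    """Flag a run as a predicted failure when its value estimate is below
    the threshold (hypothetical decision rule, assumed for illustration)."""
    return [v < threshold for v in values]


def precision_recall(predicted, actual):
    """Compute precision and recall, treating 'failure' as the positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # failures correctly flagged
    fp = sum(1 for p, a in pairs if p and not a)    # successes wrongly flagged
    fn = sum(1 for p, a in pairs if a and not p)    # failures that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data, invented for this example: one value estimate per run and
# whether that run actually failed.
values = [0.9, 0.2, 0.7, 0.1, 0.4, 0.8]
failed = [False, True, False, True, False, True]

predicted = flag_failures(values, threshold=0.5)
precision, recall = precision_recall(predicted, failed)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this sketch, lowering the threshold flags fewer runs, which tends to raise precision at the cost of recall. A deployed system would choose the threshold according to how expensive a missed failure is compared with an unnecessary request for help.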