Imitating human demonstrations is a promising approach to endow robots with various manipulation capabilities. While recent advances have been made in imitation learning and batch (offline) reinforcement learning, a lack of open-source human datasets and reproducible learning methods makes assessing the state of the field difficult. In this paper, we conduct an extensive study of six offline learning algorithms for robot manipulation on five simulated and three real-world multi-stage manipulation tasks of varying complexity, and with datasets of varying quality. Our study analyzes the most critical challenges when learning from offline human data for manipulation. Based on the study, we derive a series of lessons, including the sensitivity to different algorithmic design choices, the dependence on the quality of the demonstrations, and the variability based on the stopping criteria due to the different objectives in training and evaluation. We also highlight opportunities for learning from human datasets, such as the ability to learn proficient policies on challenging, multi-stage tasks beyond the scope of current reinforcement learning methods, and the ability to easily scale to natural, real-world manipulation scenarios where only raw sensory signals are available. We have open-sourced our datasets and all algorithm implementations to facilitate future research and fair comparisons in learning from human demonstration data. Codebase, datasets, trained models, and more are available at https://arise-initiative.github.io/robomimic-web/