Learning from demonstration (LfD) provides a convenient means of equipping robots with dexterous skills when demonstrations can be obtained in robot-intrinsic coordinates. However, compounding errors in long and complex skills limit its wide deployment. Since most such complex skills are composed of smaller movements, it seems reasonable to treat the target skill as a sequence of compact motor primitives. The problem to be tackled is then to ensure that each motor primitive ends in a state that allows the successful execution of the subsequent primitive. In this study, we address this problem by proposing to learn an explicit correction policy that is applied when the expected transition state between primitives is not achieved. The correction policy is itself learned via behavior cloning using a state-of-the-art movement primitive learning architecture, Conditional Neural Motor Primitives (CNMPs). The learned correction policy is then able to produce diverse movement trajectories in a context-dependent way. The advantage of the proposed system over learning the complete task as a single action is shown with a table-top setup in simulation, where an object has to be pushed through a corridor in two steps. The applicability of the proposed method to bi-manual knotting in the real world is then shown by equipping an upper-body humanoid robot with the skill of making knots over a bar in 3D space. The experiments show that the robot can perform successful knotting even when the correction cases it faces are not part of the human demonstration set.
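The control flow described above (execute a primitive, check whether the expected transition state was reached, and invoke a correction policy if not) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `run_sequence`, `tol`, and `max_corrections`, the Euclidean distance check, and the toy two-step "push" task are hypothetical and are not taken from the paper. In particular, the toy correction function is a stand-in and not a CNMP.

```python
import math
from typing import Callable, Sequence, Tuple

State = Tuple[float, ...]
Primitive = Callable[[State], State]          # runs one primitive, returns the resulting state
Correction = Callable[[State, State], State]  # (current, expected) -> state after correcting


def run_sequence(
    state: State,
    primitives: Sequence[Primitive],
    expected_states: Sequence[State],
    correct: Correction,
    tol: float = 0.05,          # assumed threshold for "transition state achieved"
    max_corrections: int = 3,   # assumed cap on correction attempts per primitive
) -> Tuple[State, int]:
    """Run primitives in order; after each one, call the correction policy
    while the state is farther than `tol` from the expected transition state."""
    n_corrections = 0
    for primitive, expected in zip(primitives, expected_states):
        state = primitive(state)
        attempts = 0
        while math.dist(state, expected) > tol and attempts < max_corrections:
            state = correct(state, expected)
            attempts += 1
            n_corrections += 1
    return state, n_corrections


# Toy two-step push: the first primitive drifts sideways, the second is exact.
def push_1(s: State) -> State:
    return (s[0] + 1.0, s[1] + 0.2)


def push_2(s: State) -> State:
    return (s[0] + 1.0, s[1])


def toy_correct(current: State, expected: State) -> State:
    # Hypothetical stand-in for a learned correction policy: jump to the target.
    return expected


expected = [(1.0, 0.0), (2.0, 0.0)]
final, n_corrections = run_sequence((0.0, 0.0), [push_1, push_2], expected, toy_correct)
```

In this toy run the correction is triggered only after the first push, because the second primitive then starts from a state it can handle. A learned policy would instead generate a corrective trajectory conditioned on the observed context.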