Learning from demonstration (LfD) has succeeded in long-horizon tasks. However, when the problem complexity also includes human-in-the-loop perturbations, state-of-the-art approaches do not guarantee successful task reproduction. In this work, we identify the root of this challenge as the failure of a learned continuous policy to satisfy the discrete plan implicit in the demonstration. By utilizing modes (rather than subgoals) as the discrete abstraction, together with motion policies that have both mode invariance and goal reachability properties, we prove that our learned continuous policy can simulate any discrete plan specified by a linear temporal logic (LTL) formula. Consequently, the imitator is robust to both task- and motion-level perturbations and guaranteed to achieve task success. Project page: https://yanweiw.github.io/tli/
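To make the claim concrete, here is a minimal illustrative sketch (not the paper's implementation) of the discrete layer: a continuous rollout is abstracted into a sequence of mode symbols, and that sequence is checked against a simple sequential LTL task such as "eventually reach mode A, then eventually reach mode B." The mode names and the automaton are hypothetical placeholders.

```python
from typing import Iterable

# Deterministic automaton for the LTL task F(A & F B):
# states: 0 = waiting for A, 1 = waiting for B, 2 = accepting.
TRANSITIONS = {
    (0, "A"): 1,
    (1, "B"): 2,
}


def satisfies_plan(mode_trace: Iterable[str]) -> bool:
    """Return True if the discrete mode trace drives the automaton to acceptance."""
    state = 0
    for mode in mode_trace:
        # Stay in the current state unless the observed mode triggers a transition.
        state = TRANSITIONS.get((state, mode), state)
    return state == 2


if __name__ == "__main__":
    # A perturbation may push the system back into an earlier mode; because each
    # motion policy is mode-invariant and goal-reaching, the trace still
    # eventually progresses through the plan.
    print(satisfies_plan(["start", "A", "start", "A", "B"]))  # True
    print(satisfies_plan(["start", "B", "A"]))                # False
```

Under this abstraction, proving that the continuous policy "simulates" the discrete plan amounts to showing that every continuous rollout induces a mode trace accepted by the task automaton, even when perturbations temporarily reset progress.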