Learning from demonstration (LfD) is a widely researched paradigm for teaching robots to perform novel tasks. LfD works particularly well with program synthesis since the resulting programmatic policy is data-efficient, interpretable, and amenable to formal verification. However, existing synthesis approaches to LfD rely on precise, labeled demonstrations and are incapable of reasoning about the uncertainty inherent in human decision-making. In this paper, we propose PLUNDER, a new LfD approach that integrates a probabilistic program synthesizer in an expectation-maximization (EM) loop to overcome these limitations. PLUNDER requires only unlabeled low-level demonstrations of the intended task (e.g., remote-controlled motion trajectories), which liberates end-users from providing explicit labels and facilitates a more intuitive LfD experience. PLUNDER also generates a probabilistic policy that captures actuation errors and the uncertainties inherent in human decision-making. Our experiments compare PLUNDER with state-of-the-art LfD techniques and demonstrate its advantages across different robotic tasks.
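To make the idea of an EM loop over unlabeled demonstrations concrete, the following is a minimal toy sketch and not PLUNDER's actual algorithm. It assumes a one-dimensional task with two hypothetical high-level actions (`ACC`, `DEC`), a Gaussian motor model, and a single-threshold probabilistic policy whose only parameter is fitted by grid search in place of program synthesis. All names, constants, and the data generator are illustrative assumptions.

```python
import math
import random

# Hypothetical motor model: each high-level action yields a noisy acceleration.
ACTION_MEAN = {"ACC": 1.0, "DEC": -1.0}
OBS_STD = 0.5


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def policy_prob_acc(speed, threshold, sharpness=2.0):
    """Toy probabilistic policy: likely to accelerate below the threshold."""
    return sigmoid(sharpness * (threshold - speed))


def e_step(demo, threshold):
    """Posterior P(label = ACC | speed, observed acceleration) per time step."""
    posteriors = []
    for speed, accel in demo:
        prior = policy_prob_acc(speed, threshold)
        w_acc = prior * gaussian_pdf(accel, ACTION_MEAN["ACC"], OBS_STD)
        w_dec = (1.0 - prior) * gaussian_pdf(accel, ACTION_MEAN["DEC"], OBS_STD)
        posteriors.append(w_acc / (w_acc + w_dec))
    return posteriors


def m_step(demo, posteriors, candidates):
    """Pick the threshold that maximizes the expected label log-likelihood."""
    def score(threshold):
        total = 0.0
        for (speed, _), q in zip(demo, posteriors):
            p = min(max(policy_prob_acc(speed, threshold), 1e-9), 1.0 - 1e-9)
            total += q * math.log(p) + (1.0 - q) * math.log(1.0 - p)
        return total
    return max(candidates, key=score)


def run_em(demo, init_threshold, candidates, iterations=10):
    """Alternate label inference (E) and policy refitting (M)."""
    threshold = init_threshold
    for _ in range(iterations):
        posteriors = e_step(demo, threshold)
        threshold = m_step(demo, posteriors, candidates)
    return threshold


def make_demo(true_threshold, n=200, seed=0):
    """Synthetic unlabeled demonstration: (speed, acceleration) pairs only."""
    rng = random.Random(seed)
    demo = []
    for _ in range(n):
        speed = rng.uniform(0.0, 10.0)
        label = "ACC" if rng.random() < policy_prob_acc(speed, true_threshold) else "DEC"
        demo.append((speed, rng.gauss(ACTION_MEAN[label], OBS_STD)))
    return demo


if __name__ == "__main__":
    demo = make_demo(true_threshold=5.0)
    grid = [i * 0.5 for i in range(21)]  # candidate thresholds 0.0 .. 10.0
    print("learned threshold:", run_em(demo, init_threshold=2.0, candidates=grid))
```

The sketch only illustrates the alternation the abstract describes: the demonstration contains no action labels, the E-step infers them under the current policy, and the M-step refits the policy to those inferred labels. A full system would replace the grid search with a synthesizer over a space of programs.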