Ergodic imitation: Learning from what to do and what not to do

With growing access to versatile robotics, it is beneficial for end users to be able to teach robots tasks without needing to code a control policy. One possibility is to teach the robot through successful task executions. However, near-optimal demonstrations of a task can be difficult to provide, and even successful demonstrations can fail to capture the task aspects that are key to robust skill replication. Here, we propose a learning from demonstration (LfD) approach that enables learning of robust task definitions without the need for near-optimal demonstrations. We present a novel algorithmic framework for learning tasks based on the ergodic metric -- a measure of information content in motion. Moreover, we make use of negative demonstrations -- demonstrations of what not to do -- and show that they can help compensate for imperfect demonstrations, reduce the number of demonstrations needed, and highlight crucial task elements, improving robot performance. In a proof-of-concept example of cart-pole inversion, we show that negative demonstrations alone can be sufficient to successfully learn and recreate a skill. Through a human subject study with 24 participants, we show that consistently more information about a task can be captured from combined positive and negative (posneg) demonstrations than from the same number of positive-only demonstrations. Finally, we demonstrate our learning approach on simulated tasks of target reaching and table cleaning with a 7-DoF Franka arm. Our results point towards a future with robust, data-efficient LfD for novice users.
