Imitation learning is a promising approach to help robots acquire dexterous manipulation capabilities without the need for a carefully designed reward or significant computational effort. However, existing imitation learning approaches require sophisticated data collection infrastructure and struggle to generalize beyond the training distribution. One way to address this limitation is to gather additional data that better represents the full operating conditions. In this work, we investigate the characteristics of such additional demonstrations and their impact on performance. Specifically, we study the effects of corrective and randomly sampled additional demonstrations on learning a policy that guides a five-fingered robot hand through a pick-and-place task. Our results suggest that corrective demonstrations considerably outperform randomly sampled demonstrations when the number of additional demonstrations sampled from the full task distribution is larger than the number of original demonstrations sampled from a restrictive training distribution. Conversely, when the number of original demonstrations is higher than that of additional demonstrations, we find no significant differences between corrective and randomly sampled additional demonstrations. These results provide insights into the inherent trade-off between the effort required to collect corrective demonstrations and their relative benefits over randomly sampled demonstrations. Additionally, we show that inexpensive vision-based sensors, such as LeapMotion, can be used to dramatically reduce the cost of providing demonstrations for dexterous manipulation tasks. Our code is available at https://github.com/GT-STAR-Lab/corrective-demos-dexterous-manipulation.