Learning from demonstration (LfD) techniques seek to enable novice users to teach robots novel tasks in the real world. However, prior work has shown that robot-centric LfD approaches, such as Dataset Aggregation (DAgger), do not perform well with human teachers. DAgger requires a human demonstrator to provide corrective feedback to the learner either in real time, which can degrade performance due to suboptimal human labels, or post hoc, which is time-intensive and often infeasible. To address this problem, we present Mutual Information-driven Meta-learning from Demonstration (MIND MELD), which meta-learns a mapping from poor-quality human labels to predicted ground truth labels, thereby improving upon the performance of prior LfD approaches for DAgger-based training. The key to our approach for improving upon suboptimal feedback is mutual information maximization via variational inference. Our approach learns a meaningful, personalized embedding, which informs the mapping from human-provided labels to predicted ground truth labels. We demonstrate our framework in a synthetic domain and in a human-subjects experiment, illustrating that our approach improves upon the corrective labels provided by a human demonstrator by 63%.
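To make the label-correction idea concrete, the following is a minimal sketch (in PyTorch) of one plausible instantiation: a variational encoder infers a personalized embedding from a demonstrator's feedback, a decoder maps the human label plus that embedding to a predicted ground truth label, and an auxiliary head that predicts the embedding back from the corrected label serves as an InfoGAN-style surrogate for the mutual-information term. All module names, dimensions, and loss weights here are hypothetical assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a MIND MELD-style label corrector; not the authors' code.
import torch
import torch.nn as nn

class LabelCorrector(nn.Module):
    def __init__(self, label_dim=1, embed_dim=8, hidden=64):
        super().__init__()
        # Variational encoder: infers a personalized embedding z from a
        # demonstrator's corrective label (simplified to one label here).
        self.encoder = nn.Sequential(nn.Linear(label_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, embed_dim)
        self.logvar = nn.Linear(hidden, embed_dim)
        # Decoder: maps (human label, embedding) -> predicted ground truth label.
        self.decoder = nn.Sequential(
            nn.Linear(label_dim + embed_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, label_dim),
        )

    def forward(self, human_label):
        h = self.encoder(human_label)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick for variational inference.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        corrected = self.decoder(torch.cat([human_label, z], dim=-1))
        return corrected, mu, logvar

model = LabelCorrector()
# Auxiliary head q(z | corrected label): a stand-in lower-bound surrogate for
# maximizing mutual information between the embedding and the correction.
aux = nn.Linear(1, 8)
opt = torch.optim.Adam(list(model.parameters()) + list(aux.parameters()), lr=1e-3)

# Toy data: noisy human corrective labels and invented "ground truth" targets.
human = torch.randn(32, 1)
truth = human * 0.5 + 0.1

opt.zero_grad()
corrected, mu, logvar = model(human)
recon = nn.functional.mse_loss(corrected, truth)          # supervised correction loss
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # VAE-style KL term
mi = nn.functional.mse_loss(aux(corrected), mu.detach())  # MI surrogate
loss = recon + 0.01 * kl + 0.1 * mi                       # weights are hypothetical
loss.backward()
opt.step()
```

In this toy setup, the KL term keeps the embedding distribution well behaved, while the auxiliary reconstruction of the embedding from the corrected label encourages the embedding to carry person-specific information, loosely mirroring the mutual-information maximization described in the abstract.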