Robotic manipulation tasks, such as wiping with a soft sponge, require control from multiple rich sensory modalities. Human-robot interaction, aimed at teaching robots, is difficult in this setting, as there is potential for mismatch between human and machine comprehension of the rich data streams. We treat the task of interpretable learning from demonstration as an optimisation problem over a probabilistic generative model. To account for the high dimensionality of the data, a high-capacity neural network is chosen to represent the model. The latent variables in this model are explicitly aligned with high-level notions and concepts that are manifested in a set of demonstrations. We show that such alignment is best achieved through the use of labels from the end user, in an appropriately restricted vocabulary, in contrast to the conventional approach of the designer picking a prior over the latent variables. Our approach is evaluated in the context of two table-top robot manipulation tasks performed by a PR2 robot -- dabbing liquids with a sponge (forcefully pressing the sponge and moving it along a surface) and pouring between different containers. The robot provides visual information, arm joint positions, and arm joint efforts. We have made videos of the tasks and data available -- see supplementary materials at: https://sites.google.com/view/weak-label-lfd.
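The idea of aligning latent variables with user labels, rather than relying only on a designer-chosen prior, can be sketched as a training objective that augments a standard variational lower bound with a weak-label alignment term. The sketch below is a minimal, hypothetical illustration in NumPy (function and parameter names are our own, not the paper's); the actual model is a high-capacity neural network trained on robot sensory data.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / np.sum(e)

def weak_label_objective(recon_err, mu, logvar, label_logits, label,
                         beta=1.0, alpha=1.0):
    """Hypothetical per-example loss: negative ELBO plus a label-alignment term.

    recon_err    -- reconstruction error of the demonstration data
    mu, logvar   -- parameters of the Gaussian posterior q(z|x)
    label_logits -- logits read off the latent subspace reserved for a
                    user-labelled concept (e.g. "pressing" vs "sliding")
    label        -- index of the weak label supplied by the end user
    """
    # KL(q(z|x) || N(0, I)): the conventional designer-chosen prior term.
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)
    # Cross-entropy aligning the designated latent with the user's label.
    ce = -np.log(softmax(label_logits)[label])
    return recon_err + beta * kl + alpha * ce
```

With `alpha = 0` this reduces to a plain (beta-)VAE objective; the alignment term is what ties a chosen latent dimension to the restricted vocabulary of end-user labels.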