Multimodal demonstrations provide robots with an abundance of information to make sense of the world. However, such abundance may not always lead to good performance when it comes to learning sensorimotor control policies from human demonstrations. Extraneous data modalities can lead to state over-specification, where the state contains modalities that are not only useless for decision-making but can also change the data distribution across environments. State over-specification causes issues such as the learned policy failing to generalize outside of the training data distribution. In this work, we propose Masked Imitation Learning (MIL) to address state over-specification by selectively using informative modalities. Specifically, we design a masked policy network with a binary mask to block certain modalities. We develop a bi-level optimization algorithm that learns this mask to accurately filter over-specified modalities. We demonstrate empirically that MIL outperforms baseline algorithms in simulated domains, including MuJoCo and a robot-arm environment using the Robomimic dataset, and effectively recovers the environment-invariant modalities on a multimodal dataset collected on a real robot. Our project website presents supplemental details and videos of our results at: https://tinyurl.com/masked-il
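To make the masked-policy idea concrete, the sketch below illustrates one possible reading of the abstract, not the authors' implementation: a per-modality binary mask gates the policy input, and a simple bi-level loop trains the policy weights on a training split (inner level) while scoring candidate masks on held-out demonstrations (outer level). All names (MaskedPolicy, fit_weights, search_mask) and the exhaustive mask search are illustrative assumptions.

```python
# Hypothetical sketch of a masked policy with bi-level mask selection (PyTorch).
import itertools
import torch
import torch.nn as nn


class MaskedPolicy(nn.Module):
    def __init__(self, modality_dims, action_dim, hidden=64):
        super().__init__()
        self.modality_dims = modality_dims  # e.g. {"proprio": 7, "rgb_feat": 32}
        in_dim = sum(modality_dims.values())
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )
        # Binary mask with one entry per modality (1 = keep, 0 = block).
        self.register_buffer("mask", torch.ones(len(modality_dims)))

    def forward(self, obs):
        # obs: dict mapping modality name -> tensor of shape (batch, dim).
        gated = [obs[name] * m for name, m in zip(self.modality_dims, self.mask)]
        return self.net(torch.cat(gated, dim=-1))


def bc_loss(policy, batch):
    # Standard behavior-cloning objective: mean squared error to demo actions.
    obs, actions = batch
    return ((policy(obs) - actions) ** 2).mean()


def fit_weights(policy, train_batches, epochs=5, lr=1e-3):
    # Inner level: fit policy weights for a fixed mask on training demonstrations.
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in train_batches:
            opt.zero_grad()
            bc_loss(policy, batch).backward()
            opt.step()


def search_mask(modality_dims, action_dim, train_batches, val_batches):
    # Outer level: enumerate candidate binary masks (feasible for a handful of
    # modalities), retrain for each, and keep the mask with the best validation loss.
    best_mask, best_loss = None, float("inf")
    for bits in itertools.product([0.0, 1.0], repeat=len(modality_dims)):
        if sum(bits) == 0:
            continue  # at least one modality must remain
        policy = MaskedPolicy(modality_dims, action_dim)
        policy.mask.copy_(torch.tensor(bits))
        fit_weights(policy, train_batches)
        with torch.no_grad():
            val_loss = sum(bc_loss(policy, b) for b in val_batches)
        if val_loss < best_loss:
            best_mask, best_loss = bits, val_loss
    return best_mask
```

A mask chosen this way should zero out modalities that only help fit the training environments, leaving the environment-invariant ones for deployment.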