Language-conditioned policies allow robots to interpret and execute human instructions. Learning such policies requires a substantial investment of time and compute resources. Even so, the resulting controllers are highly device-specific and cannot easily be transferred to a robot with a different morphology, capability, appearance, or dynamics. In this paper, we propose a sample-efficient approach for training language-conditioned manipulation policies that allows for rapid transfer across different types of robots. By introducing a novel method, namely Hierarchical Modularity, and adopting supervised attention across multiple sub-modules, we bridge the divide between modular and end-to-end learning and enable the reuse of functional building blocks. In both simulated and real-world robot manipulation experiments, we demonstrate that our method outperforms the current state-of-the-art methods and can transfer policies across four different robots in a sample-efficient manner. Finally, we show that the functionality of learned sub-modules is maintained beyond the training process and can be used to introspect the robot decision-making process. Code is available at https://github.com/ir-lab/ModAttn.