In this work, we introduce a new perspective on learning transferable content in multi-task imitation learning (IL). Humans readily transfer skills and knowledge: if we can cycle to work and drive to the store, we can also cycle to the store and drive to work. Taking inspiration from this, we hypothesize that the latent memory of a policy network can be disentangled into two partitions, one holding knowledge of the environmental context for the task and the other holding the generalizable skill needed to solve it. This disentanglement improves training efficiency and yields better generalization both to previously unseen combinations of skills in the same environment and to the same task in unseen environments. We use the proposed approach to train a disentangled agent in two different multi-task IL environments. In both cases, we outperform the state of the art (SOTA) by 30% in task success rate. We also demonstrate the approach for navigation on a real robot.
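The partition hypothesis can be illustrated with a toy sketch. This is not the architecture used in this work: it assumes, purely for illustration, that the latent is a flat vector whose first `SKILL_DIM` entries encode the skill and whose remaining entries encode the environmental context. The names `split_latent`, `recombine`, and `SKILL_DIM`, and all numeric values, are hypothetical.

```python
from typing import List, Tuple

Latent = List[float]


def split_latent(latent: Latent, skill_dim: int) -> Tuple[Latent, Latent]:
    """Split a flat latent vector into (skill, context) partitions.

    Assumption for illustration: the first `skill_dim` entries hold the
    skill and the rest hold the environmental context.
    """
    if not 0 < skill_dim < len(latent):
        raise ValueError("skill_dim must leave both partitions non-empty")
    return latent[:skill_dim], latent[skill_dim:]


def recombine(skill: Latent, context: Latent) -> Latent:
    """Concatenate a skill partition with a context partition."""
    return skill + context


# Hypothetical latents for two demonstrated tasks (made-up numbers).
SKILL_DIM = 2
cycle_to_work = [1.0, 0.0, 0.3, 0.7]
drive_to_store = [0.0, 1.0, 0.9, 0.1]

cycle_skill, work_context = split_latent(cycle_to_work, SKILL_DIM)
drive_skill, store_context = split_latent(drive_to_store, SKILL_DIM)

# Unseen combinations are formed by swapping partitions across tasks.
cycle_to_store = recombine(cycle_skill, store_context)
drive_to_work = recombine(drive_skill, work_context)
```

In a learned policy the partitions would be produced by training rather than fixed by index; the sketch only shows why a clean split would make unseen skill and context pairings expressible.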