We study how an autonomous agent learns to perform a task from demonstrations in a different domain, such as a different environment or different agent. Such cross-domain imitation learning is required to, for example, train an artificial agent from demonstrations of a human expert. We propose a scalable framework that enables cross-domain imitation learning without access to additional demonstrations or further domain knowledge. We jointly train the learner agent's policy and learn a mapping between the learner and expert domains with adversarial training. We effect this by using a mutual information criterion to find an embedding of the expert's state space that contains task-relevant information and is invariant to domain specifics. This step significantly simplifies estimating the mapping between the learner and expert domains and hence facilitates end-to-end learning. We demonstrate successful transfer of policies between considerably different domains, without extra supervision such as additional demonstrations, and in situations where other methods fail.
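The mutual information criterion mentioned above can be approached with any standard MI lower-bound estimator. As a minimal sketch, the snippet below uses the InfoNCE bound (one common choice; the paper's exact criterion and all function/variable names here are illustrative assumptions) to score how much task-relevant information an embedding retains: paired embeddings that preserve task information yield a bound near log n, while embeddings unrelated to the task score much lower.

```python
import numpy as np

def infonce_bound(z_x, z_y, temperature=0.1):
    """InfoNCE lower bound on the mutual information between paired samples.

    z_x, z_y: (n, d) arrays of paired embeddings, e.g. expert-state
    embeddings and the task-relevant features they should retain.
    Illustrative sketch only, not the paper's implementation.
    """
    # Normalize rows so dot products are cosine similarities.
    z_x = z_x / np.linalg.norm(z_x, axis=1, keepdims=True)
    z_y = z_y / np.linalg.norm(z_y, axis=1, keepdims=True)
    logits = z_x @ z_y.T / temperature            # (n, n) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    n = z_x.shape[0]
    # I(X; Y) >= log n + E[log p(correct pairing)]  (diagonal = true pairs)
    return np.log(n) + np.mean(np.diag(log_probs))
```

Maximizing such a bound with respect to the embedding encourages it to keep task-relevant information, while the adversarial mapping loss pushes it to discard domain specifics.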