Cross-domain imitation learning studies how to leverage expert demonstrations of one agent to train an imitation agent with a different embodiment or morphology. Comparing trajectories and stationary distributions between the expert and imitation agents is challenging because they live in different systems that may not even have the same dimensionality. We propose Gromov-Wasserstein Imitation Learning (GWIL), a method for cross-domain imitation that uses the Gromov-Wasserstein distance to align and compare states across the agents' different spaces. Our theory formally characterizes the scenarios where GWIL preserves optimality, revealing its possibilities and limitations. We demonstrate the effectiveness of GWIL in non-trivial continuous control domains, ranging from a simple rigid transformation of the expert domain to an arbitrary transformation of the state-action space.
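The Gromov-Wasserstein distance can compare states in spaces of different dimensionality because it never measures distances *between* the two spaces, only distances *within* each one. The minimal sketch below evaluates the squared-loss Gromov-Wasserstein objective for one fixed coupling between expert states and imitator states. The helper names `pairwise_distances` and `gw_objective`, the use of Euclidean intra-domain distances, and the toy states are illustrative assumptions, not part of GWIL. The actual Gromov-Wasserstein distance minimizes this objective over all couplings, which this sketch does not do.

```python
import math
from itertools import product


def pairwise_distances(points):
    """Euclidean distance matrix computed within a single space (hypothetical helper)."""
    return [[math.dist(p, q) for q in points] for p in points]


def gw_objective(xs, ys, coupling):
    """Squared-loss Gromov-Wasserstein objective for a fixed coupling (hypothetical helper).

    xs and ys may have different dimensions: only each space's own
    distance matrix is used, so no cross-space distance is ever needed.
    coupling[i][j] is the mass transported from xs[i] to ys[j].
    """
    dx = pairwise_distances(xs)
    dy = pairwise_distances(ys)
    n, m = len(xs), len(ys)
    total = 0.0
    # Sum |dx(i,k) - dy(j,l)|^2 * T(i,j) * T(k,l) over all index quadruples.
    for i, k in product(range(n), repeat=2):
        for j, l in product(range(m), repeat=2):
            total += (dx[i][k] - dy[j][l]) ** 2 * coupling[i][j] * coupling[k][l]
    return total


# Toy expert states in 2-D.
expert = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
# Imitator states in 3-D: the expert states rotated by 90 degrees and embedded
# in a higher-dimensional space (a rigid transformation).
imitator = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-2.0, 0.0, 0.0)]

third = 1.0 / 3.0
# Coupling that matches each expert state to its transformed counterpart.
aligned = [[third, 0.0, 0.0], [0.0, third, 0.0], [0.0, 0.0, third]]
# Coupling that swaps the last two correspondences.
swapped = [[third, 0.0, 0.0], [0.0, 0.0, third], [0.0, third, 0.0]]

print(gw_objective(expert, imitator, aligned))  # ~0: geometry is preserved
print(gw_objective(expert, imitator, swapped))  # positive: geometry is distorted
```

The correct correspondence scores (numerically) zero even though the two state spaces have different dimensions. This invariance to rigid transformations is what makes the distance a natural fit for the simplest cross-domain setting described above.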