In many real-world imitation learning tasks, the demonstrator and the learner must act under entirely different observation spaces. This poses a significant obstacle for existing imitation learning approaches, most of which learn policies under homogeneous observation spaces. Previous studies that do consider different observation spaces rely on the strong assumption that the two observation spaces coexist throughout the entire learning process. In practice, however, observation coexistence is limited because expert observations are costly to acquire. In this work, we study this challenging problem of limited observation coexistence under heterogeneous observations, which we call Heterogeneously Observable Imitation Learning (HOIL). We identify two underlying issues in HOIL, the dynamics mismatch and the support mismatch, and propose the Importance Weighting with REjection (IWRE) algorithm, based on importance weighting and learning with rejection, to solve HOIL problems. Experimental results show that IWRE can successfully solve various HOIL tasks, including the challenging task of transforming vision-based demonstrations into random access memory (RAM)-based policies in the Atari domain, even with limited visual observations.
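To make the two ingredients named above concrete, the following is a minimal sketch of how importance weighting and rejection can be combined in general. It is not the IWRE algorithm itself, whose details are not given here. It assumes a hypothetical binary classifier that outputs the probability that a sample comes from the expert, and a hypothetical separate confidence score used for rejection; all function names, parameters, and the threshold are illustrative assumptions.

```python
# Illustrative sketch only: NOT the paper's IWRE implementation.
# Assumptions (hypothetical): p_expert is the output of a balanced binary
# classifier estimating P(sample is from the expert | observation), and
# confidence is a separate score in [0, 1] used to decide rejection.

def importance_weight(p_expert, eps=1e-6):
    """Density-ratio estimate p_expert(x) / p_learner(x) = D / (1 - D)."""
    # Clip to avoid division by zero when the classifier is saturated.
    p = min(max(p_expert, eps), 1.0 - eps)
    return p / (1.0 - p)


def weights_with_rejection(probs, confidences, threshold=0.5):
    """Return one weight per sample; rejected samples get weight 0."""
    weights = []
    for p, c in zip(probs, confidences):
        if c < threshold:
            # Low confidence: abstain rather than trust the weight.
            weights.append(0.0)
        else:
            weights.append(importance_weight(p))
    return weights


def weighted_mean(values, weights):
    """Importance-weighted average that ignores rejected samples."""
    total = sum(weights)
    if total == 0.0:
        return 0.0  # every sample was rejected
    return sum(v * w for v, w in zip(values, weights)) / total


if __name__ == "__main__":
    probs = [0.5, 0.8, 0.2]
    confidences = [0.9, 0.1, 0.7]
    w = weights_with_rejection(probs, confidences, threshold=0.5)
    print(w)
    print(weighted_mean([2.0, 10.0, 4.0], w))
```

In this sketch, rejection simply zeroes the weight of samples whose confidence falls below the threshold, so they contribute nothing to the weighted estimate. How the actual method couples the two components may differ.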