In many real-world imitation learning tasks, the demonstrator and the learner must act under different but full observation spaces. This mismatch poses significant obstacles for existing imitation learning approaches. Previous related works assume that the two observation spaces coexist, either in the demonstrations or throughout the whole learning process. In reality, however, the expert usually provides demonstrations under their own observations only, and observation coexistence is limited by its high cost. In this work, we therefore model the observation mismatch in imitation learning under these two challenges as a two-phase learning process, namely Heterogeneously Observable Imitation Learning (HOIL). We analyze the learning issues underlying these challenges, i.e., the dynamics mismatch and the support mismatch, and further propose the Importance Weighting with REjection (IWRE) algorithm, which builds on importance weighting and on learning with rejection for querying to address these issues across the observation spaces. Experimental results show that IWRE can successfully solve difficult HOIL tasks, including the challenging task of transforming vision-based demonstrations into random access memory (RAM)-based policies in the Atari domain.
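To make the two ingredients named above concrete, the following is a minimal generic sketch of importance weighting combined with a reject option. It is not the IWRE algorithm itself, whose details the abstract does not give. It assumes a hypothetical classifier that outputs `p_expert`, the probability that a sample comes from the expert's distribution, and uses the standard density-ratio estimate w = p / (1 - p) under balanced classes. The function names, the `margin` parameter, and the rule "reject when the classifier is unsure" are all illustrative assumptions.

```python
def importance_weight(p_expert, eps=1e-6):
    """Density-ratio estimate w = p / (1 - p) from a classifier probability.

    Assumes balanced classes; p is clipped to avoid division by zero.
    """
    p = min(max(p_expert, eps), 1.0 - eps)
    return p / (1.0 - p)


def should_reject(p_expert, margin=0.1):
    """Reject option (illustrative rule): abstain when the classifier is unsure.

    A rejected sample is one for which the learner would query the
    other observation space instead of trusting its own estimate.
    """
    return abs(p_expert - 0.5) < margin


def weighted_loss(losses, probs, margin=0.1):
    """Self-normalized importance-weighted mean loss over accepted samples.

    Returns (loss, rejected_indices). Rejected samples are excluded
    from the average and reported so that they can be queried.
    """
    kept, rejected = [], []
    for i, (loss, p) in enumerate(zip(losses, probs)):
        if should_reject(p, margin):
            rejected.append(i)
        else:
            kept.append((importance_weight(p), loss))
    total_w = sum(w for w, _ in kept)
    if total_w == 0.0:
        return 0.0, rejected
    return sum(w * l for w, l in kept) / total_w, rejected


if __name__ == "__main__":
    loss, to_query = weighted_loss([1.0, 2.0, 3.0], [0.8, 0.5, 0.2])
    print(loss, to_query)
```

In this sketch the second sample has `p_expert = 0.5`, so it is rejected and flagged for querying, while the remaining two samples are averaged with weights proportional to their estimated density ratios.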