In many real-world imitation learning tasks, the demonstrator and the learner must act in different, yet full, observation spaces. This situation poses significant obstacles for existing imitation learning approaches, even when they are combined with traditional space adaptation techniques. The main challenge lies in bridging the expert's occupancy measures and the learner's dynamically changing occupancy measures across the different observation spaces. In this work, we model this learning problem as Heterogeneous Observations Imitation Learning (HOIL). To address the key challenge of occupancy measure matching, we propose the Importance Weighting with REjection (IWRE) algorithm, which builds on importance weighting, learning with rejection, and active querying. Experimental results show that IWRE can successfully solve HOIL tasks, including the challenging task of transforming vision-based demonstrations into random access memory (RAM)-based policies in the Atari domain.
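To make the two named ingredients concrete, here is a minimal, generic sketch of importance weighting combined with a simple rejection rule. It is not the IWRE algorithm, whose details the summary above does not give. It assumes a probabilistic classifier has already been trained to separate expert samples (label 1) from learner samples (label 0) and uses the standard density-ratio identity p_E(x)/p_L(x) = (n_L/n_E) · P(1|x)/P(0|x). The function name `importance_weights_with_rejection` and the cap `max_weight` are hypothetical: samples whose estimated weight exceeds the cap are treated as unreliable and rejected (weight set to zero).

```python
import numpy as np


def importance_weights_with_rejection(p_expert, n_expert, n_learner, max_weight=10.0):
    """Hypothetical helper: density-ratio weights from classifier outputs,
    with a simple cap-based rejection rule (an assumption, not IWRE itself).

    p_expert  -- classifier estimates of P(expert | x) for each sample x
    n_expert  -- number of expert samples used to train the classifier
    n_learner -- number of learner samples used to train the classifier
    """
    # Clip probabilities away from 0 and 1 to avoid division by zero.
    p = np.clip(np.asarray(p_expert, dtype=float), 1e-12, 1.0 - 1e-12)
    # Density-ratio identity: p_E(x) / p_L(x) = (n_L / n_E) * P(1|x) / P(0|x).
    w = (n_learner / n_expert) * p / (1.0 - p)
    # Rejection rule (assumed): very large ratios are deemed unreliable.
    reject = w > max_weight
    w = np.where(reject, 0.0, w)
    return w, reject


if __name__ == "__main__":
    weights, rejected = importance_weights_with_rejection(
        [0.5, 0.9, 0.999], n_expert=100, n_learner=100, max_weight=10.0
    )
    print(weights, rejected)
```

With balanced sample counts, a classifier output of 0.5 yields a weight of 1, 0.9 yields a weight of about 9, and 0.999 yields a weight near 999, which exceeds the assumed cap and is rejected. Capping is only one possible rejection criterion; thresholding classifier confidence is a common alternative.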