Imitation learning aims to mimic expert behavior without explicit reward signals. Passive imitation learning methods, which rely on static expert datasets, typically suffer from compounding error, low sample efficiency, and high hyper-parameter sensitivity. In contrast, active imitation learning methods solicit expert interventions to address these limitations. However, recent active imitation learning methods are designed from human intuition or empirical experience and lack theoretical guarantees. In this paper, we propose a novel active imitation learning framework based on a teacher-student interaction model, in which the teacher's goal is to identify the best teaching behavior and actively influence the student's learning process. By solving the optimization objective of this framework, we derive a practical implementation, which we name AdapMen. Theoretical analysis shows that AdapMen can improve the error bound and avoid compounding error under mild conditions. Experiments on the MetaDrive benchmark and Atari 2600 games validate our theoretical analysis and show that our method achieves near-expert performance with much less expert involvement and far fewer total sampling steps than previous methods. The code is available at https://github.com/liuxhym/AdapMen.