This paper studies Imitation Learning from Observations alone (ILFO), where the learner is presented with expert demonstrations that consist only of the states visited by an expert (without access to the actions taken by the expert). We present a provably efficient model-based framework, MobILE, to solve the ILFO problem. MobILE carefully trades off strategic exploration against imitation; this is achieved by integrating the idea of optimism in the face of uncertainty into the distribution matching imitation learning (IL) framework. We provide a unified analysis for MobILE and demonstrate that it enjoys strong performance guarantees for classes of MDP dynamics that satisfy certain well-studied notions of structural complexity. We also show that the ILFO problem is strictly harder than the standard IL problem by presenting an exponential sample complexity separation between IL and ILFO. We complement these theoretical results with experimental simulations on benchmark OpenAI Gym tasks that indicate the efficacy of MobILE. Code for implementing the MobILE framework is available at https://github.com/rahulkidambi/MobILE-NeurIPS2021.