Imitation learning trains control policies by mimicking pre-recorded expert demonstrations. In partially observable settings, imitation policies must rely on observation histories, but many seemingly paradoxical results show better performance for policies that only access the most recent observation. Recent solutions ranging from causal graph learning to deep information bottlenecks have shown promising results, but failed to scale to realistic settings such as visual imitation. We propose a solution that outperforms these prior approaches by upweighting demonstration keyframes corresponding to expert action changepoints. This simple approach easily scales to complex visual imitation settings. Our experimental results demonstrate consistent performance improvements over all baselines on image-based Gym MuJoCo continuous control tasks. Finally, on the CARLA photorealistic vision-based urban driving simulator, we resolve a long-standing issue in behavioral cloning for driving by demonstrating effective imitation from observation histories. Supplementary materials and code at: \url{https://tinyurl.com/imitation-keyframes}.
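Below is a minimal sketch, not the authors' released implementation, of the keyframe-upweighting idea described above: demonstration frames where the expert action changes sharply receive a larger behavioral-cloning loss weight. The threshold \texttt{delta} and the weight values \texttt{high}/\texttt{low} are illustrative assumptions, not parameters reported in the paper.

\begin{verbatim}
# Sketch: upweight demonstration keyframes at expert action changepoints.
# `delta`, `high`, and `low` are hypothetical hyperparameters for illustration.
import numpy as np

def keyframe_weights(actions, delta=0.1, high=10.0, low=1.0):
    """Per-frame loss weight: `high` where the expert action changes by more
    than `delta` relative to the previous frame, else `low`."""
    actions = np.asarray(actions)                       # shape (T, action_dim)
    diffs = np.linalg.norm(np.diff(actions, axis=0), axis=-1)
    changepoint = np.concatenate([[False], diffs > delta])
    return np.where(changepoint, high, low)

def weighted_bc_loss(pred_actions, expert_actions, weights):
    """Behavioral-cloning loss with per-frame weights (squared error here)."""
    per_frame = np.mean((pred_actions - expert_actions) ** 2, axis=-1)
    return np.sum(weights * per_frame) / np.sum(weights)
\end{verbatim}

In practice the weights would multiply the policy's imitation loss during training on observation histories; the changepoint detector above is only one simple way to mark keyframes.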