We consider offline Imitation Learning from corrupted demonstrations, where a constant fraction of the data may be noise or even arbitrary outliers. Classical approaches such as Behavior Cloning assume that demonstrations are collected by a presumably optimal expert, and hence may fail drastically when learning from corrupted demonstrations. We propose a novel robust algorithm that minimizes a Median-of-Means (MOM) objective, which guarantees accurate policy estimation even in the presence of a constant fraction of outliers. Our theoretical analysis shows that, in the corrupted setting, our robust method enjoys nearly the same error scaling and sample complexity guarantees as classical Behavior Cloning in the expert-demonstration setting. Our experiments on continuous-control benchmarks confirm that our method exhibits the predicted robustness and effectiveness, and achieves competitive results compared to existing imitation learning methods.
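To make the Median-of-Means idea concrete, here is a minimal sketch of the generic MOM estimator: partition the per-sample losses into disjoint blocks, average within each block, and take the median of the block means. Because a constant fraction of outliers can corrupt only a minority of blocks, the median of the block means remains close to the uncorrupted mean. This is an illustrative sketch of the general technique, not the paper's actual training objective; the function name and block count are chosen for the example.

```python
import numpy as np

def median_of_means(losses, num_blocks=10, seed=0):
    """Median-of-Means estimate of the average loss.

    Randomly partition the per-sample losses into `num_blocks`
    disjoint blocks, average within each block, then return the
    median of the block means. Outliers can corrupt at most a few
    block means, so the median stays close to the clean mean.
    """
    losses = np.asarray(losses, dtype=float)
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(losses))             # random partition
    blocks = np.array_split(losses[perm], num_blocks)
    block_means = np.array([b.mean() for b in blocks])
    return np.median(block_means)

# 97 clean per-sample losses near 1.0 plus 3 arbitrary outliers:
corrupted = np.concatenate([np.ones(97), np.full(3, 1e6)])
plain_mean = corrupted.mean()                       # blown up by the outliers
mom_mean = median_of_means(corrupted, num_blocks=10)  # stays near 1.0
```

With 10 blocks and only 3 outliers, at most 3 block means are corrupted, so the median of the 10 block means is computed over clean blocks, whereas the plain mean is dominated by the outliers.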