Robust learning from noisy demonstrations is a practical but highly challenging problem in imitation learning. In this paper, we first theoretically show that robust imitation learning can be achieved by optimizing a classification risk with a symmetric loss. Based on this theoretical finding, we then propose a new imitation learning method that optimizes the classification risk by effectively combining pseudo-labeling with co-training. Unlike existing methods, our method does not require additional labels or strict assumptions about noise distributions. Experimental results on continuous-control benchmarks show that our method is more robust than state-of-the-art methods.
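For context, a loss ℓ is called symmetric when ℓ(z) + ℓ(−z) is a constant for every margin z; this property is what underlies noise robustness in classification-risk formulations. The sketch below illustrates the property numerically with the sigmoid loss (symmetric) and the logistic loss (not symmetric); these concrete losses are illustrative examples of the definition, not necessarily the ones used in the paper.

```python
import math

def sigmoid_loss(z):
    # Sigmoid loss: a classic symmetric loss, l(z) + l(-z) = 1 for all z.
    return 1.0 / (1.0 + math.exp(z))

def logistic_loss(z):
    # Logistic loss: NOT symmetric; included for contrast.
    return math.log(1.0 + math.exp(-z))

for z in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    sym = sigmoid_loss(z) + sigmoid_loss(-z)      # always 1.0
    asym = logistic_loss(z) + logistic_loss(-z)   # varies with z
    print(f"z={z:+.1f}  sigmoid sum={sym:.4f}  logistic sum={asym:.4f}")
```

Because the sigmoid sum is constant in z, label flips shift the risk of every classifier by the same amount, leaving the risk minimizer unchanged; the logistic sum varies with z, so it lacks this guarantee.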