Imitation learning enables robots to learn from demonstrations. Previous imitation learning algorithms usually assume access to optimal expert demonstrations. However, in many real-world applications this assumption is limiting: most collected demonstrations are not optimal or are produced by an agent with slightly different dynamics. We therefore address the problem of imitation learning when the demonstrations may be sub-optimal or drawn from agents with varying dynamics. We develop a metric composed of a feasibility score and an optimality score to measure how useful a demonstration is for imitation learning. The proposed metric enables learning from the more informative demonstrations while disregarding the less relevant ones. Our experiments on four simulated environments and on a real robot show that the learned policies improve, achieving higher expected return.