Adversarial imitation learning has become a popular framework for imitation in continuous control. Over the years, several variations of its components have been proposed to enhance the performance of the learned policies as well as the sample complexity of the algorithm. In practice, these design choices are rarely tested together in rigorous empirical studies. It is therefore difficult to discuss and understand which choices matter, among both high-level algorithmic options and low-level implementation details. To tackle this issue, we implement more than 50 of these choices in a generic adversarial imitation learning framework and investigate their impact in a large-scale study (>500k trained agents) with both synthetic and human-generated demonstrations. While many of our findings confirm common practices, some of them are surprising or even contradict prior work. In particular, our results suggest that artificial demonstrations are not a good proxy for human data, and that the very common practice of evaluating imitation algorithms only with synthetic demonstrations may lead to algorithms that perform poorly in the more realistic scenarios with human demonstrations.
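To make the framework under study concrete, the following is a minimal sketch of one adversarial imitation learning step in the GAIL style: a discriminator is trained to separate expert from policy state-action pairs, and its output is turned into a reward for the policy's reinforcement learning update. The network sizes, the -log(1 - D) reward shape, and the random batches standing in for demonstrations and rollouts are illustrative assumptions, not the paper's specific configuration.

```python
# Minimal GAIL-style adversarial imitation step (illustrative assumptions).
import torch
import torch.nn as nn

obs_dim, act_dim = 4, 2  # assumed toy dimensions

discriminator = nn.Sequential(
    nn.Linear(obs_dim + act_dim, 64), nn.Tanh(), nn.Linear(64, 1)
)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

def discriminator_step(expert_sa, policy_sa):
    """One adversarial update: expert pairs labeled 1, policy pairs 0."""
    logits_e = discriminator(expert_sa)
    logits_p = discriminator(policy_sa)
    loss = bce(logits_e, torch.ones_like(logits_e)) + \
           bce(logits_p, torch.zeros_like(logits_p))
    d_opt.zero_grad()
    loss.backward()
    d_opt.step()
    return loss.item()

def imitation_reward(sa):
    """Reward for the policy: high where the discriminator is fooled."""
    with torch.no_grad():
        d = torch.sigmoid(discriminator(sa))
    return -torch.log1p(-d.clamp(max=1 - 1e-6))  # -log(1 - D(s, a))

# Toy usage: random batches stand in for demonstrations and policy rollouts.
expert_sa = torch.randn(32, obs_dim + act_dim)
policy_sa = torch.randn(32, obs_dim + act_dim)
discriminator_step(expert_sa, policy_sa)
rewards = imitation_reward(policy_sa)  # fed to any RL learner (e.g. PPO)
```

Many of the choices studied in the paper (discriminator regularization, reward shape, observation inputs, and so on) are variations of the components in this loop.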