Data-driven simulators promise high data efficiency for driving policy learning. When used to model interactions, however, this data efficiency becomes a bottleneck: small underlying datasets often lack interesting and challenging edge cases for learning interactive driving. We address this challenge by proposing a simulation method that uses in-painted ado vehicles for learning robust driving policies. Our approach can thus be used to learn policies that involve multi-agent interactions, and it allows for training via state-of-the-art policy learning methods. We evaluate the approach on standard interaction scenarios in driving. Extensive experiments demonstrate that the resulting policies can be transferred directly to a full-scale autonomous vehicle without any traditional sim-to-real transfer techniques such as domain randomization.