We demonstrate the first large-scale application of model-based generative adversarial imitation learning (MGAIL) to the task of dense urban self-driving. We augment standard MGAIL using a hierarchical model to enable generalization to arbitrary goal routes, and measure performance using a closed-loop evaluation framework with simulated interactive agents. We train policies from expert trajectories collected from real vehicles driving over 100,000 miles in San Francisco, and demonstrate a steerable policy that can navigate robustly even in a zero-shot setting, generalizing to synthetic scenarios with novel goals that never occurred in real-world driving. We also demonstrate the importance of mixing closed-loop MGAIL losses with open-loop behavior cloning losses, and show our best policy approaches the performance of the expert. We evaluate our imitative model in both average and challenging scenarios, and show how it can serve as a useful prior to plan successful trajectories.