In recent years, imitation learning (IL) has been widely used in industry as the core of autonomous vehicle (AV) planning modules. However, previous work on IL planners shows sample inefficiency and low generalisation in safety-critical scenarios, on which they are rarely tested. As a result, IL planners can reach a performance plateau where adding more training data ceases to improve the learnt policy. First, our work presents an IL model using the spline coefficient parameterisation and offline expert queries to enhance safety and training efficiency. Then, we expose the weaknesses of the learnt IL policy by synthetically generating critical scenarios through optimisation of the parameters of the driver's risk field (DRF), a parametric human driving behaviour model implemented in a multi-agent traffic simulator based on the Lyft Prediction Dataset. To continuously improve the learnt policy, we retrain the IL model with the augmented data. Thanks to the expressivity and interpretability of the DRF, the desired driving behaviours can be encoded and aggregated into the original training data. Our work constitutes a full development cycle that can efficiently and continuously improve the learnt IL policies in closed loop. Finally, we show that our IL planner, developed with 30 times fewer training resources, still outperforms the previous state of the art.
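To make the spline coefficient parameterisation concrete, the sketch below fits a low-order B-spline to an expert trajectory so that the planner can regress a small set of smooth coefficients instead of dense per-timestep waypoints. This is a minimal illustration under assumed settings (planning horizon, spline degree, knot layout, and coefficient count are all hypothetical choices, not the exact configuration used in the paper).

```python
import numpy as np
from scipy.interpolate import BSpline, make_lsq_spline

# Assumed settings (illustrative, not the paper's exact setup):
T = 5.0                                   # planning horizon in seconds
t = np.linspace(0.0, T, 50)               # timestamps of expert waypoints
xy = np.stack([10.0 * t, 0.5 * np.sin(t)], axis=1)  # toy expert (x, y) trajectory

degree = 3                                # cubic spline
n_coeff = 8                               # coefficients per axis (regression target size)
# Clamped knot vector: repeated boundary knots plus uniform interior knots.
interior = np.linspace(0.0, T, n_coeff - degree + 1)[1:-1]
knots = np.concatenate([[0.0] * (degree + 1), interior, [T] * (degree + 1)])

# Least-squares fit: these coefficients would serve as the IL model's output labels.
spl_x = make_lsq_spline(t, xy[:, 0], knots, k=degree)
spl_y = make_lsq_spline(t, xy[:, 1], knots, k=degree)
coeffs = np.stack([spl_x.c, spl_y.c], axis=1)       # shape (n_coeff, 2)

# At inference time, predicted coefficients decode back into a smooth trajectory.
decoded = np.stack([BSpline(knots, coeffs[:, 0], degree)(t),
                    BSpline(knots, coeffs[:, 1], degree)(t)], axis=1)
print(coeffs.shape, np.abs(decoded - xy).max())      # small reconstruction error
```

Regressing a handful of coefficients keeps the output trajectory smooth by construction, which is one plausible reason such a parameterisation helps safety and training efficiency.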