In recent years, imitation learning (IL) has been widely used in industry as the core of autonomous vehicle (AV) planning modules. However, previous IL works suffer from sample inefficiency and poor generalisation in safety-critical scenarios, on which they are rarely tested. As a result, IL planners can reach a performance plateau where adding more training data ceases to improve the learnt policy. First, our work presents an IL model that uses a spline coefficient parameterisation and offline expert queries to enhance safety and training efficiency. Then, we expose the weaknesses of the learnt IL policy by synthetically generating critical scenarios through optimisation of the parameters of the driver's risk field (DRF), a parametric human driving behaviour model implemented in a multi-agent traffic simulator based on the Lyft Prediction Dataset. To continuously improve the learnt policy, we retrain the IL model with the augmented data. Thanks to the expressivity and interpretability of the DRF, the desired driving behaviours can be encoded and aggregated into the original training data. Our work constitutes a full development cycle that can efficiently and continuously improve the learnt IL policies in closed loop. Finally, we show that our IL planner, developed with fewer training resources, still outperforms the previous state of the art.
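To make the spline coefficient parameterisation concrete, the following is a minimal sketch, not the paper's implementation: it assumes the planner regresses a fixed-size vector of polynomial coefficients per trajectory instead of a long sequence of raw waypoints, and the function names, cubic degree, and toy expert path below are all illustrative.

```python
import numpy as np

def trajectory_to_coeffs(waypoints: np.ndarray, degree: int = 3) -> np.ndarray:
    """Fit x(t) and y(t) polynomials to expert waypoints and return the
    stacked coefficients as a compact regression target for the IL model."""
    t = np.linspace(0.0, 1.0, len(waypoints))
    cx = np.polyfit(t, waypoints[:, 0], degree)  # x-coefficients, highest degree first
    cy = np.polyfit(t, waypoints[:, 1], degree)  # y-coefficients
    return np.concatenate([cx, cy])

def coeffs_to_trajectory(coeffs: np.ndarray, n_points: int = 20) -> np.ndarray:
    """Decode predicted coefficients back into a dense, smooth trajectory."""
    degree = len(coeffs) // 2 - 1
    cx, cy = coeffs[: degree + 1], coeffs[degree + 1 :]
    t = np.linspace(0.0, 1.0, n_points)
    return np.stack([np.polyval(cx, t), np.polyval(cy, t)], axis=1)

# Usage: a curved toy expert path (25 waypoints) compresses to 8 coefficients
# and decodes back into a smooth trajectory of any desired resolution.
expert = np.stack([np.linspace(0, 50, 25),
                   0.02 * np.linspace(0, 50, 25) ** 2], axis=1)
coeffs = trajectory_to_coeffs(expert)
decoded = coeffs_to_trajectory(coeffs)
print(coeffs.shape, decoded.shape)  # (8,) (20, 2)
```

The appeal of such a parameterisation is that the output dimensionality stays fixed and small regardless of the planning horizon, and the decoded trajectory is smooth by construction, which is one way to improve training efficiency as the abstract claims.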