Traffic simulation has gained considerable interest for the quantitative evaluation of self-driving vehicle performance. For a simulator to be a valuable test bench, the driving policy animating each traffic agent in the scene must act as a human would while maintaining minimal safety guarantees. Learning the driving policies of traffic agents from recorded human driving data or through reinforcement learning is an attractive approach for generating realistic, highly interactive traffic situations at uncontrolled intersections and roundabouts. In this work, we show that a trade-off exists between imitating human driving and maintaining safety when learning driving policies. We demonstrate this by comparing how various imitation learning and reinforcement learning algorithms perform on the driving task. We also propose a multi-objective learning algorithm (MOPPO) that improves both objectives together. We evaluate how human-like our driving policies behave on highly interactive driving scenarios extracted from the INTERACTION dataset.