Iterative Imitation Policy Improvement for Interactive Autonomous Driving

We propose an imitation learning system for autonomous driving in interactive urban traffic. We train a Behavioral Cloning~(BC) policy to imitate driving behavior collected from real urban traffic, and apply the data aggregation algorithm to improve its performance iteratively. Applying data aggregation in this setting poses two challenges. First, collecting online rollout data in real urban traffic is expensive and dangerous, and recreating similar traffic scenarios in a simulator such as CARLA for online rollout collection can also be difficult. Instead, we propose to create a weak simulator from the training dataset, in which all surrounding vehicles follow the trajectories recorded in the dataset. We find that the online data collected in such a simulator can still be used to improve the BC policy's performance. Second, human labeling during online rollouts is tedious and time-consuming. To solve this problem, we use an A$^*$ planner as a pseudo-expert to provide expert-like demonstrations. We validate the proposed imitation learning system in real urban traffic scenarios. The experimental results show that our system can significantly improve the performance of the baseline BC policy.
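The loop described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the one-lane grid world, the `BLOCKED` log, the `observe` feature, and all function names are hypothetical, and a majority-vote lookup table stands in for a learned BC policy. The sketch also rolls out the learner alone in every iteration, without mixing in the expert as some data aggregation variants do. Surrounding traffic replays a fixed log and never reacts to the ego vehicle, which is what makes the simulator "weak".

```python
import heapq
from collections import Counter, defaultdict

GOAL, HORIZON, ACTIONS = 8, 12, (0, 1, 2)  # actions = cells advanced per step
# Hypothetical log: cross traffic occupies cell 4 during steps 2..5.
BLOCKED = {t: {4} for t in range(2, 6)}

def blocked(t):
    return BLOCKED.get(t, set())

def step(pos, t, a):
    """Log-replay simulator step: other vehicles follow the log, ego moves a cells."""
    swept = set(range(pos, pos + a + 1))       # cells the ego touches this step
    hit = bool(swept & blocked(t + 1))
    return min(pos + a, GOAL), t + 1, hit

def observe(pos, t):
    """Distance to the nearest cell ahead that is blocked at the next step, capped at 3."""
    gaps = [c - pos for c in blocked(t + 1) if c >= pos]
    return min(gaps + [3])

def astar_expert(pos, t):
    """Pseudo-expert: A* over (cell, step); returns the first action of a shortest safe plan."""
    h = lambda p: (GOAL - p + 1) // 2          # admissible: at most 2 cells per step
    frontier = [(h(pos), -pos, 0, pos, t, None)]
    seen = set()
    while frontier:
        _, _, g, p, k, first = heapq.heappop(frontier)
        if p >= GOAL:
            return first if first is not None else 0
        if (p, k) in seen or k >= HORIZON:
            continue
        seen.add((p, k))
        for a in ACTIONS:
            q, k2, hit = step(p, k, a)
            if not hit:                        # ties in cost are broken toward more progress
                heapq.heappush(frontier, (g + 1 + h(q), -q, g + 1, q, k2,
                                          a if first is None else first))
    return 0                                   # no safe plan within the horizon: brake

def fit_bc(dataset):
    """Behavioral cloning as a majority-vote table from observation to action."""
    votes = defaultdict(Counter)
    for obs, action in dataset:
        votes[obs][action] += 1
    return {obs: c.most_common(1)[0][0] for obs, c in votes.items()}

def rollout(act, pos=0, t=0):
    """Run a controller in the replay simulator; returns visited states and the outcome."""
    states = []
    while pos < GOAL and t < HORIZON:
        states.append((pos, t))
        pos, t, hit = step(pos, t, act(pos, t))
        if hit:
            return states, "collision"
    return states, "goal" if pos >= GOAL else "timeout"

def dagger(iterations=5):
    """Data aggregation: roll out the learner, relabel visited states with A*, retrain."""
    states, _ = rollout(astar_expert)          # iteration 0: plain BC on expert data
    data = [(observe(p, t), astar_expert(p, t)) for p, t in states]
    policy = fit_bc(data)
    for _ in range(iterations):
        learner = lambda p, t: policy.get(observe(p, t), 1)  # unseen observation: cruise
        states, _ = rollout(learner)
        data += [(observe(p, t), astar_expert(p, t)) for p, t in states]
        policy = fit_bc(data)
    return policy, data

if __name__ == "__main__":
    policy, data = dagger()
    print(policy, rollout(lambda p, t: policy.get(observe(p, t), 1))[1])
```

The point of the sketch is the division of labor: the learner chooses which states are visited, while the A$^*$ planner supplies the action labels for those states, so no human annotation is needed during rollouts.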
