Deep Reinforcement Learning (DRL) is regarded as a potential method for car-following control, but most existing studies consider only a single following vehicle. It is more challenging to learn a stable and efficient car-following policy when a platoon contains multiple following vehicles, especially under unpredictable leading vehicle behavior. In this context, we adopt an integrated DRL and Dynamic Programming (DP) approach to learn autonomous platoon control policies, which embeds the Deep Deterministic Policy Gradient (DDPG) algorithm into a finite-horizon value iteration framework. Although the DP framework improves the stability and performance of DDPG, it suffers from lower sampling and training efficiency. In this paper, we propose an algorithm, namely Finite-Horizon-DDPG with Sweeping through reduced state space using Stationary approximation (FH-DDPG-SS), which uses three key ideas to overcome these limitations: transferring network weights backward in time, approximating earlier time steps with a stationary policy, and sweeping through a reduced state space. To verify the effectiveness of FH-DDPG-SS, simulations using real driving data are performed, and the performance of FH-DDPG-SS is compared with that of benchmark algorithms. Finally, platoon safety and string stability for FH-DDPG-SS are demonstrated.
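The first two ideas above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the horizon length, the toy "training" routine, and the scalar weights standing in for DDPG actor networks are all assumptions made for clarity, and the third idea (sweeping through a reduced state space) is omitted.

```python
# Hypothetical sketch of two FH-DDPG-SS ideas: backward weight transfer
# across time steps, and a stationary-policy approximation for early steps.
# Scalar weights stand in for DDPG actor networks; train_step_policy is a
# placeholder, not the paper's actual training procedure.
import random

H = 5  # finite horizon (number of time steps); value chosen for illustration
K = 3  # only the last K steps get dedicated per-step policies; earlier
       # steps reuse a stationary approximation (idea 2)

def train_step_policy(t, init_w):
    """Placeholder for one DDPG training round at time step t.
    Warm-starts from init_w (backward weight transfer, idea 1)."""
    random.seed(t)  # deterministic toy "training" update
    return init_w + random.uniform(-0.1, 0.1)

# Backward value iteration over the horizon: train the last step first,
# then hand its weights back to the preceding step, and so on (idea 1).
policies = {}
w = 0.0  # initial actor weight before any training
for t in reversed(range(H - K, H)):
    w = train_step_policy(t, w)  # warm-start from step t+1's weights
    policies[t] = w

# Stationary approximation: steps 0 .. H-K-1 share the earliest trained
# policy instead of being trained separately (idea 2).
for t in range(H - K):
    policies[t] = policies[H - K]

print(sorted(policies))  # every time step 0..H-1 now has a policy
```

The backward pass mirrors finite-horizon value iteration: the policy at each step is optimized against the already-trained later steps, and warm-starting from the neighboring step's weights is what raises training efficiency.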