The deep deterministic policy gradient (DDPG)-based car-following strategy can break through the constraints of differential-equation models owing to its ability to explore complex environments. However, the car-following performance of DDPG is often degraded by unreasonable reward-function design, insufficient training, and low sampling efficiency. To address these problems, a hybrid car-following strategy based on DDPG and cooperative adaptive cruise control (CACC) is proposed. First, the car-following process is modeled as a Markov decision process so that CACC and DDPG can be evaluated simultaneously at each frame. Given the current state, two candidate actions are obtained from CACC and DDPG, respectively. Then, the action offering the larger reward is chosen as the output of the hybrid strategy. Meanwhile, a rule is designed to keep the change rate of acceleration below the desired value. Therefore, the proposed strategy not only guarantees basic car-following performance through CACC but also makes full use of DDPG's advantage in exploring complex environments. Finally, simulation results show that the proposed strategy improves car-following performance compared with DDPG and CACC alone.
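To make the selection rule concrete, the following is a minimal sketch of the hybrid action choice, not the paper's implementation: `cacc_action`, `ddpg_action`, and `reward` are hypothetical stand-ins for the CACC controller, the DDPG actor, and the reward function, and `MAX_JERK` and `DT` are assumed values for the desired jerk bound and the frame length.

```python
MAX_JERK = 3.0   # assumed bound on the change rate of acceleration (m/s^3)
DT = 0.1         # assumed frame length (s)

def hybrid_action(state, prev_accel, cacc_action, ddpg_action, reward):
    """Pick the higher-reward action from CACC and DDPG, then limit jerk."""
    a_cacc = cacc_action(state)   # action proposed by the CACC controller
    a_ddpg = ddpg_action(state)   # action proposed by the DDPG actor

    # Choose whichever candidate action yields the larger reward.
    a = a_cacc if reward(state, a_cacc) >= reward(state, a_ddpg) else a_ddpg

    # Rule: keep the change rate of acceleration below the desired value
    # by clipping the new acceleration around the previous one.
    max_delta = MAX_JERK * DT
    return max(prev_accel - max_delta, min(prev_accel + max_delta, a))
```

In this sketch, CACC provides the performance floor (its action is always available as a fallback), while DDPG's action is adopted only when it scores a higher reward, which mirrors the complementary roles the two components play in the proposed strategy.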