This study proposes a delay-compensated feedback controller based on proximal policy optimization (PPO) reinforcement learning to stabilize traffic flow in the congested regime by manipulating the time gap of adaptive cruise control-equipped (ACC-equipped) vehicles. The traffic dynamics on a freeway segment are governed by an Aw-Rascle-Zhang (ARZ) model, consisting of $2\times 2$ nonlinear first-order partial differential equations (PDEs). Inspired by the backstepping delay compensator [18], but without its complex segmented control scheme, the PPO controller is composed of three feedback terms: the current traffic flow velocity, the current traffic flow density, and the control input of the previous step. The control gains for the three feedback terms are learned through interaction between the PPO agent and a numerical simulator of the traffic system, without knowledge of the system dynamics. Numerical simulation experiments are designed to compare the Lyapunov control, the backstepping control, and the PPO control. The results show that for a delay-free system, the PPO control achieves a faster convergence rate with less control effort than the Lyapunov control. For a traffic system with input delay, the performance of the PPO controller is comparable to that of the backstepping controller, even when the assumed delay does not match the actual delay. However, the PPO controller is robust to parameter perturbations, while the backstepping controller cannot stabilize a system in which one of the parameters is disturbed by Gaussian noise.
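For context, the $2\times 2$ ARZ system referred to above is conventionally written in its relaxed form as follows; this is the standard statement of the model from the literature, quoted here under the assumption that the paper uses the usual equilibrium speed profile $V(\rho)$, traffic pressure $p(\rho)$, and relaxation time $\tau$:
$$
\begin{aligned}
\partial_t \rho + \partial_x(\rho v) &= 0,\\
\partial_t\big(v + p(\rho)\big) + v\,\partial_x\big(v + p(\rho)\big) &= \frac{V(\rho) - v}{\tau},
\end{aligned}
$$
where $\rho(x,t)$ is the traffic density and $v(x,t)$ the traffic velocity on the freeway segment.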
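A minimal sketch of the three-feedback controller structure described above is given below. The abstract only states that the control input is formed from the current velocity, the current density, and the previous-step input, with gains learned by PPO; the linear-combination form, the deviation-from-equilibrium formulation, and all names and gain values here are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the three-feedback control law: the control input
# (the ACC time gap) is computed from the measured traffic velocity, the
# measured traffic density, and the previous-step control input. The gains
# k_v, k_rho, k_u are assumed to come from a PPO policy trained against a
# numerical simulator of the ARZ traffic system.
class ThreeFeedbackController:
    def __init__(self, k_v: float, k_rho: float, k_u: float, u_init: float = 0.0):
        self.k_v, self.k_rho, self.k_u = k_v, k_rho, k_u
        self.u_prev = u_init  # previous-step control input (memory term)

    def step(self, v_meas: float, rho_meas: float,
             v_star: float, rho_star: float) -> float:
        # Feedback on deviations from the desired equilibrium (v*, rho*),
        # plus a term on the previous input, which plays the role of the
        # delay compensation discussed in the abstract.
        u = (self.k_v * (v_meas - v_star)
             + self.k_rho * (rho_meas - rho_star)
             + self.k_u * self.u_prev)
        self.u_prev = u
        return u

# Usage example with dummy gains; in the paper's setting these would be
# the gains learned by the PPO agent.
ctrl = ThreeFeedbackController(k_v=0.8, k_rho=-0.3, k_u=0.5)
u = ctrl.step(v_meas=22.0, rho_meas=0.12, v_star=25.0, rho_star=0.10)
print(u)
```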