Existing data-driven and feedback traffic control strategies do not consider the heterogeneity of real-time data measurements. In addition, traditional reinforcement learning (RL) methods for traffic control usually converge slowly because they use data inefficiently. Moreover, conventional optimal perimeter control schemes require exact knowledge of the system dynamics and are therefore fragile to endogenous uncertainties. To address these challenges, this work proposes an integral reinforcement learning (IRL) based approach that learns the macroscopic traffic dynamics for adaptive optimal perimeter control. This work makes the following primary contributions to the transportation literature: (a) A continuous-time controller with discrete gain updates is developed to accommodate discrete-time sensor data. (b) To reduce the sample complexity and use the available data more efficiently, the experience replay (ER) technique is introduced into the IRL algorithm. (c) The proposed method relaxes the requirement for model calibration in a "model-free" manner, which provides robustness against modeling uncertainty and improves real-time performance via a data-driven RL algorithm. (d) The convergence of the IRL-based algorithms and the stability of the controlled traffic dynamics are proven via Lyapunov theory. The optimal control law is parameterized and then approximated by neural networks (NN), which moderates the computational complexity. Both state and input constraints are considered, and no model linearization is required. Numerical examples and simulation experiments are presented to verify the effectiveness and efficiency of the proposed method.
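To make the idea of a continuous-time controller with discrete gain updates more concrete, the following is a minimal sketch of IRL-style policy iteration on a hypothetical scalar linear system, not the paper's traffic model or algorithm. Everything in it is an assumption made for illustration: the plant dx/dt = a·x + b·u, the quadratic cost q·x² + r·u², the value function V(x) = p·x², the function names, and all parameter values. The "replay" step is simply a batch least-squares fit over stored interval data collected under the current gain, which stands in for the ER technique and may differ from the paper's scheme. Unlike the fully model-free claim in the abstract, this sketch still uses the input gain b in the policy update; only the drift term a is never seen by the learner.

```python
import math


def simulate_interval(x0, k, T, a=1.0, b=1.0, q=1.0, r=1.0):
    """Hypothetical plant: dx/dt = a*x + b*u with feedback u = -k*x.

    Solved in closed form. Returns the state after T time units and the
    running cost (q*x^2 + r*u^2) integrated over the interval. The learner
    only sees these two measurements, never the drift parameter a.
    """
    ac = a - b * k  # closed-loop pole; assumed nonzero (stabilizing gain)
    x_next = x0 * math.exp(ac * T)
    cost = (q + r * k * k) * x0 * x0 * (math.exp(2.0 * ac * T) - 1.0) / (2.0 * ac)
    return x_next, cost


def irl_policy_iteration(k0=2.0, b=1.0, r=1.0, T=0.1, n_samples=10, n_iters=8):
    """IRL-style policy iteration with a simple replay buffer (illustrative)."""
    k = k0  # initial gain, assumed stabilizing
    p = None
    for _ in range(n_iters):
        # Collect interval data while the gain is held fixed
        # (continuous-time control, discrete gain updates).
        buffer = []
        x = 1.0
        for _ in range(n_samples):
            x_next, cost = simulate_interval(x, k, T)
            buffer.append((x, x_next, cost))
            x = x_next

        # Policy evaluation: fit p in the integral Bellman equation
        #   p * (x(t)^2 - x(t+T)^2) = integrated cost
        # by least squares over every stored sample.
        num = sum((x0 ** 2 - x1 ** 2) * c for x0, x1, c in buffer)
        den = sum((x0 ** 2 - x1 ** 2) ** 2 for x0, x1, _ in buffer)
        p = num / den

        # Policy improvement: discrete gain update (uses b, assumed known).
        k = b * p / r
    return p, k


if __name__ == "__main__":
    p, k = irl_policy_iteration()
    print(p, k)
```

With the assumed values a = b = q = r = 1, the scalar Riccati equation 2ap − b²p²/r + q = 0 has the positive root p = 1 + √2, so the learned p and gain k should approach that value after a few gain updates.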