Traffic Shaping and Hysteresis Mitigation Using Deep Reinforcement Learning in a Connected Driving Environment

We propose a multi-agent deep reinforcement learning framework for traffic shaping. Its key advantage over existing congestion management strategies is the ability to mitigate hysteresis phenomena. Unlike existing strategies, which focus on breakdown prevention, the proposed framework is highly effective after breakdown has formed. The framework assumes partial connectivity among the automated vehicles, which share information, and requires only a basic level of autonomy defined by one-dimensional longitudinal control. It is built primarily on a centralized-training, centralized-execution multi-agent deep reinforcement learning approach, in which longitudinal control is defined by acceleration or deceleration commands that all agents execute uniformly. The model used for training and testing is based on the well-known Double Deep Q-Learning algorithm; it takes the average state of flow within the traffic stream as input and outputs actions in the form of acceleration or deceleration values. We demonstrate the ability of the model to shape the state of traffic, mitigate the negative effects of hysteresis, and even improve traffic flow beyond its original level. This paper also identifies the minimum percentage of connected and automated vehicles (CAVs) required to successfully shape the traffic, under the assumption that CAVs are uniformly distributed within the loop system. The framework not only shows the theoretical applicability of reinforcement learning to such challenges but also proposes a realistic solution that requires only partial connectivity and continuous monitoring of the average speed of the system, which can be achieved using readily available sensors that measure the speeds of vehicles in reasonable proximity to the CAVs.
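
To make the Double Deep Q-Learning step concrete, the following is a minimal sketch of how the bootstrapped targets could be computed and how a single greedy command could be broadcast to every CAV. It is an illustration under stated assumptions, not the paper's implementation: the discrete action set `ACCEL_ACTIONS`, the discount factor, and the function names `double_dqn_targets` and `select_uniform_command` are hypothetical, and the Q-values are passed in as plain arrays rather than produced by a trained network.

```python
import numpy as np

# Hypothetical discrete action set: acceleration commands in m/s^2.
# The abstract only says actions are acceleration or deceleration values.
ACCEL_ACTIONS = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])


def double_dqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """Compute Double DQN targets for a batch of transitions.

    q_online_next, q_target_next: arrays of shape (batch, n_actions) holding
    the online and target networks' Q-values for the next states.
    rewards, dones: arrays of shape (batch,); dones is 1.0 at episode end.
    """
    # The online network selects the next action...
    best_actions = np.argmax(q_online_next, axis=1)
    # ...and the target network evaluates it, which reduces overestimation.
    batch_idx = np.arange(q_online_next.shape[0])
    next_values = q_target_next[batch_idx, best_actions]
    return rewards + gamma * (1.0 - dones) * next_values


def select_uniform_command(q_values, n_cavs):
    """Pick the greedy action and broadcast it to all CAVs (centralized execution)."""
    action = int(np.argmax(q_values))
    return np.full(n_cavs, ACCEL_ACTIONS[action])
```

In this sketch the state that produces the Q-values would be the average state of flow (for example, the average speed of the stream), and every agent receives the same command, matching the uniform execution described above.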
