This paper develops a reinforcement learning (RL) scheme for adaptive traffic signal control (ATSC), called "CVLight", that leverages data collected only from connected vehicles (CVs). Seven RL models with different state and reward representations are proposed within this scheme, including the incorporation of CV delay and green-light duration into the state and the use of CV delay as the reward. To further incorporate information from both CVs and non-CVs into CVLight, an actor-critic-based algorithm, A2C-Full, is proposed, in which both CV and non-CV information is used to train the critic network, while only CV information is used to update the policy network and execute optimal signal timing. These models are compared at an isolated intersection under various CV market penetration rates. The full model with the best performance (i.e., the minimum average travel delay per vehicle) is then selected and compared against state-of-the-art benchmarks under different levels of traffic demand, turning proportions, and dynamic traffic demand. Two case studies are performed on an isolated intersection and on a corridor of three consecutive intersections in Manhattan, New York, to further demonstrate the effectiveness of the proposed algorithm under real-world scenarios. Compared with baseline models that use information from all vehicles, the trained CVLight agent can efficiently control multiple intersections based solely on CV data and achieves similar or even better performance when the CV penetration rate is no less than 20%.
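The asymmetric training idea behind A2C-Full, a critic conditioned on the full state (CV and non-CV information) while the actor is restricted to CV-observable features, can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the linear actor and critic, feature dimensions, environment values, and update rule are all assumptions made for exposition.

```python
import math

# Illustrative sketch of asymmetric actor-critic training:
# the critic's value estimate uses the full state (CV + non-CV features),
# while the actor (policy) sees only the CV-observable slice.
# All sizes and values below are hypothetical.
N_CV, N_FULL = 2, 4   # CV-observable features vs. full-state features
ACTIONS = 2           # e.g. keep vs. switch the current signal phase

actor_w = [[0.0] * N_CV for _ in range(ACTIONS)]   # policy: CV features only
critic_w = [0.0] * N_FULL                          # critic: full state

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def policy(cv_obs):
    """Action probabilities from CV-observable features only."""
    logits = [sum(w * x for w, x in zip(row, cv_obs)) for row in actor_w]
    return softmax(logits)

def value(full_state):
    """Critic value estimate from the full (CV + non-CV) state."""
    return sum(w * x for w, x in zip(critic_w, full_state))

def update(full_state, reward, next_full_state, action, lr=0.01, gamma=0.9):
    cv_obs = full_state[:N_CV]   # actor input: CV-observable slice only
    # TD error computed by the full-information critic
    td = reward + gamma * value(next_full_state) - value(full_state)
    # Critic update: move value estimate toward the TD target
    for i in range(N_FULL):
        critic_w[i] += lr * td * full_state[i]
    # Actor update: policy gradient weighted by the TD error (advantage);
    # at execution time, only policy(cv_obs) is needed, so control works
    # from CV data alone.
    probs = policy(cv_obs)
    for a in range(ACTIONS):
        grad = (1.0 if a == action else 0.0) - probs[a]
        for i in range(N_CV):
            actor_w[a][i] += lr * td * grad * cv_obs[i]
```

The design point the abstract emphasizes is visible in `update`: the privileged non-CV information influences learning only through the critic's TD error, so the deployed controller (`policy`) never requires it.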