This paper develops a decentralized reinforcement learning (RL) scheme for multi-intersection adaptive traffic signal control (TSC), called "CVLight", that leverages data collected from connected vehicles (CVs). The state and reward design facilitates coordination among agents and incorporates travel delays collected by CVs. A novel algorithm, Asymmetric Advantage Actor-critic (Asym-A2C), is proposed, in which both CV and non-CV information is used to train the critic network, while only CV information is used by the actor to execute signal timing. Comprehensive experiments show that CVLight outperforms state-of-the-art algorithms on a 2-by-2 synthetic road network under various traffic demand patterns and CV penetration rates. The learned policy is then visualized to further demonstrate the advantage of Asym-A2C. A pre-training technique is applied to improve the scalability of CVLight; it significantly shortens training time and shows a performance advantage on a 5-by-5 road network. A case study on a 2-by-2 road network located in State College, Pennsylvania, USA, further demonstrates the effectiveness of the proposed algorithm under real-world scenarios. Compared with the baseline models, the trained CVLight agent can efficiently control multiple intersections based solely on CV data and achieves the best performance, especially under low CV penetration rates.